Open leesei opened 3 years ago
+1 on this. I'm trying to adopt this library into https://www.benthos.dev and currently I'm struggling to get even basic examples to work. The error message I'm getting right now is "interface {} is nil, not string". I'm not sure how I could recommend someone use this in my service when I know they're going to get zero meaningful feedback when things aren't working.
@Jeffail We've been using this in production for 2 years now, moving billions of records every day to Parquet files on HDFS. I can definitely say that it's performing very well and we don't encounter any issues. In any case, internal library messages should be "translated" into user-readable, Benthos-ish messages.
If you wish I can show you how we do it and get you started. And I'll be happy to see if we can benefit from Benthos...
PM me for details
Thanks @xtrimf, I'm mostly concerned about how I'd be able to specify to a user which field it is that's causing the error, as otherwise they'd have no way of knowing whether the problem is that their schema has a typo in it or the data is incorrect. Ideally I'd want to expose the specific row that causes the error on a write flush but for starters I just want the field name.
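One way to get a field name into the error, independent of the library, is to check each row against the expected types before handing it to the writer. The following is only a minimal sketch using the standard library: `FieldKind`, `checkRow`, and the map-based schema are hypothetical names made up for illustration and are not part of parquet-go.

```go
package main

import (
	"fmt"
)

// FieldKind is a hypothetical, simplified stand-in for a column type.
type FieldKind int

const (
	KindString FieldKind = iota
	KindInt64
)

// checkRow is a hypothetical pre-write check (not a parquet-go API).
// It returns an error that names the first field whose value is
// missing, nil, or of an unexpected Go type.
func checkRow(schema map[string]FieldKind, row map[string]interface{}) error {
	for name, kind := range schema {
		v, ok := row[name]
		if !ok || v == nil {
			return fmt.Errorf("field %q: value is missing or nil", name)
		}
		switch kind {
		case KindString:
			if _, ok := v.(string); !ok {
				return fmt.Errorf("field %q: expected string, got %T", name, v)
			}
		case KindInt64:
			if _, ok := v.(int64); !ok {
				return fmt.Errorf("field %q: expected int64, got %T", name, v)
			}
		}
	}
	return nil
}

func main() {
	schema := map[string]FieldKind{"name": KindString, "age": KindInt64}
	// "name" is nil, which is the kind of input that otherwise surfaces
	// as an unhelpful "interface {} is nil, not string".
	row := map[string]interface{}{"name": nil, "age": int64(30)}
	if err := checkRow(schema, row); err != nil {
		fmt.Println(err)
	}
}
```

A check like this costs an extra pass over every row, so it may only be worth running after a write has already failed, to locate the offending field.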
You are sending a struct/JSON/slice to be written into a Parquet row group. I'm not sure that is possible (I didn't verify), as the whole object gets encoded... but I could be wrong.
For generic use, when the source data is unknown, we parse the data, decide each field's type on the fly, and build the object schema accordingly, so there is never a mismatch. But we mainly use DBs as the source, so it's easy to get the types beforehand.
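The "decide the type on the fly" approach could look roughly like the sketch below. It is not the code described in this comment: `inferType` and `buildSchema` are hypothetical helpers, the `name:TYPE` output is an illustrative description rather than parquet-go's schema syntax, and the mapping from Go types to Parquet physical type names is a simplifying assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// inferType maps a Go value to a Parquet physical type name.
// It returns "" for values it does not know how to map.
func inferType(v interface{}) string {
	switch v.(type) {
	case bool:
		return "BOOLEAN"
	case int32:
		return "INT32"
	case int, int64:
		return "INT64"
	case float32:
		return "FLOAT"
	case float64:
		return "DOUBLE"
	case string:
		return "BYTE_ARRAY"
	default:
		return ""
	}
}

// buildSchema derives a "name:TYPE" description from one sample row.
// Field names are sorted so the result does not depend on map order.
func buildSchema(row map[string]interface{}) (string, error) {
	names := make([]string, 0, len(row))
	for name := range row {
		names = append(names, name)
	}
	sort.Strings(names)

	parts := make([]string, 0, len(names))
	for _, name := range names {
		t := inferType(row[name])
		if t == "" {
			return "", fmt.Errorf("field %q: unsupported type %T", name, row[name])
		}
		parts = append(parts, name+":"+t)
	}
	return strings.Join(parts, ", "), nil
}

func main() {
	row := map[string]interface{}{"id": int64(7), "name": "a", "score": 1.5}
	schema, err := buildSchema(row)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(schema) // prints "id:INT64, name:BYTE_ARRAY, score:DOUBLE"
}
```

Inferring from a single sample row breaks down when a field is nil in that row or changes type later, which is presumably why having the types from the source DB beforehand is easier.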
+1, the error reporting isn't good enough. Schema errors give no indication of which field was at fault. I had to put debug messages in to find the problem. Same with writing errors. At the very minimum it needs to point the dev at where the problem might lie.
I encountered an "index out of range [0] with length 0" error in PagesToChunk() in PagesToChunk.go. The reason was that I passed the wrong struct to NewParquetWriterFromWriter() (hence the pages being empty). The same goes for the "invalid memory address or nil pointer dereference" error (#421): the reason there was that I used a non-native type (time.Time) in the object interface. These are indeed my errors and easily fixable, but I would recommend improving the error messages to improve the user experience.