jghoman / haivvreo

Hive + Avro. Serde for working with Avro in Hive
Apache License 2.0

logging too verbose #8

Closed: koertkuipers closed this issue 12 years ago

koertkuipers commented 13 years ago

i ran out of disk space on some cluster nodes because a log line is generated for each record that is re-encoded with the reader's schema. i think this should be debug?

2011-10-27 22:00:05,084 INFO com.linkedin.haivvreo.AvroDeserializer: Received different schemas. Have to re-encode: {"type":"record","name":"test","namespace":"com.linkedin.haivvreo","fields":[{"name":"test1","type":["null","string"]},{"name":"test2","type":["null","string"]}]}

jghoman commented 13 years ago

Yeah, you're right. Interested in coming up with a patch?

dkarvounis commented 12 years ago

I had this same issue yesterday. I had to change the same logging statement to DEBUG level from INFO on the main branch. I'm not sure if it was changed to WARN recently for a reason.
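For illustration, here is a minimal sketch of what lowering that per-record message to debug level looks like. It uses java.util.logging as a stand-in so it runs on its own (haivvreo's own logger and the surrounding deserializer code are not reproduced here). The class name ReencodeLogging, the method names, and the counting handler are hypothetical; only the message text comes from the log line quoted above.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class ReencodeLogging {
    // Hypothetical logger; stands in for the deserializer's real logger.
    static final Logger LOG = Logger.getLogger("ReencodeLogging");

    // Counts published records so the effect of the level is observable.
    static class CountingHandler extends Handler {
        int count = 0;
        @Override public void publish(LogRecord record) { count++; }
        @Override public void flush() { }
        @Override public void close() { }
    }

    // Called once per re-encoded record. FINE is the java.util.logging
    // analogue of DEBUG; the guard avoids building the message string
    // (which embeds the whole schema) when debug output is off.
    static void onReencode(String schemaJson) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Received different schemas. Have to re-encode: " + schemaJson);
        }
    }

    // Runs `records` re-encodes at the given logger level and returns
    // how many log lines were actually emitted.
    static int emittedAt(Level level, int records) {
        CountingHandler handler = new CountingHandler();
        LOG.setUseParentHandlers(false); // keep the demo off the console
        LOG.addHandler(handler);
        LOG.setLevel(level);
        try {
            for (int i = 0; i < records; i++) {
                onReencode("{\"type\":\"record\",\"name\":\"test\"}");
            }
        } finally {
            LOG.removeHandler(handler);
        }
        return handler.count;
    }

    public static void main(String[] args) {
        System.out.println("lines at INFO: " + emittedAt(Level.INFO, 1000));
        System.out.println("lines at FINE: " + emittedAt(Level.FINE, 1000));
    }
}
```

With the logger at INFO, the per-record message disappears entirely; it only comes back when someone turns debug logging on to investigate.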

jghoman commented 12 years ago

What version was it on? I changed it back to WARN because there shouldn't be any more situations where we can get into needing to re-encode. If it's on a recent version, can you report what you were trying to do?

koertkuipers commented 12 years ago

jacob i still get re-encoding too. it is because the code that tries to match the partition doesn't take all situations into account.

specifically it doesn't take the situation into account where it compares a qualified path with a non-qualified path.

AvroGenericRecordReader.pathIsInPartition will return false if it compares a split with path "hdfs://somenode:8020/somepath/somedir" with a partition with path "/somepath/somedir"
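to make the mismatch concrete, here is a rough standalone sketch. it is not the actual AvroGenericRecordReader code: it uses java.net.URI instead of Hadoop's Path so it runs without Hadoop on the classpath, and the class PartitionPathMatch and its methods are made up for illustration. it compares only the path component, which is one possible normalization; note that this ignores the scheme and host, so it would treat the same path on two different clusters as equal.

```java
import java.net.URI;

public class PartitionPathMatch {
    // Keep only the path component, so "hdfs://host:8020/a/b" and "/a/b"
    // both become "/a/b". Assumes the location is a valid URI string
    // (no unescaped spaces, for example).
    static String pathOnly(String location) {
        String path = URI.create(location).getPath();
        if (path.length() > 1 && path.endsWith("/")) {
            path = path.substring(0, path.length() - 1); // drop trailing slash
        }
        return path;
    }

    // Naive comparison on the raw strings: fails when one side is
    // qualified and the other is not.
    static boolean naiveIsInPartition(String split, String partition) {
        return split.startsWith(partition);
    }

    // Comparison on normalized paths: the split matches if it is the
    // partition directory itself or something underneath it.
    static boolean isInPartition(String split, String partition) {
        String s = pathOnly(split);
        String p = pathOnly(partition);
        String prefix = p.equals("/") ? "/" : p + "/";
        return s.equals(p) || s.startsWith(prefix);
    }

    public static void main(String[] args) {
        String split = "hdfs://somenode:8020/somepath/somedir";
        String partition = "/somepath/somedir";
        System.out.println("naive:      " + naiveIsInPartition(split, partition));
        System.out.println("normalized: " + isInPartition(split, partition));
    }
}
```

the other direction also works: qualify the partition path against the filesystem before comparing, so both sides carry the scheme and host.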


jghoman commented 12 years ago

@koertkuipers and @dkarvounis I've pushed changes to both the avro14 and avro15 branches to use the fully qualified path from the input split, as suggested by Koert. Try those. If you're still getting re-encoding, then something else is quite amiss and I doubt I can diagnose it, since I can't reproduce it, although I'd of course be happy to accept patches. I'm getting ready to push the code to Hive, so hopefully more eyes will be able to see if we're missing something.