jghoman / haivvreo

Hive + Avro. Serde for working with Avro in Hive
Apache License 2.0

Files in Avro-backed Hive tables do not have a ".avro" extension #7

Closed: tomwhite closed this issue 13 years ago

tomwhite commented 13 years ago

Jakob, you might be interested in this one too. See https://issues.apache.org/jira/browse/HIVE-2457 for the background and dependent Hive patch.

Cheers, Tom

jghoman commented 13 years ago

Jacob Rideout forked Haivvreo a while ago and, as part of his work there, added this commit: d4eb1a625da91f1a73856849e3e2d0d6375fd4a8. I haven't checked whether it works.

tomwhite commented 13 years ago

I thought about doing something similar, but I'm not sure it would work, since (at least in some cases) Hive creates temporary files and then moves them to their final location once they are complete (see FileSinkOperator.FSPaths.commit()). If the writer added an extension, Hive would not find the files to move, because they would not have the names Hive expected. However, I haven't tried this either. I'll comment on Jacob's commit.
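To make the concern concrete, here is a toy model of a write-then-rename commit. It is only a sketch of the general pattern, not Hive's actual code: the function names, the `_tmp.` prefix, the task name and the `writer_adds_extension` flag are all hypothetical, and whether Hive really fails this way was untested at the time of this comment.

```python
import os
import tempfile


def commit(tmp_path, final_path):
    """Move a finished temporary file to its final location."""
    os.rename(tmp_path, final_path)


def write_output(directory, task_name, writer_adds_extension):
    """Write one output file, then commit it. Returns the final path.

    All names here are hypothetical and exist only for illustration.
    """
    # The name the framework hands to the writer and later expects to move.
    expected_tmp = os.path.join(directory, "_tmp." + task_name)
    final_path = os.path.join(directory, task_name)

    # A writer that appends ".avro" creates a file the framework does not know about.
    actual_tmp = expected_tmp + ".avro" if writer_adds_extension else expected_tmp
    with open(actual_tmp, "wb") as f:
        f.write(b"record data")

    # The framework still tries to move the name it expected.
    commit(expected_tmp, final_path)
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("committed:", write_output(d, "000000_0", False))
        try:
            write_output(d, "000001_0", True)
        except FileNotFoundError as e:
            print("commit failed:", e)
```

In this model the second call fails because the rename targets a file that was never created under the expected name. That is the argument for having the framework choose the extension itself, rather than having the writer rename its own output.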

tomwhite commented 13 years ago

HIVE-2457 has been committed now. I've tested Haivvreo with this change and I can successfully generate Avro tables whose files have the ".avro" extension.

jghoman commented 13 years ago

+1. Thanks for doing this, Tom.