Closed jayunit100 closed 10 years ago
I've just committed a diagram of the flow, to make it linear. There is now no redundancy between Hive and Pig, and no need to overdo the comparison stuff. This will make it easier for us to code. We will have Hive and Pig work together, rather than compete :)
http://bit.ly/1bnzYj1 <-- there's the image of the new architecture
The ETL class is Pig. It will ultimately clean the data for Hive to query. It should just write to disk in all cases.
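
To make the linear flow concrete, here is a rough sketch of the idea in plain Python. It is not the actual Pig or Hive code: `etl_clean`, `query_counts`, `run_pipeline`, the two-column row format, and the cleaning rule are all made up for illustration. The only points it tries to show are that the ETL stage always writes its cleaned output to disk, and that the query stage reads only that output.

```python
import csv
import os
import tempfile
from collections import Counter


def etl_clean(raw_lines, out_path):
    """Stand-in for the Pig ETL stage: clean raw rows and always write to disk."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for line in raw_lines:
            fields = [part.strip() for part in line.split(",")]
            # Hypothetical cleaning rule: keep rows with exactly 2 non-empty fields.
            if len(fields) != 2 or not all(fields):
                continue
            # Normalize case so the query stage never has to.
            writer.writerow([fields[0].lower(), fields[1].lower()])
    return out_path


def query_counts(clean_path):
    """Stand-in for the Hive stage: reads only the cleaned file from disk."""
    with open(clean_path, newline="") as f:
        # Count rows per value of the second column (a hypothetical "product").
        return Counter(row[1] for row in csv.reader(f))


def run_pipeline(raw_lines):
    """Linear flow: raw input -> ETL (writes to disk) -> query (reads from disk)."""
    out_path = os.path.join(tempfile.mkdtemp(), "cleaned.csv")
    return query_counts(etl_clean(raw_lines, out_path))
```

Since the query stage only ever sees the file the ETL stage wrote, there is one path through the system and nothing to compare between the two tools.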