rufuspollock / ideas

Ideas for (tech) stuff to research, build or work on.

Tools and Workflows for Repeatable Sharable Data Cleaning / ETL / Processing #58

Open rufuspollock opened 11 years ago

rufuspollock commented 11 years ago

rossjones commented 11 years ago

I'm building ScraperWikiX at

psychemedia commented 10 years ago

OpenRefine recipes/vignettes, especially for "standardised" data formats? e.g.

webysther commented 8 years ago

chrismattmann commented 8 years ago

Apache OODT? Check out DRAT (Distributed Release Audit Tool) as an example of OODT ETL in action:

lexman commented 8 years ago

tuttle is also a tool for repeatable workflows that works well with team collaboration and continuous integration (for example, Jenkins updating data every hour).