vendekagon-labs / unify

An engine for automating data integration & harmonization via schema inference.
Apache License 2.0

Support CSV files, faster CSV/TSV parsing #14

Closed benkamphaus closed 10 months ago

benkamphaus commented 10 months ago

This adds support for reading CSV files and brings in charred as a drop-in replacement for data.csv, giving faster CSV/TSV reading in performance-critical paths.
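charred and data.csv are Clojure libraries, so the sketch below is only an illustration in Python's standard csv module, not the code in this PR. It shows the idea the PR relies on: CSV and TSV can go through one parser that differs only in the field separator. The function name read_delimited is hypothetical.

```python
import csv
import io


def read_delimited(text: str, delimiter: str = ",") -> list[dict[str, str]]:
    """Parse delimited text into row dicts keyed by the header row (illustrative only)."""
    # csv.DictReader treats the first row as column names and honors
    # double-quoted fields, so "Smith, J." stays one CSV field.
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


# Same parser, different separator.
csv_rows = read_delimited('id,name\n1,"Smith, J."\n', delimiter=",")
tsv_rows = read_delimited("id\tname\n1\tSmith, J.\n", delimiter="\t")
```

When reading from disk instead of a string, open the file with newline="" as the csv module documentation recommends.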

There's some fiddly branching in the CSV vs. TSV file handling, but I'm holding off on rewriting it until:

  1. we need to support a third file format (rule of three, etc.),
  2. the parse config is cleaned up, and
  3. contextualize is removed from the parse config.
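One possible shape for that later rewrite, offered as a hedged sketch rather than the project's plan: replace per-format branching with a lookup table from file extension to separator. The table DELIMITERS and the function delimiter_for are hypothetical names, and the example is in Python for illustration only.

```python
from pathlib import Path

# Hypothetical lookup table: file extension -> field separator.
DELIMITERS = {
    ".csv": ",",
    ".tsv": "\t",
}


def delimiter_for(path: str) -> str:
    """Return the field separator for a file, chosen by its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return DELIMITERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file format: {path!r}") from None
```

With this layout, a third format becomes one new table entry (for example ".psv": "|") instead of another branch.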

This resolves #12