Write analysis data to file per collection

robobenklein commented 3 years ago

The output can be imported into the db separately after the run, so the run itself does not need a direct db connection.

robobenklein commented 3 years ago

Example of a compatible file format:

"_key","name","city","state","country","lat","long","vip"
"00M","Thigpen ","Bay Springs","MS","USA",31.95376472,-89.23450472,false
"00R","Livingston Municipal","Livingston","TX","USA",30.68586111,-95.01792778,false
"00V","Meadow Lake","Colorado Springs","CO","USA",38.94574889,-104.5698933,false
"01G","Perry-Warsaw","Perry","NY","USA",42.74134667,-78.05208056,false
"01J","Hilliard Airpark","Hilliard","FL","USA",30.6880125,-81.90594389,false
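A minimal sketch of a writer that produces lines in the style shown above: strings quoted, numbers bare, booleans as lowercase `true`/`false`. The names `format_field`, `format_row` and `write_collection` are hypothetical and not part of wsyntree; the one-file-per-collection layout and the `.csv` extension are assumptions too.

```python
from pathlib import Path


def format_field(value):
    """Render one value the way the sample file does."""
    # bool must be checked before int, since bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    # Everything else is written as a quoted string, doubling embedded quotes.
    text = str(value).replace('"', '""')
    return f'"{text}"'


def format_row(values):
    """Join formatted fields into a single CSV line (no trailing newline)."""
    return ",".join(format_field(v) for v in values)


def write_collection(out_dir, collection, header, rows):
    """Write one collection to <out_dir>/<collection>.csv (assumed layout)."""
    path = Path(out_dir) / f"{collection}.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8", newline="") as f:
        f.write(format_row(header) + "\n")
        for row in rows:
            f.write(format_row(row) + "\n")
    return path
```

Note that this sketch does not escape newlines inside string values, so it only keeps the "one node per line" property if values never contain line breaks.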

robobenklein commented 3 years ago

We will need to dedup at some point in this process: during writing, during import, or in a step in between, which could be something like external sorting.

An external sorting preprocess step before db import could allow dedup between multiple job results as well, maybe:

wsyntree-collector file-merge [paths to job output dirs] -o [path to combined output dir]
wsyntree-collector file-import [paths to job or merged output dirs]
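A rough sketch of how the merge step could dedup across job outputs, assuming each input has already been sorted by `_key` (for example by an external sort) and that `_key` is the first CSV field. `key_of` and `merge_dedup` are hypothetical names, not existing wsyntree functions.

```python
import csv
import heapq


def key_of(line):
    """Return the first CSV field of a line, assumed here to be _key."""
    return next(csv.reader([line]))[0]


def merge_dedup(sorted_inputs):
    """Merge line iterables that are each sorted by _key, dropping repeats.

    Only one line per input is held in memory at a time, so the inputs
    can be open file objects of arbitrary size.
    """
    previous = None
    for line in heapq.merge(*sorted_inputs, key=key_of):
        current = key_of(line)
        if current != previous:
            yield line
            previous = current
```

When two inputs contain the same `_key`, this keeps the first line seen and drops the rest, which is only correct if equal keys imply equal content.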

The flatfile storage format would likely be a single folder containing one text file per collection, where each node is one line (one entry), in CSV or a similarly escaped style.

Could also get fancy and write intermediary storage using the treetops "hash composes path" style where division amongst files occurs by the _key/_id property.
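One possible reading of the "hash composes path" idea, as a sketch only: hash the `_key` and use a short prefix of the digest to pick the file a node goes into. The choice of SHA-1, the two-character prefix and the `shard_path` name are all assumptions made for illustration and may not match what treetops actually does.

```python
import hashlib
from pathlib import Path


def shard_path(out_dir, collection, key, width=2):
    """Pick the shard file for a node from a hash of its _key.

    Nodes whose digests share the same `width`-character hex prefix
    land in the same file, so duplicates of a key always collide
    within a single shard.
    """
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return Path(out_dir) / collection / f"{digest[:width]}.csv"
```

Because every copy of a given `_key` maps to the same shard, dedup could then be done one shard at a time instead of over the whole collection.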

utk-se / WorldSyntaxTree

Write analysis data to file per collection #18