Currently we re-upload the entire schemaless.csv file whenever create_schemaless is run. This should be changed to upload only the lines added since the previous run. That should dramatically speed up this step of the workflow, since the new lines should amount to only a few megabytes rather than the full 2 GB file.
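
One possible approach, sketched below, is to remember the byte offset reached by the previous upload and send only what was appended after it. This is a minimal sketch under stated assumptions, not the project's actual implementation: it assumes schemaless.csv is only ever appended to, and the names `upload_new_lines`, `offset_path`, and the `upload` callable are hypothetical stand-ins for whatever create_schemaless really uses to send data.

```python
from pathlib import Path
from typing import Callable


def upload_new_lines(
    csv_path: str,
    offset_path: str,
    upload: Callable[[bytes], None],  # hypothetical: sends a chunk of bytes to the destination
) -> int:
    """Upload only the complete lines appended since the last run.

    Returns the number of bytes uploaded.
    """
    csv_file = Path(csv_path)
    offset_file = Path(offset_path)

    # Byte offset reached by the previous upload (0 on the first run).
    last_offset = int(offset_file.read_text()) if offset_file.exists() else 0

    # If the file is smaller than the stored offset it was rewritten,
    # so the append-only assumption no longer holds: start over from 0.
    if csv_file.stat().st_size < last_offset:
        last_offset = 0

    with csv_file.open("rb") as f:
        f.seek(last_offset)
        new_data = f.read()

    # Keep only complete lines; a partially written last line waits for the next run.
    new_data = new_data[: new_data.rfind(b"\n") + 1]
    if not new_data:
        return 0

    upload(new_data)

    # Record the new offset only after the upload call has returned.
    offset_file.write_text(str(last_offset + len(new_data)))
    return len(new_data)
```

Note that the uploaded chunk does not include the CSV header row after the first run, so the receiving side would need to append it to the existing data rather than treat it as a standalone file. Whether the upload destination supports appending is also an assumption that would need checking.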