varchar-io / nebula

A distributed block-based data storage and compute engine
https://nebula.bz
Apache License 2.0

We should capture all file paths in a spec's ID #176

Closed shawncao closed 2 years ago

shawncao commented 2 years ago

Replace paths_ with a new object, such as FileSplit, that captures {file path, offset, length}. These fields should be part of the spec identifier. @ritvik-statsig
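
A minimal sketch of what such an object could look like, assuming C++17. The field names, the splitsId helper, and the "path#offset#length" layout are hypothetical illustrations, not Nebula's actual code:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical replacement for a bare path string: one slice of one file.
struct FileSplit {
  std::string path;  // full file path
  size_t offset;     // byte offset where this split starts
  size_t length;     // number of bytes covered by this split
};

// Build an identifier fragment covering every split, so two specs that
// differ in any path, offset, or length produce different IDs.
// The '#' and ',' separators are illustrative only.
std::string splitsId(const std::vector<FileSplit>& splits) {
  std::string id;
  for (const auto& s : splits) {
    if (!id.empty()) {
      id += ',';
    }
    id += s.path;
    id += '#';
    id += std::to_string(s.offset);
    id += '#';
    id += std::to_string(s.length);
  }
  return id;
}
```

Because every split contributes to the string, a spec that groups several files no longer collapses to the ID of its first path.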

shawncao commented 2 years ago

Quoting an unresolved comment:

We may want to include watermark_ in this new object too. Consider the case where the time macro is in the file name: if one spec combines multiple files that have different time values, we will need a watermark for each file, unless we guarantee that files with different watermarks are never grouped into one spec.
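
To make the two options concrete, here is a rough sketch, assuming C++17. The watermark field, its unit, and the sameWatermark helper are hypothetical and not taken from Nebula. Each split carries its own watermark, and a check reports whether a group of splits could safely share a single spec-level value:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical split that also records the time value parsed from its path.
struct FileSplit {
  std::string path;
  size_t offset;
  size_t length;
  std::int64_t watermark;  // assumed unit: unix seconds from the time macro
};

// True when every split in the group has the same watermark, i.e. when a
// single spec-level watermark would be enough. An empty group is trivially true.
bool sameWatermark(const std::vector<FileSplit>& splits) {
  return std::all_of(splits.begin(), splits.end(), [&](const FileSplit& s) {
    return s.watermark == splits.front().watermark;
  });
}
```

If grouping is allowed to mix time values, the per-split field is required. If grouping enforces a check like sameWatermark, one watermark per spec remains sufficient.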

shawncao commented 2 years ago

Similarly, we should include all paths in the string result.

shawncao commented 2 years ago

This is resolved in #177