MarkPflug closed 4 years ago
It's a good idea. It would also mean that when a tool is writing the files, you wouldn't have contention on those schema files.
How about putting the files into a subfolder? e.g.
_meta/columns.csv
This can contain details of all columns in all tables.
Or, if you use the "file per file" technique and have a file "states.csv":
_meta/columns/states.csv
And similarly:
_meta/tables/states.csv
would describe the states.csv file, with the same info that would normally be in the "tables.csv" file.
Interestingly: You can combine both methods!
For example, consider a csvz containing a file states.csv.
The table may be described in both "_meta/tables.csv" and "_meta/tables/states.csv", in which case the information about states.csv in "tables.csv" would be ignored. The file under _meta/tables is considered "more specific" and of higher precedence.
(Suggestion for authors of tooling that reads these files: they may want to output debug information that describes where metadata was sourced from, and highlights situations where precedence rules needed to be applied.)
The columns of states.csv may be described in both "_meta/columns.csv" and "_meta/columns/states.csv", in which case the information about states.csv in "columns.csv" would be ignored. The file under _meta/columns is considered "more specific" and of higher precedence.
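Here's a rough Python sketch of how a reader might apply that precedence rule. The function name `resolve_meta_source` and its return shape are made up for illustration. It only checks which files exist in the archive, not whether the shared file actually mentions the table, so the second value is just a hint that precedence may have been applied (useful for the debug output suggested above).

```python
from typing import Iterable, Optional, Tuple


def resolve_meta_source(
    names: Iterable[str], kind: str, table: str
) -> Optional[Tuple[str, bool]]:
    """Pick the meta file that describes `table` for `kind` ("tables" or "columns").

    Returns (path, shared_also_present), or None if no meta file is found.
    Hypothetical helper: the names and return shape are assumptions.
    """
    members = set(names)
    specific = f"_meta/{kind}/{table}"  # e.g. _meta/columns/states.csv
    shared = f"_meta/{kind}.csv"        # e.g. _meta/columns.csv

    if specific in members:
        # The per-file description is "more specific" and wins.
        return specific, shared in members
    if shared in members:
        return shared, False
    return None


# Mixed layout: table described in the shared file, columns in a per-file one.
names = [
    "states.csv",
    "_meta/tables.csv",
    "_meta/columns.csv",
    "_meta/columns/states.csv",
]
columns_source = resolve_meta_source(names, "columns", "states.csv")
tables_source = resolve_meta_source(names, "tables", "states.csv")
```

With the archive listing above, the columns come from the per-file description and the table info falls back to the shared "tables.csv".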
You can also mix and match without loss of meaning.
For example, the table states.csv may be described in _meta/tables.csv while its columns may be described in _meta/columns/states.csv
(Oh and I haven't described schema mappings yet... I'll get to that another time)
Have you considered using a columns meta file per-table instead of putting all columns into a single csv?
So instead of: _meta/tables.csv _meta/columns.csv states.csv cities.csv
It would be something like: _meta/tables.csv _meta/states_columns.csv _meta/cities_columns.csv states.csv cities.csv
The advantage is that it would be easier to get the schema for a single table.
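To illustrate, here's a minimal Python sketch of reading the schema for one table from its own meta file, without scanning a combined columns file. The helper name `read_table_columns` and the `name,type` header in the sample meta file are assumptions for illustration only; nothing here defines the actual column layout of these files.

```python
import csv
import io
import zipfile
from typing import Dict, List


def read_table_columns(archive: zipfile.ZipFile, table: str) -> List[Dict[str, str]]:
    """Read the per-table columns file, e.g. _meta/states_columns.csv for states.csv."""
    stem = table[:-4] if table.endswith(".csv") else table
    with archive.open(f"_meta/{stem}_columns.csv") as raw:
        # Zip members are opened as bytes; wrap them so the csv module gets text.
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        return list(csv.DictReader(text))


# Build a tiny in-memory archive with made-up contents to try it out.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("states.csv", "code,name\nWA,Washington\n")
    zf.writestr("_meta/tables.csv", "name\nstates.csv\n")
    # Hypothetical meta layout: one row per column of states.csv.
    zf.writestr("_meta/states_columns.csv", "name,type\ncode,string\nname,string\n")

with zipfile.ZipFile(buf) as zf:
    columns = read_table_columns(zf, "states.csv")
```

Only the one small meta file is opened, so a tool interested in a single table never has to parse the column descriptions of the others.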