secretGeek / csvz

The hot new standard in open databases
Creative Commons Zero v1.0 Universal

`meta-per-file` -- allow individual meta files for each file? #10

Closed MarkPflug closed 4 years ago

MarkPflug commented 4 years ago

Have you considered using a columns meta file per-table instead of putting all columns into a single csv?

So instead of: _meta/tables.csv _meta/columns.csv states.csv cities.csv

It would be something like: _meta/tables.csv _meta/states_columns.csv _meta/cities_columns.csv states.csv cities.csv

The advantage is that it would be easier to get the schema for a single table.

secretGeek commented 4 years ago

It's a good idea. It would also mean that when a tool is writing the files, you wouldn't have contention on those schema files.

secretGeek commented 4 years ago

How about the files go into a subfolder? E.g.

_meta/columns.csv

Can contain details of all columns in all tables.

or if using the "file per file" technique... and you have a file "states.csv"

_meta/columns/states.csv

And similarly:

_meta/tables/states.csv

Would describe the states.csv file, with the same info that would normally be in the "tables.csv" file.

secretGeek commented 4 years ago

Interestingly: You can combine both methods!

For example consider a csvz containing a file states.csv

The table may be described in both "_meta/tables.csv" and "_meta/tables/states.csv" -- in which case the information about states.csv in "tables.csv" would be ignored. The file under _meta/tables is considered "more specific" and of higher precedence.

(Suggestion for authors of tooling that reads these files: they may want to output debug information that describes where metadata was sourced from, and highlights situations where precedence rules needed to be applied.)

Columns of states.csv may be described in both "_meta/columns.csv" and "_meta/columns/states.csv" -- in which case the information about states.csv in "columns.csv" would be ignored. The file under _meta/columns is considered "more specific" and of higher precedence.
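
A rough sketch of how a reader might apply that precedence rule for columns. Nothing here is part of the csvz spec: the function name `load_columns`, the in-memory `files` mapping (path to CSV text), and the `table` column used to filter the shared file are all assumptions made for illustration.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple


def load_columns(
    files: Dict[str, str], data_file: str
) -> Tuple[List[Dict[str, str]], Optional[str]]:
    """Return (column rows, path they came from) for one data file.

    `files` maps archive paths to CSV text (hypothetical stand-in for
    the real archive). The per-file meta wins over the shared one.
    """
    specific = f"_meta/columns/{data_file}"
    shared = "_meta/columns.csv"

    if specific in files:
        # More specific file present: anything in columns.csv is ignored.
        rows = list(csv.DictReader(io.StringIO(files[specific])))
        return rows, specific

    if shared in files:
        # Assumption: the shared file names its table in a `table` column.
        rows = [
            row
            for row in csv.DictReader(io.StringIO(files[shared]))
            if row.get("table") == data_file
        ]
        if rows:
            return rows, shared

    return [], None
```

Returning the source path alongside the rows is one way to support the debug output suggested above: a tool can report where each table's metadata came from.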

You can also mix and match without loss of meaning.

For example, the table states.csv may be described in _meta/tables.csv while its columns may be described in _meta/columns/states.csv
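
To illustrate the mix-and-match case, here is a small sketch that picks the table source and the column source independently, given only the list of paths in an archive. The name `metadata_sources` is hypothetical, and the function only checks which files exist; it does not confirm that a shared file actually has rows for the table.

```python
from typing import Dict, Iterable, Optional


def metadata_sources(
    members: Iterable[str], data_file: str
) -> Dict[str, Optional[str]]:
    """Map each meta kind to the path that would describe `data_file`."""
    names = set(members)
    chosen: Dict[str, Optional[str]] = {}
    for kind in ("tables", "columns"):
        specific = f"_meta/{kind}/{data_file}"
        shared = f"_meta/{kind}.csv"
        if specific in names:
            chosen[kind] = specific  # per-file meta takes precedence
        elif shared in names:
            chosen[kind] = shared    # fall back to the shared file
        else:
            chosen[kind] = None      # no metadata of this kind at all
    return chosen


sources = metadata_sources(
    ["_meta/tables.csv", "_meta/columns/states.csv", "states.csv"],
    "states.csv",
)
print(sources)
```

Each kind is resolved on its own, so the table description can come from the shared file while the columns come from the per-file one.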

(Oh and I haven't described schema mappings yet... I'll get to that another time)