Open nedclimaterisk opened 5 years ago
Thanks - I wasn't even aware of this. I think this is an interesting idea and would agree that the datatype annotations seem like a logical starting point.
PRs are always welcome if you have an idea on how to implement it.
How common is this format in the wild?
Probably not very common at all, but it's a recommended spec, CSV metadata management is a real PITA, and this seems to solve it. Getting it added to the most popular CSV manipulation library around would really help make it more common, I reckon.
There are also potential side benefits: for example, the #datatype declaration would allow immediate inference of datatypes without having to scan the first 100 lines of the CSV.
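To make the side benefit concrete, here is a minimal sketch of reading column types from a #datatype line instead of sampling rows. It uses only the standard library; the comma-separated layout of the annotation line, the type names, and the `CONVERTERS` table and `read_with_datatype_row` helper are all assumptions for illustration, not the layout the W3C recommendation prescribes and not anything pandas provides.

```python
import csv
import io

# Assumed layout: an annotation row whose first cell is "#datatype",
# followed by one type name per column, then an ordinary header row.
SAMPLE = """\
#datatype,string,integer,float
name,count,score
alpha,3,1.5
beta,7,2.25
"""

# Hypothetical mapping from annotation type names to Python converters.
CONVERTERS = {"string": str, "integer": int, "float": float}


def read_with_datatype_row(text):
    """Return (column -> declared type, list of converted records)."""
    declared = None
    data_rows = []
    for row in csv.reader(io.StringIO(text)):
        if row and row[0] == "#datatype":
            declared = row[1:]          # types come straight from the file
        elif row and not row[0].startswith("#"):
            data_rows.append(row)       # header row and data rows
    header, body = data_rows[0], data_rows[1:]
    if declared is None:
        # No annotation present: fall back to treating everything as text.
        declared = ["string"] * len(header)
    dtypes = dict(zip(header, declared))
    records = [
        {col: CONVERTERS[dtypes[col]](value) for col, value in zip(header, row)}
        for row in body
    ]
    return dtypes, records


dtypes, records = read_with_datatype_row(SAMPLE)
```

In a pandas setting, a mapping like `dtypes` could in principle be translated to pandas dtype names and passed to `pandas.read_csv` through its `dtype` argument, with the annotation line skipped via `skiprows`; that translation step is left out here.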
Related to #2485
The W3C Tabular Data Model recommendation describes embedded metadata that can include arbitrary text data, as well as column-specific metadata, such as column data types.
It would be very nice if Pandas could read metadata like this. There is a section with an example of CSV/TSV header metadata that might make a good starting point. The full recommendation seems somewhat vague, but perhaps that means that Pandas could help to define some more specific standards.
Perhaps a YAML header behind # characters, where some known variable names (e.g. datatype) are captured for use in reading the rest of the file, and any remaining unused YAML data is added to a df.metadata dictionary?
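A rough sketch of that idea, under heavy assumptions: the header is limited to flat `key: value` lines (a tiny subset of YAML, parsed by hand so no YAML library is needed), `datatype` is the only known key, and the leftover keys are returned as a plain dictionary. The `split_commented_header` helper and `KNOWN_KEYS` set are hypothetical names, and `df.metadata` itself is only the attribute proposed above, not an existing pandas attribute.

```python
import csv
import io

# Assumed layout: leading lines starting with "#" hold "key: value" pairs.
SAMPLE = """\
# publisher: Example Org
# updated: 2019-01-28
# datatype: string, integer
name,count
alpha,3
"""

# Keys the reader would consume itself; everything else is free-form metadata.
KNOWN_KEYS = {"datatype"}


def split_commented_header(text):
    """Return (known settings, leftover metadata, parsed CSV rows)."""
    known, metadata, data_lines = {}, {}, []
    in_header = True
    for line in text.splitlines():
        if in_header and line.startswith("#"):
            key, sep, value = line.lstrip("#").partition(":")
            if not sep:
                continue                      # comment line without a key
            key, value = key.strip(), value.strip()
            if key in KNOWN_KEYS:
                # Known keys are split into one entry per column.
                known[key] = [item.strip() for item in value.split(",")]
            else:
                metadata[key] = value
        else:
            in_header = False                 # header ends at first data line
            data_lines.append(line)
    rows = list(csv.reader(io.StringIO("\n".join(data_lines))))
    return known, metadata, rows


known, metadata, rows = split_commented_header(SAMPLE)
```

Here `known["datatype"]` would drive how the remaining rows are read, while `metadata` is what would end up in the proposed dictionary. Recent pandas versions expose an experimental `DataFrame.attrs` dictionary that could serve a similar purpose.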