yakovsh / rfc4180-bis

Repository for work regarding the new version of RFC 4180

Add optional "profile" argument #28

Open stain opened 3 years ago

stain commented 3 years ago

(Also posted in art@ietf thread)

Have you got any suggestions for profiles for CSV files?

This RFC gives a narrow, precise definition of one way to write CSV files, which is very useful; but we know there are many lower-case "csv" files in the wild, which may for instance use different separators. See the list of dialects and formatting parameters in Python's csv module: https://docs.python.org/3/library/csv.html#csv-fmt-params

A web server that determines the MIME type from the file extension has no way of knowing whether the file conforms to this RFC, but may still want to serve it as text/csv rather than application/octet-stream, since more permissive CSV parsers are likely to understand it.

On the other hand, a server generating a conforming CSV may want to indicate this, perhaps as:

Content-Type: text/csv; profile="https://www.rfc-editor.org/rfc/rfcTODO#ABNF"

(using the new RFC number of course)
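
To make the server side concrete, here is a minimal sketch of how a generator might build that header value. The helper name `csv_content_type` is hypothetical, and `rfcTODO` is the same placeholder as above, not a real RFC number.

```python
# Placeholder profile URI from the example above; "rfcTODO" is not a real RFC number.
PROFILE_ABNF = "https://www.rfc-editor.org/rfc/rfcTODO#ABNF"


def csv_content_type(*profiles):
    """Build a Content-Type value for text/csv with an optional profile parameter.

    Hypothetical helper: profile URIs are joined with single spaces inside
    one quoted string.
    """
    for uri in profiles:
        # A URI containing whitespace or a double quote would break the
        # space-separated, quoted parameter value.
        if not uri or '"' in uri or any(ch.isspace() for ch in uri):
            raise ValueError("unsuitable profile URI: {!r}".format(uri))
    if not profiles:
        return "text/csv"
    return 'text/csv; profile="{}"'.format(" ".join(profiles))


print(csv_content_type(PROFILE_ABNF))
```

A server that has not checked conformance would simply call `csv_content_type()` with no arguments and send plain `text/csv`.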

In 2.1.3 you still add one variation, which effectively allows two rfc4180-bis dialects:

  1. The first record in the file MAY be an optional header with the same format as normal records.

But previously, in RFC 4180, the presence of a header could be recorded with the optional header parameter. What is the argument for removing it?

If desired, this change should be justified within the text.

I would argue that today, following RFC 6906, this precision would be better achieved with an optional profile parameter.

Known to have header:

Content-Type: text/csv; profile="https://www.rfc-editor.org/rfc/rfcTODO#ABNF https://www.rfc-editor.org/rfc/rfcTODO#header"

Known to NOT have header:

Content-Type: text/csv; profile="https://www.rfc-editor.org/rfc/rfcTODO#ABNF https://www.rfc-editor.org/rfc/rfcTODO#noheader"

Note: the profile value is whitespace-separated, so here we indicate both that the file has a header (or no header) and, in addition, the intention to actually follow the ABNF of this RFC. This separation of concerns allows a server to say that the file has a header even if the file breaks the ABNF (or has not been checked against it).
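
On the consuming side, a client could split the parameter and test each concern independently. The following is only a sketch under the assumptions of this proposal: the `#ABNF`, `#header` and `#noheader` fragments are the placeholder identifiers from the examples above, and the function names are hypothetical. It uses the standard library's `email.message.Message` to parse the header parameters.

```python
from email.message import Message

# Placeholder identifiers from the examples in this thread; "rfcTODO" is not a real RFC number.
BASE = "https://www.rfc-editor.org/rfc/rfcTODO"
PROFILE_ABNF = BASE + "#ABNF"
PROFILE_HEADER = BASE + "#header"
PROFILE_NOHEADER = BASE + "#noheader"


def parse_csv_profiles(content_type):
    """Return (media_type, set_of_profile_uris) for a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_param() strips the surrounding quotes; a missing parameter yields "".
    raw = msg.get_param("profile", failobj="")
    return msg.get_content_type(), set(raw.split())


def header_status(profiles):
    """Return True or False when a profile states it, None when it is unknown."""
    if PROFILE_HEADER in profiles:
        return True
    if PROFILE_NOHEADER in profiles:
        return False
    return None


media_type, profiles = parse_csv_profiles(
    'text/csv; profile="{} {}"'.format(PROFILE_ABNF, PROFILE_HEADER)
)
print(media_type, PROFILE_ABNF in profiles, header_status(profiles))
```

Because the two checks are independent, a response carrying only the `#header` profile still tells the client to skip the first record, without claiming ABNF conformance. How to treat a value that lists both `#header` and `#noheader` is left open here; this sketch simply lets `#header` win.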

Unlocking the "profile" parameter would also allow third-parties to use it to indicate their own dialects, for instance https://docs.python.org/3/library/csv.html#csv.unix_dialect would be a perfectly valid profile.
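
As a rough illustration of a third-party profile, a client might map profile URIs onto dialects that Python's csv module already registers. The registry below and the function name `read_with_profile` are hypothetical; using the documentation URLs as profile identifiers is only the suggestion made above, not an established convention.

```python
import csv
import io

# Hypothetical registry: profile URI -> name of a dialect registered with the csv module.
PROFILE_DIALECTS = {
    "https://docs.python.org/3/library/csv.html#csv.unix_dialect": "unix",
    "https://docs.python.org/3/library/csv.html#csv.excel_tab": "excel-tab",
}


def read_with_profile(text, profiles, default="excel"):
    """Parse CSV text with the first dialect matched by a profile URI.

    Falls back to the csv module's "excel" dialect when no profile is recognised.
    """
    name = default
    for uri in profiles:
        if uri in PROFILE_DIALECTS:
            name = PROFILE_DIALECTS[uri]
            break
    # newline="" leaves line endings untouched, as the csv module recommends.
    return list(csv.reader(io.StringIO(text, newline=""), dialect=name))


rows = read_with_profile(
    '"a","b"\n"1","2"\n',
    ["https://docs.python.org/3/library/csv.html#csv.unix_dialect"],
)
print(rows)
```

Unknown profile URIs are ignored rather than rejected, which matches the idea that the parameter is an optional hint.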

nightwatchcyber commented 3 years ago

"header" was removed because it is not in use. I am not aware of any implementation that actually used it.

The profile idea is interesting - perhaps this can be used along with the CSVW syntax: https://www.w3.org/2013/csvw/wiki/Main_Page