Although CSV technically has an "official" standard (RFC 4180), CSV, as a format out in the wild, can vary wildly. To make matters worse, it has no method of defining metadata to specify a file's particular "flavor" of CSV. I have several ideas to help overcome some or all of these shortcomings (e.g. schemas, flavors, an auto-detect or "taster" class, etc.).
[x] Read the RFC (LOL It's only 8 pages long!! I'm used to the iCalendar RFC... this one will take NO TIME!)
[x] Figure out which parts you plan to implement
[ ] Write some sort of "compliance" document so that users know what to expect and how this library deals with CSV's caveats and shortcomings.
[ ] When implementing these features, be sure to put some reference to this document in the docblock, using the @see tag.
[ ] Include a copy of the RFC (https://tools.ietf.org/html/rfc4180) in the source code/repository for reference purposes. Also not a bad idea to include excerpts from the RFC within docblocks of relevant classes/functions.
[ ] Write up an extension to the RFC (as best you can) to describe your implementation of CSV schemas. Take a look at what other people have done with CSV schemas. Check out the Digital Preservation Schema Specification. Also take a look at this GitHub search for "CSV Schema". There is actually quite a diverse list of CSV schema implementations you can either conform to or be inspired by.
[ ] I am going with the CSV on the Web Working Group's specifications for most likely everything that doesn't fall under the "standard CSV" format. I wrote all of the above before discovering it, so it probably renders everything in this ticket unnecessary... I'll keep it open, though, until I know more.
Data File Metaformats - description of several text formats that takes a very negative view of CSV as a format due to its inconsistency, poor design, and lack of a true official standard
The CSV File Format - a fantastic, very detailed write-up explaining all the ins and outs of CSV as a format. It covers every caveat. Very informative. Very useful.
CTX File Format - CSV on steroids. This is a semi-proprietary standard that looks a lot like CSV (although it uses pipes rather than commas), but with the metadata that is so desperately missed in CSV
CSV on the Web Working Group - A large percentage of the data published on the Web is tabular data, commonly published as comma separated values (CSV) files. The CSV on the Web Working Group aim to specify technologies that provide greater interoperability for data dependent applications on the Web when working with tabular datasets comprising single or multiple files using CSV, or similar, format.
Data Packages - A Data Package (or DataPackage) is a coherent collection of data and possibly other assets in a single ‘package’. It provides the basis for convenient delivery, installation and management of datasets.
JSON Table Schema - I believe this is the format used by CSVWWG above
JSON CSV Dialect Format - Very similar to Python's dialects. I will most likely change my "flavor" class's name to "dialect". Although it bums me out because I liked the idea of people having to write $flavor = $taster->lick($data) but oh well :( Actually, the more I think about it, the more I think it really doesn't matter what I call them. As long as they conform to the others, it should be fine.
Python's CSV PEP - A Python Enhancement Proposal specifying a standard mechanism for dealing with CSV data within Python. My library has always been heavily influenced by this document. In fact, the first version was basically a direct line-for-line port of the Python csv module (although it has since become a creature all its own).
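For reference, here is a rough sketch of the "taster" idea using the Python csv module that the links above point to. It is Python rather than this library's own code, and `taste` and `SAMPLE` are made-up names for illustration only. `csv.Sniffer().sniff()` guesses the dialect of a sample, which is roughly what `$taster->lick($data)` is meant to do.

```python
import csv
import io

# Made-up sample: semicolon-delimited, which RFC 4180 does not cover.
SAMPLE = "name;age;city\nAda;36;London\nLinus;28;Helsinki\n"


def taste(sample):
    """Guess the dialect ("flavor") of a CSV sample.

    Hypothetical helper: a thin wrapper around the standard library's
    csv.Sniffer, restricted to a few common candidate delimiters.
    """
    return csv.Sniffer().sniff(sample, delimiters=",;|\t")


dialect = taste(SAMPLE)

# The detected dialect can be handed straight to csv.reader.
rows = list(csv.reader(io.StringIO(SAMPLE), dialect))

print(repr(dialect.delimiter))
print(rows[1])
```

The sniffer is only a heuristic, so whatever the taster ends up looking like, an explicitly supplied dialect (or schema) should probably always win over a guessed one.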