Closed camfindlay closed 6 years ago
My thinking about a csv is the same reason I avoid putting the information in a markdown table: it is easy for people entering data to stuff it up for other entries. In the case of a csv, if people start using apostrophes or commas in the description field it potentially breaks the csv.
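To make the concern concrete, here is a minimal sketch in Python of why hand-typed commas and quotes are risky while a generated file is not: the standard `csv` module quotes such fields itself, which people editing by hand tend to forget. The row contents and the `rows_to_csv` helper are made up for illustration and are not from the repo.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise rows to CSV text, letting the csv module handle quoting."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical entry whose description contains a comma, double quotes
# and an apostrophe, i.e. the characters that trip up hand-edited CSV.
rows = [
    ["Title", "Description"],
    ["Example app", 'Uses bus, train and "ferry" data; it\'s free'],
]

text = rows_to_csv(rows)

# Reading the generated text back gives the original fields unchanged.
parsed = list(csv.reader(io.StringIO(text)))
```

A naive `",".join(...)` over the same rows would split the description into extra columns, which is the breakage described above.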
OTOH, with standardised headings at the start of each line of an entry, it is easy enough to write something that converts the current format to a csv. I could do it in 3 or 4 R commands (and add it as a very self-referential example). So my personal preference is to make human data entry and browsability easy, and provide examples for converting to other forms if needed (and those derived forms could go other places as appropriate).
In part, while I think this would be useful, and I know others who do, we kind of need to gauge the interest and uptake. So provided it can easily be converted to another form later (such as csv), I would like to make the proof of concept as easy as possible for people.
A converter was going to be my other suggestion here... so we can agree:
1) The primary way someone adds an example is via a block in the md file (that's the "human" readable information).
2) We have a converter that runs to produce the CSV (I'd be keen to still commit this csv to the repo so it can be used as a machine readable copy).
If we could do the conversion script in python I could look at automating the runs of it on a regular basis via a server.
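As a rough idea of what such a python conversion script could look like, here is a minimal sketch. It assumes a layout the thread has not settled on: each entry is a run of `Heading: value` lines, and entries are separated by a blank line. The `Description` and `URL` headings are mentioned elsewhere in this thread; `Title`, the sample entries, and the function names are hypothetical.

```python
import csv
import io

# Assumed headings; the real list would come from the agreed md structure.
FIELDS = ["Title", "Description", "URL"]

# Hypothetical sample in the assumed "Heading: value" layout.
SAMPLE = """\
Title: Bus tracker
Description: Shows buses, trains and ferries; it's free
URL: https://example.org/bus-tracker

Title: Rainfall map
URL: https://example.org/rainfall
"""


def parse_entries(markdown_text):
    """Collect one dict per blank-line-separated entry."""
    entries = []
    current = {}
    for line in markdown_text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the entry being built, if any.
            if current:
                entries.append(current)
                current = {}
            continue
        # Split on the first colon only, so URLs keep their "https://".
        heading, sep, value = line.partition(":")
        if sep and heading.strip() in FIELDS:
            current[heading.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def entries_to_csv(entries):
    """Write entries as CSV text; missing headings become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=FIELDS, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(entries)
    return buf.getvalue()


if __name__ == "__main__":
    print(entries_to_csv(parse_entries(SAMPLE)))
```

Lines that do not start with a known heading are ignored in this sketch, so free-text notes in the md file would not end up in the CSV. A scheduled job on a server could run the two functions and commit the output.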
I am entirely comfortable with the idea of using python for a conversion script; I'd just need to look things up if I was writing it (I tend to use R day to day, so need to look up a lot more when using python). As it makes your workflow easier, I support the idea.
+1, shall we make the call to go with a python conversion script once we've firmed up the structure for the human-friendly list?
Sounds good. I could jot a quick R script (as my language of choice) that does a conversion as a reference for a later python one.
I've added an example script in R, so I will close this.
Can I get you to run the script locally and commit the resulting CSV (assuming the items you have listed are real ones)?
That way I can look at both putting the source repo under data-govt-nz GH organisation AND listing the Community contribution dataset as a dataset on data.govt.nz ;)
Done (they were real items)
It might be nice to hold this information in CSV format in some way. GitHub nicely presents CSV for end users to look over (like a register of sorts).
I can also see this being useful longer term if we eventually look to surface examples of data use on data.govt.nz by drawing on this register and matching up datasets on data.govt.nz with references in this CSV register.
We could then pull the Description and URL of the example onto data.govt.nz, for example as a "real world use of this data" section.
The beauty of doing this as CSV is that we could add it as a dataset on data.govt.nz, which auto-creates a consumable API that we could use for the integration (or, for that matter, anyone could use).
I'll mock up an example and submit as a PR 👍
Thoughts on this? Happy to dialogue on it a bit.