CSV files are bad databases

vmbrasseur / Perl_Companies

A list of companies which use Perl. Initially generated from postings to jobs.perl.org.


CSV files are bad databases #18

Closed denny closed 11 years ago

denny commented 11 years ago

Markdown files are even worse databases. Two files that have to be separately maintained in parallel is a really, really, really bad database.

This data really needs to be fed into a relational database (maybe SQLite?), then the CSV and Markdown files thrown away. If/when you want one of them, you should generate it from the database using a utility script.
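A minimal sketch of that approach, using Python's standard-library `sqlite3` module. The table and column names are assumptions that mirror the fields of the sample entry quoted later in this thread. The two export functions stand in for the hypothetical utility script, so the CSV and Markdown files become generated outputs rather than hand-maintained copies.

```python
import csv
import io
import sqlite3

# Hypothetical schema mirroring the columns of the current CSV file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS companies (
    name                TEXT PRIMARY KEY,
    location            TEXT,
    most_recent_posting TEXT,
    hiring_status       TEXT
)
"""

COLUMNS = ["name", "location", "most_recent_posting", "hiring_status"]
QUERY = f"SELECT {', '.join(COLUMNS)} FROM companies ORDER BY name"


def export_csv(conn: sqlite3.Connection) -> str:
    """Regenerate the CSV view from the database."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(conn.execute(QUERY))  # NULLs are written as empty cells
    return buf.getvalue()


def export_markdown(conn: sqlite3.Connection) -> str:
    """Regenerate the Markdown table view from the same rows."""
    lines = ["| Name | Location | Most recent posting | Hiring status |",
             "| --- | --- | --- | --- |"]
    for row in conn.execute(QUERY):
        # Empty cells for NULLs; escape pipes so they don't break the table.
        cells = [str(v).replace("|", "\\|") if v is not None else "" for v in row]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT INTO companies VALUES (?, ?, ?, ?)",
        ("123Doc Medical Courses", "United Kingdom, London, London", "2006", "Dormant"),
    )
    print(export_csv(conn), end="")
    print(export_markdown(conn), end="")
```

Because both files come from one query, they can no longer drift apart; editing happens only in the database.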

evdb commented 11 years ago

Agreed. I'm seeing an encoding issue: after I made my edit to the CSV, saving it in my editor (TextMate) changed many other lines as well.

I'd suggest using JSON for the base dataset: it is easy to work with and extend, and it has a fixed encoding. Other representations (like CSV, Markdown, or HTML pages) can then be generated from it automatically. An example entry might look like this:

{
  "name": "123Doc Medical Courses",
  "location": "United Kingdom, London, London",
  "mostRecentPosting": "2006",
  "hiringStatus": "Dormant"
}

In future, if you want to add another field (for example, website), it need only be added to the records that have it, which makes such changes much easier and leads to cleaner diffs.
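A rough Python sketch of the JSON-first workflow, assuming the dataset is a single JSON array of records like the one above. `canonical_json` is a hypothetical helper that always serialises the data in one fixed layout (sorted keys, two-space indent, non-ASCII kept literal), so regenerating the file gives identical text and diffs show only real changes; writing it to disk with `encoding="utf-8"` is left to the caller. `json_to_csv` shows how records that omit optional fields still produce a rectangular CSV.

```python
import csv
import io
import json
from typing import Any


def canonical_json(records: list[dict[str, Any]]) -> str:
    """Serialise the dataset in one fixed layout (sorted keys, two-space
    indent, non-ASCII characters left as-is) so repeated saves are stable."""
    return json.dumps(records, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def json_to_csv(json_text: str) -> str:
    """Generate a CSV view; records missing an optional field get an empty cell."""
    records = json.loads(json_text)
    # Columns: every key seen in any record, in first-seen order.
    columns: list[str] = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

For example, if one record gains a `website` key and another does not, the CSV simply gains a `website` column with an empty cell for the second record.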

Feel free to change the camel case to something more perlish :)

bigpresh commented 11 years ago

SQLite was my first impression too, but I think I'd agree that a JSON file is more flexible, portable, and easy to work with. The JSON file could easily be downloaded direct from GitHub periodically by services which want to use it, and is flexible enough that adding extra info is pretty trivial (although a spec of the field names and their meanings should probably be agreed first).

I imagine additional fields that will be useful will be the company's website, perhaps an email address for whoever is responsible for Perl-related hiring (should people want to list themselves for potential candidates to apply), and maybe a flag to indicate whether they use Perl for public-facing front-end stuff, back-end stuff only, or both. A field indicating how heavily a company uses Perl might be useful too (e.g. the difference between "we use it for a couple of internal tools" and "our business is built on it").

Also, the "hiringStatus" field should, I think, be a boolean "hiring" field for ease of working with.
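One way an agreed field spec could be enforced is a small validator run over every record before it is merged. The field names below (`website`, `hiringContact`, `perlUsage`, and the boolean `hiring` replacing `hiringStatus`) are hypothetical, assembled from the suggestions above; nothing here has been agreed.

```python
from typing import Any

# Hypothetical field spec assembled from the fields proposed in this thread;
# none of these names have been agreed.
REQUIRED: dict[str, type] = {"name": str}
OPTIONAL: dict[str, type] = {
    "location": str,
    "website": str,
    "hiringContact": str,      # email for whoever handles Perl hiring
    "mostRecentPosting": str,
    "hiring": bool,            # boolean instead of a free-text "hiringStatus"
    "perlUsage": str,          # one of PERL_USAGE_VALUES
}
PERL_USAGE_VALUES = {"front-end", "back-end", "both"}


def validate(record: dict[str, Any]) -> list[str]:
    """Return a list of problems with one record (empty if it conforms)."""
    errors = []
    for field, typ in REQUIRED.items():
        if field not in record:
            errors.append(f"missing required field {field!r}")
        elif not isinstance(record[field], typ):
            errors.append(f"{field!r} must be {typ.__name__}")
    for field, value in record.items():
        if field in REQUIRED:
            continue
        if field not in OPTIONAL:
            errors.append(f"unknown field {field!r}")
        elif not isinstance(value, OPTIONAL[field]):
            errors.append(f"{field!r} must be {OPTIONAL[field].__name__}")
    usage = record.get("perlUsage")
    if isinstance(usage, str) and usage not in PERL_USAGE_VALUES:
        errors.append(f"'perlUsage' must be one of {sorted(PERL_USAGE_VALUES)}")
    return errors
```

With a boolean `hiring` field, a value such as `"Dormant"` is rejected outright instead of being one more spelling to handle downstream.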

denny commented 11 years ago

Location needs to be broken down too, while we're picking on the selection of fields; 'street address', 'town/city', 'state/region', 'zip/post code', 'country'. People can fill in whatever level of detail they want to share (or whatever they find on companies' websites).
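A sketch of that breakdown as a Python dataclass in which every level is optional. `from_legacy` is a hypothetical migration helper; it assumes the existing single-string locations are ordered "Country, Region, City", as the sample entry "United Kingdom, London, London" suggests, and it drops anything past the third part.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Location:
    """Each level is optional, so entries can be as detailed as people like."""
    street_address: Optional[str] = None
    city: Optional[str] = None
    region: Optional[str] = None      # state/region
    postcode: Optional[str] = None    # zip/post code
    country: Optional[str] = None

    def display(self) -> str:
        """Join whichever parts are filled in, most specific first."""
        parts = (getattr(self, f.name) for f in fields(self))
        return ", ".join(p for p in parts if p)

    @classmethod
    def from_legacy(cls, text: str) -> "Location":
        """Split an old single-string location.

        ASSUMPTION: existing strings are ordered "Country, Region, City";
        anything after the third comma-separated part is discarded.
        """
        parts = [p.strip() for p in text.split(",")]
        country, region, city = (parts + ["", "", ""])[:3]
        return cls(city=city or None, region=region or None, country=country or None)
```

For instance, `Location.from_legacy("United Kingdom, London, London").display()` yields `"London, London, United Kingdom"`, while an entry with only a country displays just that country.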

evdb commented 11 years ago

Regarding location it is also important to distinguish between the business' HQ location, and where employees can be based. I work for a fully remote organisation that has its HQ at a PO box in London, but the employees are all over the UK.
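The HQ-versus-staff distinction can be captured with two separate fields. This Python sketch assumes hypothetical `headquarters` and `employeeLocations` keys and treats a location with no city as "anywhere in that country". A search for staff based in Manchester then finds a fully remote organisation whose HQ is in London, but not a London-only office.

```python
from typing import Any

# Hypothetical record shape: "headquarters" is where the business is registered;
# "employeeLocations" lists where staff can actually be based.
example = {
    "name": "Example Remote Ltd",  # hypothetical company
    "headquarters": {"city": "London", "country": "United Kingdom"},
    "employeeLocations": [{"country": "United Kingdom"}],
}


def can_be_based_in(record: dict[str, Any], city: str, country: str) -> bool:
    """True if staff can be based in the given city and country.

    Falls back to the HQ when no employee locations are listed.
    """
    places = record.get("employeeLocations") or [record.get("headquarters", {})]
    for place in places:
        if place.get("country") != country:
            continue
        # A place with no city means "anywhere in that country".
        if place.get("city") in (None, city):
            return True
    return False
```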

But this is going off-topic from the original point of this issue - to have a more robust storage format.

vmbrasseur commented 11 years ago

There's never been any plan for it to be the be-all and end-all, either as a UI or as a way of storing the data. It is a way to make the information available while we work together to develop something more sustainable and scalable.

In the coming weeks I'll be putting up a wiki and registering a channel on Freenode for us to discuss things. Please follow @perlcompanies on Twitter for announcements.

vmbrasseur commented 11 years ago

In the meantime, this issue appears to be more of a discussion than a discrete actionable item, so I'm closing it.

We'll pick up the general spirit of the issue (moving to an RDBMS or other data store) elsewhere once the next generation of Perl Companies is determined.