simonw / git-history

Tools for analyzing Git history using SQLite
Apache License 2.0

Re-encode for my csv problem #52

Open scoates opened 2 years ago

scoates commented 2 years ago

Discussed in https://github.com/simonw/git-history/discussions/50

I think I had two problems here, but they might be related:

  1. commit.tree.blobs was [] here, so I changed the code to use the tree['filename'] notation instead.
  2. My data was in Latin-1/ISO-8859-1 (I didn't know this at first), so I added a --re-encode option.
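
To illustrate the second problem, here is a minimal sketch (not the code from this PR) of decoding Latin-1 bytes before parsing them as CSV. The helper name `decode_csv_bytes` and the sample data are hypothetical; the bytes stand in for whatever is read from a file in the repository.

```python
import csv
import io


def decode_csv_bytes(raw, encoding="utf-8"):
    """Decode raw file bytes with the given encoding, then parse them as CSV."""
    text = raw.decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical sample: a small CSV stored as ISO-8859-1 (Latin-1) bytes
raw = "nom,région\nHôpital,Québec\n".encode("iso-8859-1")

# Decoding with the correct encoding recovers the accented characters
rows = decode_csv_bytes(raw, encoding="iso-8859-1")

# Decoding the same bytes as UTF-8 fails, which is the kind of error
# that mis-labelled Latin-1 data tends to produce
try:
    decode_csv_bytes(raw, encoding="utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
```

Once decoded, the text is an ordinary Python string and can be written back out as UTF-8 if needed.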

Tests pass.

scoates commented 2 years ago

FWIW:

❯ file -I scraped-data/emergency-rooms/quebec/Releve_horaire_urgences_7jours.csv

scraped-data/emergency-rooms/quebec/Releve_horaire_urgences_7jours.csv: application/csv; charset=iso-8859-1

simonw commented 2 years ago

Sorry for not looking at this sooner!

I'm not keen on --re-encode as the option here. I prefer --encoding X purely for consistency with my other tool sqlite-utils: https://sqlite-utils.datasette.io/en/stable/cli-reference.html#insert
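
A rough sketch of what an `--encoding X` style option could look like, using `argparse` from the standard library as a stand-in. This is an assumption for illustration only: the program name `example-tool`, the `read_text` helper, and the default value are hypothetical, and this is not the actual CLI code of git-history or sqlite-utils.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the --encoding option name comes from the discussion
    parser = argparse.ArgumentParser(prog="example-tool")
    parser.add_argument(
        "--encoding",
        default="utf-8",  # assumed default for this sketch
        help="Character encoding of the input file, e.g. iso-8859-1",
    )
    return parser


def read_text(raw, encoding):
    """Decode raw bytes using the encoding chosen on the command line."""
    return raw.decode(encoding)


# Simulate: example-tool --encoding iso-8859-1
args = build_parser().parse_args(["--encoding", "iso-8859-1"])
text = read_text("Montréal".encode("iso-8859-1"), args.encoding)
```

Naming the source encoding, rather than describing an action such as re-encoding, keeps the option usable anywhere bytes need to be decoded.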

scoates commented 2 years ago

I honestly forget how this works. If you're happy with the other method, so am I. (-:

lassebenni commented 2 years ago

Can we merge this? I also encountered this issue and didn't see this PR, so I ended up with a similar fix, but this would've saved me some time!