okfn-brasil / serenata-toolbox

📦 pip module containing code shared across Serenata de Amor's projects | ** This repository does not receive frequent updates **
MIT License
155 stars 70 forks

Rename .xz to .csv.xz #176

Open cuducos opened 6 years ago

cuducos commented 6 years ago

As suggested by @turicas here:

What is the problem? When we decompress a file with the command xz -d file.xz, the decompressed file is named file, with no extension. It'd be good to name the files xxx.csv.xz instead of xxx.xz, so that software that depends on the file extension works properly.

How can this be addressed? 1) Replace lines like this (see serenata_toolbox/chamber_of_deputies/dataset.py):

    .replace('.csv', '.xz') \

with:

    .replace('.csv', '.csv.xz') \

This also happens on serenata_toolbox/federal_senate/dataset.py and maybe in other places.

2) Will also need to change all places where these files are read.
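A minimal sketch of step 1, assuming the compressed name is derived from the CSV path. `compressed_name` and `compress_csv` are hypothetical helpers for illustration, not functions from serenata-toolbox. Note that `str.replace('.csv', '.csv.xz')` rewrites every occurrence of `.csv` in the string, so this sketch appends `.xz` to the file name instead:

```python
import lzma
import shutil
from pathlib import Path


def compressed_name(csv_path):
    """Return the .csv.xz path for a .csv path (hypothetical helper)."""
    path = Path(csv_path)
    if path.suffix != '.csv':
        raise ValueError(f'expected a .csv file, got {path.name}')
    # Append .xz rather than replacing .csv, so the original extension survives
    return path.with_name(path.name + '.xz')


def compress_csv(csv_path):
    """Compress csv_path next to itself and return the new path."""
    target = compressed_name(csv_path)
    with open(csv_path, 'rb') as source, lzma.open(target, 'wb') as output:
        shutil.copyfileobj(source, output)
    return target
```

With this naming, running xz -d on reimbursements.csv.xz leaves a file called reimbursements.csv.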

Who could help with this issue? @cuducos?

cuducos commented 6 years ago

Just a quick note to anyone interested in this issue: this change will potentially break a lot of stuff, for instance every notebook on serenata-de-amor loading .xz files, every script on serenata-toolbox, jarbas, rosie and whistleblower opening .xz files…

I do believe that adding the proper extension is helpful and we should do it one of these days. But this must be coordinated across multiple repos. Any ideas @anaschwendler and @irio?
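One possible way to soften the transition, sketched here purely as an assumption and not something proposed in this thread, is for readers to accept both names for a while: prefer the new .csv.xz file and fall back to the legacy .xz one. `find_dataset` is a hypothetical helper, not part of serenata-toolbox:

```python
from pathlib import Path


def find_dataset(directory, name):
    """Locate a dataset, preferring name.csv.xz over the legacy name.xz."""
    base = Path(directory)
    candidates = (base / f'{name}.csv.xz', base / f'{name}.xz')
    for candidate in candidates:
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f'no .csv.xz or .xz file for {name} in {base}')
```

Each repo could then switch to the new names at its own pace, and the fallback could be dropped once no legacy files remain.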

willianpaixao commented 5 years ago

I can implement this feature together with #199. But I'm almost sure that some refactoring will be needed in Rosie and Jarbas, correct?

cuducos commented 5 years ago

But I'm almost sure that some refactoring will be needed in Rosie and Jarbas, correct?

I think minor adjustments in Rosie and just updating the docs in Jarbas are enough : )