Open turicas opened 7 years ago
We can use magic numbers to check file contents:
In [14]: open('projects/eleicoes-brasil/data/download/candidatura-2018.zip', mode='rb').read(4).startswith(b'PK')
Out[14]: True
In [15]: open('Downloads/prestacao-contas-2002.zip', mode='rb').read(4).startswith(b'Rar!')
Out[15]: True
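The check above can be wrapped in a small helper, which is useful because the second file is named `.zip` but is actually a RAR archive. This is a minimal sketch, not part of `rows`: the function name `detect_archive_type` and the `MAGIC_NUMBERS` table are hypothetical, and tar detection is delegated to the standard library's `tarfile.is_tarfile` (which also accepts compressed tarballs).

```python
import tarfile

# Leading bytes ("magic numbers") of each container format.
MAGIC_NUMBERS = {
    b"PK\x03\x04": "zip",
    b"PK\x05\x06": "zip",  # ZIP archive with no members
    b"Rar!\x1a\x07": "rar",  # shared prefix of RAR 4 and RAR 5 signatures
}


def detect_archive_type(filename):
    """Return 'zip', 'rar', 'tar' or None based on the file contents."""
    with open(filename, mode="rb") as fobj:
        header = fobj.read(8)
    for magic, kind in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return kind
    # tar has no signature at offset 0, so let the stdlib inspect the file.
    if tarfile.is_tarfile(filename):
        return "tar"
    return None
```

The file extension is never consulted, so a mislabeled file such as `prestacao-contas-2002.zip` would still be reported by its real format.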
For RAR archives we can use the `rarfile` library. It needs an external tool to extract files; on Debian systems: `apt install libarchive-tools` or `apt install unrar` (the latter is not free/libre).
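Since extraction depends on an external tool, it may help to check for one before trying to read a RAR file. A minimal sketch using only the standard library; `find_rar_tool` is a hypothetical helper, and it assumes the Debian packages above install the `bsdtar` and `unrar` commands respectively.

```python
import shutil

# Commands assumed to be provided by libarchive-tools and unrar, respectively.
CANDIDATE_TOOLS = ("bsdtar", "unrar")


def find_rar_tool(candidates=CANDIDATE_TOOLS):
    """Return the first candidate command found on PATH, or None."""
    for tool in candidates:
        if shutil.which(tool):
            return tool
    return None
```

If this returns `None`, a clear error message asking the user to install one of the packages is probably friendlier than a failure deep inside the extraction call.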
In #108 we're going to add support for compressed files, and then we also need to support archive files, such as `.tar`, `.zip` and `.rar`:
- Update the `rows.utils.compress` and `rows.utils.decompress` functions to also support archives.
- Update `rows.cli` to parse filenames, identify archives and then call `compress`/`decompress` (could use something like this: `filename.zip::path/to/file.csv`).
- Nested archives could use the same syntax: `filename.zip::path/to/file.rar::other/path/file.csv`.
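To make the `::` syntax concrete, here is a rough sketch of how such a filename could be parsed and resolved. It is not the `rows` implementation: `split_archive_path` and `read_nested` are hypothetical names, only ZIP containers are handled (RAR members would need `rarfile` and an external tool), and each level is loaded fully into memory, which would not suit large archives.

```python
import io
import zipfile


def split_archive_path(spec, separator="::"):
    """Split 'filename.zip::path/to/file.csv' into its components."""
    return spec.split(separator)


def read_nested(spec):
    """Return the bytes of the innermost member named by `spec`.

    The first component is a path on disk; every following component is a
    member name inside the (ZIP) archive produced by the previous step.
    """
    first, *members = split_archive_path(spec)
    with open(first, mode="rb") as fobj:
        data = fobj.read()
    for member in members:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            data = archive.read(member)
    return data
```

For example, `read_nested("filename.zip::path/to/inner.zip::other/path/file.csv")` would open `filename.zip`, read `path/to/inner.zip` from it, and then read `other/path/file.csv` from that inner archive. A real implementation would probably pick the reader for each level from the file contents rather than assuming ZIP.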