Open turicas opened 6 years ago
Fixed on d43be1dce2d4a64973fe4cae03a745fba7e6577e.
Reopening because of this error:
AttributeError: 'file' object has no attribute 'readable'
(I think it's related to Python2)
Maybe this thread helps.
Reverted the merged change from #276 since it causes problems on Python 2. Trying to fix the problem in a new branch: feature/csv-remove-null-bytes.
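For reference, the AttributeError above is consistent with passing a Python 2 built-in file object to io.TextIOWrapper, which expects a buffer exposing readable(). A minimal sketch of one way around it, assuming the file is opened by path: io.open returns an io buffered reader on both Python 2 and Python 3. The helper name open_text is hypothetical and not part of the library.

```python
import io


def open_text(path, encoding):
    """Open `path` as text via the io module (hypothetical helper).

    io.open(..., "rb") returns an io.BufferedReader on both Python 2 and
    Python 3, so the object has the readable() method that
    io.TextIOWrapper requires.
    """
    binary = io.open(path, mode="rb")
    return io.TextIOWrapper(binary, encoding=encoding)
```

This only covers files opened by path; a file object handed in by the caller on Python 2 would still need separate handling.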
The file is no longer accessible, but it seems you're dealing with a UTF-16 encoded file. Try using:
b = open("file.csv", "rb").read().decode("utf-16")
@mawkee it was not a UTF-16-encoded file (this one was encoded in ISO-8859-15 but had \x00 bytes inside the data) - it didn't even have the BOM.
Ours didn't seem to have it either, but if you open with "rb" and then decode, it magically works as UTF-16.
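The two cases discussed here (real UTF-16 data versus a single-byte encoding with stray NUL bytes) can be told apart in the simplest situation by checking for a BOM. This is only a sketch under that assumption: as noted above, a file may be UTF-16 without a BOM, and this check will not catch it. The helper name decode_csv_bytes and the default fallback encoding are hypothetical.

```python
import codecs


def decode_csv_bytes(raw, fallback_encoding="iso-8859-15"):
    """Decode raw CSV bytes (hypothetical helper, not part of the library).

    If the data starts with a UTF-16 BOM, decode it as UTF-16.
    Otherwise, assume a single-byte encoding and drop stray NUL characters.
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")
    return raw.decode(fallback_encoding).replace("\x00", "")
```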
@turicas got it; I tried opening the data using ftfy and it worked all right for my case.
Some CSV files come with NUL chars (\x00) inside, and the Python csv module doesn't know how to deal with them. So I think it's a great idea to have automatic NUL removal in the CSV plugin. An io.TextIOWrapper subclass will do the job, like this one:

    import io

    class NotNullTextWrapper(io.TextIOWrapper):
        # Strip NUL characters from everything read through the wrapper.

        def read(self, *args, **kwargs):
            data = super().read(*args, **kwargs)
            return data.replace('\x00', '')

        def readline(self, *args, **kwargs):
            data = super().readline(*args, **kwargs)
            return data.replace('\x00', '')

Sample file with this problem: http://arquivos.portaldatransparencia.gov.br/downloads.asp?a=2011&m=01&consulta=GastosDiretos
Exception raised:
_csv.Error: line contains NULL byte
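As an illustration of how the wrapper proposed above could be plugged into csv.reader, here is a minimal self-contained sketch for Python 3. The sample bytes are made up for the example (they are not taken from the linked file), and the ISO-8859-15 encoding is an assumption based on the discussion in this thread.

```python
import csv
import io


class NotNullTextWrapper(io.TextIOWrapper):
    """Text wrapper that drops NUL characters from everything it reads."""

    def read(self, *args, **kwargs):
        return super().read(*args, **kwargs).replace("\x00", "")

    def readline(self, *args, **kwargs):
        return super().readline(*args, **kwargs).replace("\x00", "")


# Made-up sample data: ISO-8859-15 text with a stray NUL byte in one cell.
raw = "name,city\nJo\x00sé,São Paulo\n".encode("iso-8859-15")

# newline="" is what the csv module documentation recommends for input files.
wrapper = NotNullTextWrapper(io.BytesIO(raw), encoding="iso-8859-15", newline="")

# csv.reader iterates over the wrapper line by line, which goes through
# the overridden readline() and therefore never sees the NUL character.
rows = list(csv.reader(wrapper))
```

The same wrapper can be placed around a real file opened in binary mode instead of the io.BytesIO object used here.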
Thanks for posting the code. Was also useful outside of this project.