Open brianckeegan opened 6 years ago
@brianckeegan I confirmed your observation, and I found the source of your problems.
If you look at the _get_recs
method the first byte is tested to see if the record is deleted. According to the specs, that first byte should be a space. If not, it is assumed that the record was deleted.
I modified my local version of the simpledbf file to print('No rec')
if that flag is triggered, and when I try to convert that particular DBF file to a DataFrame, I get a bunch of "No rec"s printed.
My suggestion would be to comment out those two lines of your local simpledbf install; however, you might want to ping the folks that made that file to find out why all of the records are marked as deleted... Hope that helps.
If modifying your install is not in the cards, I was able to open that file with Excel 2016, and it seemed like all of the records were there. It is interesting, though, that Excel does not give any indication that the records were marked for deletion as per the specs...
Perhaps I'll need to add a "show deleted records" flag to the code... Once we switch over to 30h days, I'll use that new time to make some updates :)
Thanks for looking into this. I don't think I'll be able to convince anyone at NHTSA to alter these data, but it would be great if there was an option when initializing in Dbf5
or using to_dataframe()
to include all records, even those flagged as deleted like this case.
I'm working with the National Highway Transportation and Safety Administration's Fatality Analysis Reporting Systems which reports its data in DBF format from 1975-2016. The data starting in 2010 ( ftp://ftp.nhtsa.dot.gov/fars/2010/DBF/FARS2010.zip ) is read without error and includes appropriate-looking
Dbf5
attributes, but none of the conversion methods produce data.Using other methods like
to_csv
and other codec encodings produces the same behavior: no data produced. I've reproduced this issue on both Mac and PC on Anaconda 4 running Python 3.6 in Jupyter Notebook.