hubmapconsortium / ontology-api

The HuBMAP Ontology Service
MIT License

Address risk of ingesting TTY NA as NaN in Pandas code #109

Open computationdoc opened 2 years ago

computationdoc commented 2 years ago

Possibly just add parameters to the CSV reads:

df = pd.read_csv(file, dtype=object, na_filter=False)

See other suggestions at: https://stackoverflow.com/questions/33952142/prevent-pandas-from-interpreting-na-as-nan-in-a-string

benstear commented 2 years ago

I found the na_filter = False method to work nicely as well.

For what it's worth, I first tried casting the :TYPE column to string, but it did NOT work.

df = pd.read_csv(file, dtype = {':TYPE':str})
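A minimal sketch comparing the reads discussed above may help make the difference concrete. The two-row sample and the helper function names are hypothetical, not taken from the ontology-api code; only the `:TYPE` column name and the `read_csv` parameters come from this thread. The likely reason the cast alone fails is that pandas matches "NA" against its default missing-value strings regardless of the requested dtype, so the cell is already missing by the time the column is typed as a string.

```python
import io

import pandas as pd

# Hypothetical sample: the first row carries the literal term type "NA".
SAMPLE_CSV = "CODE,:TYPE\nC001,NA\nC002,PT\n"


def read_default(text):
    """Plain read: "NA" matches pandas' default NA strings and becomes missing."""
    return pd.read_csv(io.StringIO(text))


def read_cast_only(text):
    """Cast :TYPE to str only: NA detection still applies, so "NA" is still missing."""
    return pd.read_csv(io.StringIO(text), dtype={":TYPE": str})


def read_no_na_filter(text):
    """Disable NA detection entirely: every cell is kept as its literal string."""
    return pd.read_csv(io.StringIO(text), dtype=object, na_filter=False)


if __name__ == "__main__":
    for reader in (read_default, read_cast_only, read_no_na_filter):
        df = reader(SAMPLE_CSV)
        print(reader.__name__, "-> missing :TYPE values:", int(df[":TYPE"].isna().sum()))
```

Note that `na_filter=False` also keeps empty cells as empty strings rather than NaN, so any downstream code that relies on `isna()` to find blanks would need to check for `""` instead.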

AlanSimmons commented 1 year ago

@computationdoc is this still an issue? I have not encountered it. Then again, I wasn't looking for it.