Closed tempoxylophone closed 1 year ago
Closing this because editing the csv files in this way produces the same problem in earlier versions of pandas. A better solution would be to add the `keep_default_na=False` keyword argument to all instances where the csv files are read.
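As a rough illustration of that suggestion, the sketch below reads a small in-memory CSV with and without `keep_default_na=False`. The CSV contents, the `ogbl-example` column name, and the `read_master` helper are made up for this example and are not taken from the ogb code; only `pd.read_csv` and its `index_col` and `keep_default_na` parameters are real pandas API.

```python
import io

import pandas as pd

# Hypothetical stand-in for a master.csv file: one column per dataset,
# one row per property. The dataset name and the values are invented.
CSV_TEXT = (
    ",ogbl-example\n"
    "download_name,example\n"
    "additional node files,None\n"
    "additional edge files,\n"
)


def read_master(text, **kwargs):
    """Read CSV text roughly the way a master file might be read (assumed helper)."""
    return pd.read_csv(io.StringIO(text), index_col=0, **kwargs)


# Default parsing: the empty field is treated as missing and becomes NaN.
# On pandas 2.0 and later, the "None" field is expected to become NaN too.
default_info = read_master(CSV_TEXT)["ogbl-example"]

# With keep_default_na=False (and no custom na_values), no field is converted
# to NaN, so both properties stay plain strings.
safe_info = read_master(CSV_TEXT, keep_default_na=False)["ogbl-example"]

print(repr(default_info["additional edge files"]))
print(repr(safe_info["additional node files"]))
print(repr(safe_info["additional edge files"]))
```

Because the empty field is already read as missing under the default settings, replacing `None` with an empty string in the csv files does not avoid the `NaN`, whereas `keep_default_na=False` leaves both spellings as strings.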
Pandas 2.0.0 was released today. A new argument called `dtype_backend` was added to the `read_csv()` function that appears to affect the default behavior when reading null values.

When the respective `master.csv` files are read with Pandas 2.0.0, the value `"None"` written as a string now appears to be parsed by default to `NaN`. This is problematic in places where the `meta_info` dictionary's properties `additional edge files` and `additional node files` are assumed to be strings.

In `ogb/linkproppred/dataset.py`, the `.split(",")` method called on these will throw `AttributeError: 'float' object has no attribute 'split'`. There are more instances that can cause this exception beyond the two above.
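The failure can be sketched in isolation as follows. The CSV text, the `ogbl-example` column, and the `split_file_list` helper are hypothetical and only mimic the shape of a `master.csv`; whether the `"None"` values come back as strings or as `NaN` depends on the installed pandas version, so the helper reports the error instead of raising it.

```python
import io

import pandas as pd

# Hypothetical stand-in for a master.csv file; names and values are invented.
CSV_TEXT = (
    ",ogbl-example\n"
    "download_name,example\n"
    "additional node files,None\n"
    "additional edge files,None\n"
)


def split_file_list(value):
    """Split a comma-separated property, returning the error text instead of raising."""
    try:
        return value.split(",")
    except AttributeError as exc:
        return f"AttributeError: {exc}"


meta_info = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)["ogbl-example"]

for key in ("additional node files", "additional edge files"):
    # A string value splits normally; a NaN value is a float and has no .split().
    print(key, "->", split_file_list(meta_info[key]))
```

If the property was parsed to `NaN`, each line reports the `AttributeError` quoted above; if it was kept as the string `"None"`, the split succeeds.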
I considered two options: either edit the script and corresponding `master.csv` files to contain the empty string instead of `'None'`, which is parsed as `""` instead of `NaN`, or add the keyword argument `keep_default_na=False` to instances of `pd.read_csv` where this could be an issue. This keyword argument prevents the `"None"`s from being parsed as `NaN`s.

Seeing as there are more instances of the latter option, which would require a larger diff, I opted for the former approach. This involved editing the `make_master_file.py` files in their respective directories.

I may have discovered a small inconsistency in the Python code in `ogb/linkproppred/make_master_file.py` for the `has_edge_attr` property of the ogbl-vessel dataset. In `make_master_file.py`, this property was set to `False`, but the committed file in the latest release has this property set to `True` in the csv file.