Open genisplaja opened 1 year ago
Hey @genisplaja, we were discussing the issue of optional dependencies with @harshpalan the other day. We thought that a potential solution could be to add a check in loaders that need those optional dependencies, so when you initialize the dataset it will warn you that the dependency is missing and you should install it for that dataset to work. Would that fix your issue?
Yes! We were just discussing with @nkundiushuti that we could also try switching from .xlsx to .csv, since we have access to the Zenodo entry. However, IMHO creating a new version just for that is a bit overkill.
What you propose would be nice, I think. We could even use `pipdeptree` to check whether the user has the particular optional dependencies installed and, if not, throw the warning.
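A minimal sketch of the kind of check discussed above, assuming a loader can list the import names of its optional dependencies. It uses the standard library's `importlib.util.find_spec` rather than `pipdeptree`, and the helper names `missing_optional_dependencies` and `warn_if_missing` are hypothetical, not part of mirdata:

```python
import importlib.util
import warnings


def missing_optional_dependencies(module_names):
    """Return the top-level module names that cannot be found for import.

    Hypothetical helper: expects import names (e.g. "openpyxl"), not dotted
    submodule paths and not PyPI distribution names.
    """
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


def warn_if_missing(dataset_name, module_names):
    """Warn, without failing, when a dataset's optional dependencies are absent.

    Hypothetical helper meant to be called when a dataset is initialized.
    Returns the list of missing module names so callers can react to it.
    """
    missing = missing_optional_dependencies(module_names)
    if missing:
        warnings.warn(
            f"Dataset '{dataset_name}' needs optional dependencies that are "
            f"not installed: {', '.join(missing)}. "
            "Install them for this loader to work."
        )
    return missing
```

A loader that reads .xlsx metadata could then call something like `warn_if_missing("my_dataset", ["openpyxl"])` at initialization, where `"my_dataset"` is a placeholder name. Note that this checks import names, which do not always match the package name used with pip.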
Hello! As you might have seen, PR #560 is blocked because of an optional dependency that is missing:
That is because I am trying to load a dataset metadata .xlsx file using `pandas.read_excel()`. I tried different loading strategies but they didn't work, and that seems to be the standard way to load .xlsx files. Would `openpyxl` be a problematic dependency to have in mirdata, taking into account that we may have a dataset in the future that includes .xlsx files as well (I know it is not common, but who knows...)? Otherwise we could include `openpyxl` as an optional dependency for the dataloader in #560. What do you think?