polaris-hub / polaris

Foster the development of impactful AI models in drug discovery.
https://polaris-hub.github.io/polaris/
Apache License 2.0
94 stars, 6 forks

Allow hashing pandas Series that contain numpy.array or list elements #133

Open · zhu0619 opened this issue 4 months ago

zhu0619 commented 4 months ago

Is your feature request related to a problem? Please describe.

Dataset uses pandas.util.hash_pandas_object to compute its checksum. However, in some cases the elements of a pandas Series are lists or NumPy arrays, and hashing then fails. For example, hashing pd.Series([[1], [2], [3]]) raises TypeError: unhashable type: 'list'.

Describe the solution you'd like

A way to compute the hash for data such as pd.Series([[1], [2], [3]]).
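One possible workaround, as a minimal sketch: encode each element as bytes (its dtype, its shape, then its raw buffer) and pass the resulting Series of bytes to pandas.util.hash_pandas_object, which can hash bytes objects. This assumes every element is a list or NumPy array with a numeric or string dtype; object-dtype elements are rejected because their raw bytes are memory addresses, not data. The helper names hash_series_with_arrays and series_checksum are hypothetical, and the MD5 digest is only illustrative; Polaris's actual checksum code may combine the row hashes differently.

```python
import hashlib

import numpy as np
import pandas as pd


def _element_to_bytes(value) -> bytes:
    """Encode one list/array element as bytes capturing its dtype, shape and data."""
    arr = np.asarray(value)
    if arr.dtype.hasobject:
        # Raw bytes of object arrays are pointers, so they would not be reproducible.
        raise TypeError(f"object-dtype element not supported: {value!r}")
    # Prefix dtype and shape so e.g. [1, 2] and [[1, 2]] hash differently.
    header = f"{arr.dtype.str}|{arr.shape}|".encode()
    return header + arr.tobytes()  # tobytes() uses C order by default


def hash_series_with_arrays(series: pd.Series) -> pd.Series:
    """Row-wise uint64 hashes for a Series whose elements are lists or arrays (hypothetical helper)."""
    encoded = series.map(_element_to_bytes)  # object Series of bytes, which pandas can hash
    return pd.util.hash_pandas_object(encoded, index=True)


def series_checksum(series: pd.Series) -> str:
    """Collapse the row hashes into a single hex digest (illustrative only)."""
    row_hashes = hash_series_with_arrays(series)
    return hashlib.md5(row_hashes.to_numpy().tobytes()).hexdigest()


data = pd.Series([[1], [2], [3]])
print(hash_series_with_arrays(data))
print(series_checksum(data))
```

A list and the equivalent NumPy array produce the same hash here, since np.asarray maps both to the same dtype and buffer. Converting elements with astype(str) instead is tempting, but NumPy abbreviates large arrays with "..." in their string form, so distinct arrays could collide.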

cwognum commented 4 months ago

Hey @zhu0619, could you give an example of a dataset for which you ran into this issue?

It seems to me you could always restructure the dataset so that you don't need to store lists or arrays in pandas columns.
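The restructuring suggested above can be sketched as follows, assuming every element of the list column has the same length (pandas would pad shorter rows with NaN otherwise). Each vector component becomes its own scalar column, after which pandas.util.hash_pandas_object works without special handling. The helper expand_vector_column, the column names and the example data are hypothetical.

```python
from typing import Optional

import numpy as np
import pandas as pd


def expand_vector_column(df: pd.DataFrame, column: str, prefix: Optional[str] = None) -> pd.DataFrame:
    """Replace a column of equal-length lists/arrays with one scalar column per component."""
    prefix = f"{column}_" if prefix is None else prefix
    # Each element becomes one row of the wide frame; arrays are converted to plain lists first.
    wide = pd.DataFrame(
        [np.asarray(v).tolist() for v in df[column]], index=df.index
    ).add_prefix(prefix)
    # Keep the other columns and append the expanded components after them.
    return pd.concat([df.drop(columns=column), wide], axis=1)


df = pd.DataFrame({"mol_id": ["a", "b", "c"], "fingerprint": [[1, 0], [0, 1], [1, 1]]})
flat = expand_vector_column(df, "fingerprint")
print(flat.columns.tolist())  # ['mol_id', 'fingerprint_0', 'fingerprint_1']
print(pd.util.hash_pandas_object(flat, index=True))
```

For variable-length lists, a long format (one row per element, e.g. via Series.explode) would avoid the NaN padding.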