Open jorisvandenbossche opened 5 years ago
In the class methods, we check that the object is hashable (before doing a kh_put_pymap). E.g. in PyObjectHashTable.unique:
while in the function implementations (such as duplicated), we don't do that check. Eg
The "pymap" hashtable needs the values to be hashable, but doesn't actually check that? Should we add a hash(val) in those functions as well?
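For illustration, a minimal pure-Python sketch of what such a guard would do. The name find_duplicated is a hypothetical stand-in and not the pandas Cython implementation; it uses a Python set where pandas uses the khash pymap. A Python set would raise on its own for unhashable values, so the explicit hash(val) here only mirrors where the proposed check would sit, before the insert into the table.

```python
def find_duplicated(values):
    """Hypothetical stand-in for a keep='first' duplicated pass."""
    seen = set()  # stand-in for the pymap hashtable
    result = []
    for val in values:
        # The proposed guard: raises TypeError for unhashable objects
        # before the value is put into the table.
        hash(val)
        result.append(val in seen)
        seen.add(val)
    return result


print(find_duplicated(["a", "b", "a"]))

try:
    find_duplicated(["a", ["an", "unhashable", "list"]])
except TypeError as exc:
    print("TypeError:", exc)
```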
cc @jreback
@jorisvandenbossche yeah you could try that here
but this may introduce a performance issue
From a quick test, it seems that adding a hash(val) in the for loop to create the hash gives roughly a 20% slowdown (45ms -> 55ms for a string series of 3 million elements).
So that's a considerable slowdown.
A "proper" fix might be to check in the actual khash C code for a return value of -1 of PyObject_Hash, but I would rather not start meddling with that implementation.
An alternative could be doing a "best effort" check by hashing the first element. So if you have a full object series of unhashable objects, that would at least be caught. But it is not that nice that whether the error is raised depends on the order of the values (e.g. if you have a missing value in the first location).
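A rough sketch of that "best effort" idea and its order dependence, in plain Python. The names check_first_hashable, is_caught and the Unhashable class are hypothetical and only for illustration; Unhashable is a stand-in for any unhashable object, and None stands in for a missing value.

```python
def check_first_hashable(values):
    """Best-effort guard (hypothetical helper): hash only the first element."""
    if len(values) > 0:
        hash(values[0])  # raises TypeError if the first element is unhashable


class Unhashable:
    """Stand-in object: defining __eq__ without __hash__ makes instances unhashable."""

    def __eq__(self, other):
        return isinstance(other, Unhashable)


def is_caught(values):
    """Return True if the best-effort check raises for these values."""
    try:
        check_first_hashable(values)
    except TypeError:
        return True
    return False


# All elements unhashable: the check on the first element raises.
print(is_caught([Unhashable(), Unhashable()]))

# A hashable missing value in the first location hides the problem.
print(is_caught([None, Unhashable()]))
```

The second call shows the drawback: the same unhashable object slips through once a hashable value happens to come first.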
as the shapely Point objects are not hashable:
xref #12693, where other unhashable objects raise (which is probably the correct behavior with the current implementation; or perhaps this should raise a NotImplementedError, and #12693 should maybe be an enhancement request instead.)
however, the code in the OP is no longer unstable AFAICT
xref #12693, where other unhashable objects raise
indeed it does with a DataFrame with more than one column
pd.Series(a).to_frame().assign(dup=lambda x: x[0]).duplicated()
however, the code in the OP is no longer unstable AFAICT
looks like it was fixed sometime after 1.2.5
From a flaky test in geopandas, I observed the following behaviour:
So you see that sometimes it works, sometimes it does not work.
I am also not fully sure how the object hashtable works (assuming duplicated uses the hashtable), as the shapely Point objects are not hashable: