Open mx781 opened 1 year ago
So after mucking around a bit in the source, I'm realizing that get_indexer is inherently meant for unique indexes only, and get_indexer_non_unique doesn't support non-default methods, so I take it this is just not supported. Anyone care to weigh in on how complex an addition this might be, or whether there are blockers to doing this in the first place?
In the meantime, here's a naive/slow/untested workaround for anyone stumbling upon this use case:
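def get_non_unique_fill_indexer(index, key, method="ffill", tolerance=None):
    # Assumes `index` is monotonic increasing.
    assert method in {"ffill", "bfill"}
    # Mark every repeat of a label (first occurrences stay False).
    duplicates = index.duplicated()
    index_deduplicated = index[~duplicates]
    # get_indexer only accepts `method` on a unique index, so query the deduplicated one.
    dedup_indexer = index_deduplicated.get_indexer(
        [key], method=method, tolerance=tolerance
    ).item()
    if dedup_indexer == -1:
        raise KeyError(key)
    # Shift by the repeats that precede `key` to land on an occurrence of the matched label.
    num_duplicates_before = len(index[(index < key) & duplicates])
    indexer_end = index[num_duplicates_before + dedup_indexer]
    # get_loc returns an int for a unique label and a slice for a duplicated one.
    indexer = index.get_loc(indexer_end)
    return indexer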
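For a monotonic-increasing index, the same lookup can probably be done without deduplicating, using Index.searchsorted. The following is only a sketch along those lines, not something pandas provides: the name fill_slice is hypothetical, it always returns a slice (even for a unique label), and it ignores tolerance.

```python
import pandas as pd


def fill_slice(index: pd.Index, key, method: str = "ffill") -> slice:
    """Hypothetical helper: positions of the label matched by ffill/bfill.

    Assumes `index` is monotonic increasing; `tolerance` is not handled.
    """
    if method not in {"ffill", "bfill"}:
        raise ValueError(f"unsupported method: {method!r}")
    if not index.is_monotonic_increasing:
        raise ValueError("index must be monotonic increasing")

    if method == "ffill":
        # One past the last position whose label is <= key.
        end = int(index.searchsorted(key, side="right"))
        if end == 0:
            raise KeyError(key)  # no label at or before `key`
        # First position of that same label.
        start = int(index.searchsorted(index[end - 1], side="left"))
    else:
        # First position whose label is >= key.
        start = int(index.searchsorted(key, side="left"))
        if start == len(index):
            raise KeyError(key)  # no label at or after `key`
        # One past the last position of that same label.
        end = int(index.searchsorted(index[start], side="right"))
    return slice(start, end)


idx = pd.Index([1, 2, 2, 3, 3, 5])
print(fill_slice(idx, 4, method="ffill"))  # slice(3, 5, None)
print(fill_slice(idx, 4, method="bfill"))  # slice(5, 6, None)
```

Because searchsorted is a binary search, this avoids the duplicated() pass and the boolean masks, at the cost of requiring a sorted index up front.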
Feature Type
[X] Adding new functionality to pandas
[X] Changing existing functionality in pandas
[ ] Removing existing functionality in pandas
Problem Description
I'm trying to return the requested-or-previous index of a monotonic-increasing DataFrame. If the index is also unique, this works fine:
However, if the index is not unique, InvalidIndexError is raised:
The same occurs with all other methods, even None (which I thought should work, since df.index.get_loc(3) works fine and returns a slice).
Feature Description
This limitation doesn't seem to be outlined anywhere in the docs, so I'm unsure whether this is a missing feature, an error on my part, or perhaps a bug. If it is indeed a missing feature (and, I'm sure, no small effort), would you accept a PR? The desired behavior here would be to simply return the prev/next/nearest slice.
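To make the request concrete, here is a small reproduction sketch. It assumes the call in question is Index.get_indexer with method="ffill" (the snippets from the original report are not shown above), and the index values are made up for illustration.

```python
import pandas as pd
from pandas.errors import InvalidIndexError

unique_index = pd.Index([1, 2, 3, 5])
non_unique_index = pd.Index([1, 2, 3, 3, 5])

# Unique index: label 4 is absent, so "ffill" resolves to the position of 3.
unique_result = unique_index.get_indexer([4], method="ffill")
print(unique_result)  # [2]

# Non-unique index: the same call is rejected outright.
try:
    non_unique_index.get_indexer([4], method="ffill")
    raised = False
except InvalidIndexError:
    raised = True
print(raised)  # True

# get_loc on an existing, duplicated label of a monotonic index returns a slice.
loc = non_unique_index.get_loc(3)
print(loc)  # slice(2, 4, None)
```

Under the behavior requested here, looking up 4 with "ffill" on the non-unique index would return that same slice(2, 4) instead of raising.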
Alternative Solutions
I didn't see any other functions in the API that would work around this - but perhaps there's an approach here that I'm missing?
Additional Context
Thanks for all the hard work on Pandas!