Open groutr opened 3 years ago
@eriknw Can I get your thoughts on this?
Thanks @groutr! Everything here looks reasonable and good. I'm curious: do you have a use case for this?
And sorry for my delay. This year has been, uh, a little crazy.
I'm sure I had a better use case when I created the PR, but I cannot recall it now.
One use case that currently comes to mind: when I'm asking "is this distinct", many times I'm really meaning to ask "why isn't this distinct"? If isdistinct is False, it can be natural to wonder what the duplicated elements are. Pandas has duplicated and now toolz can also be used.
Yeah, that sounds reasonable.
@eriknw which name do you find easier to remember? toolz.duplicated (toolz.duplicates?) or toolz.nonunique
I think I prefer the name nonunique as we don't produce a mask like pd.duplicated.
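To make the mask-versus-elements distinction concrete, here is a minimal sketch in plain Python. `duplicated_mask` is a hypothetical helper written only for this illustration; it is not part of toolz or pandas. It loosely mimics the default convention of pandas' `duplicated` method, where the first occurrence of a value is not flagged.

```python
def duplicated_mask(seq):
    """Hypothetical helper: one boolean per element, True if the
    element's value has already appeared earlier in ``seq``."""
    seen = set()
    mask = []
    for item in seq:
        mask.append(item in seen)
        seen.add(item)
    return mask


data = ["a", "b", "a", "c", "b"]

# Mask style: same length as the input, one flag per position.
mask = duplicated_mask(data)  # [False, False, True, False, True]

# Element style: only the repeated items themselves.
repeats = [item for item, is_dup in zip(data, mask) if is_dup]  # ['a', 'b']
```

A mask has to be applied back to the input to recover the repeated items, whereas the function proposed in this PR yields those items directly.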
I think this is ready. What do you think @eriknw?
itertoolz.unique yields the never-before-seen elements of a sequence. nonunique is the complement, yielding the already-seen elements of a sequence. This is incredibly useful for finding duplicates in a sequence.
This isn't really a new feature to itertoolz, but instead exposes an already existing feature.
isdistinct already had this logic, but instead of returning True/False, I return the already seen elements as they are encountered. This PR simply moves the logic into its own function.
ping: @eriknw
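The behaviour described above could look roughly like the following. This is a sketch under assumptions, not the code from the PR: the `key` parameter is assumed here by analogy with `toolz.unique`, and `isdistinct_via_nonunique` is a hypothetical name used only to show how the True/False check can be expressed through the generator.

```python
def nonunique(seq, key=None):
    """Sketch: yield each element whose value (or key) was already seen."""
    seen = set()
    for item in seq:
        val = item if key is None else key(item)
        if val in seen:
            yield item       # already encountered, so report it
        else:
            seen.add(val)    # first occurrence, remember it


def isdistinct_via_nonunique(seq):
    """Hypothetical: a sequence is distinct if nonunique yields nothing."""
    sentinel = object()
    return next(nonunique(seq), sentinel) is sentinel


list(nonunique([1, 2, 1, 3, 1]))                # [1, 1]
list(nonunique(["a", "A", "b"], key=str.lower)) # ['A']
isdistinct_via_nonunique([1, 2, 3])             # True
isdistinct_via_nonunique([1, 2, 1])             # False
```

Because the generator is lazy, the distinctness check stops at the first repeated element, which is the same early-exit behaviour a True/False check would want.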