Open cifkao opened 3 years ago
Yes, the current implementation of remove_duplicate
only works as expected if the sort
method sorts objects with all attributes, which is not the case for now. I agree that making MusPy objects comparable would be complicated and misleading in some way.
I think simple equality check is sufficient as we can do this independently for each time step by finding all objects that have the same time
and compare them against one another.
However, we might need a policy (possibly given as an argument) to decide which object, the former one or the latter one, to keep. For example, if we have tempos = [Tempo(time=0, qpm=100), Tempo(time=0, qpm=120), Tempo(time=0, qpm=100)]
, we probably want to keep the last one, but the result can be either [Tempo(time=0, qpm=120), Tempo(time=0, qpm=100)]
or [Tempo(time=0, qpm=100)]
, which depends on user's preference.
I think simple equality check is sufficient as we can do this independently for each time step by finding all objects that have the same time and compare them against one another.
Right, that would work.
However, we might need a policy (possibly given as an argument) to decide which object, the former one or the latter one, to keep. For example, if we have tempos = [Tempo(time=0, qpm=100), Tempo(time=0, qpm=120), Tempo(time=0, qpm=100)], we probably want to keep the last one, but the result can be either [Tempo(time=0, qpm=120), Tempo(time=0, qpm=100)] or [Tempo(time=0, qpm=100)], which depends on user's preference.
Hmm, I wasn't even thinking of this case, what I had in mind were exactly duplicate objects that don't get sorted next to each other because there is some other object at the same time step which will end up between them.
I think simple equality check is sufficient as we can do this independently for each time step by finding all objects that have the same time and compare them against one another.
Actually, this doesn't work for objects that don't have a time attribute (e.g. tracks). Right now remove_duplicate
will consider such objects, but there is no way to sort them.
I think the high-level idea is to do it for each time step if an object has a time attribute, otherwise do it for all list items.
The
remove_duplicate
method has two problems:sort
method is not sufficient for this purpose, because it only sorts by time. To catch all duplicates, one would need to sort by all attributes (including non-primitive ones, which are not comparable). (For example duplicate tracks will get removed iff they are adjacent.)Since making all MusPy objects (of the same type) comparable in a meaningful way would be complicated, the only reasonable solution seems to be to remove duplicates without sorting. For that, it would be sufficient to make objects hashable. Or JSON strings can be used as keys for a quick solution.