salu133445 / muspy

A toolkit for symbolic music generation
https://salu133445.github.io/muspy/
MIT License
427 stars 49 forks source link

Problems with remove_duplicate #32

Open cifkao opened 3 years ago

cifkao commented 3 years ago

The remove_duplicate method has two problems:

Since making all MusPy objects (of the same type) comparable in a meaningful way would be complicated, the only reasonable solution seems to be to remove duplicates without sorting. For that, it would be sufficient to make objects hashable. Or JSON strings can be used as keys for a quick solution.

salu133445 commented 3 years ago

Yes, the current implementation of remove_duplicate only works as expected if the sort method sorts objects with all attributes, which is not the case for now. I agree that making MusPy objects comparable would be complicated and misleading in some way.

I think simple equality check is sufficient as we can do this independently for each time step by finding all objects that have the same time and compare them against one another.

However, we might need a policy (possibly given as an argument) to decide which object, the former one or the latter one, to keep. For example, if we have tempos = [Tempo(time=0, qpm=100), Tempo(time=0, qpm=120), Tempo(time=0, qpm=100)], we probably want to keep the last one, but the result can be either [Tempo(time=0, qpm=120), Tempo(time=0, qpm=100)] or [Tempo(time=0, qpm=100)], which depends on user's preference.

cifkao commented 3 years ago

I think simple equality check is sufficient as we can do this independently for each time step by finding all objects that have the same time and compare them against one another.

Right, that would work.

However, we might need a policy (possibly given as an argument) to decide which object, the former one or the latter one, to keep. For example, if we have tempos = [Tempo(time=0, qpm=100), Tempo(time=0, qpm=120), Tempo(time=0, qpm=100)], we probably want to keep the last one, but the result can be either [Tempo(time=0, qpm=120), Tempo(time=0, qpm=100)] or [Tempo(time=0, qpm=100)], which depends on user's preference.

Hmm, I wasn't even thinking of this case, what I had in mind were exactly duplicate objects that don't get sorted next to each other because there is some other object at the same time step which will end up between them.

cifkao commented 3 years ago

I think simple equality check is sufficient as we can do this independently for each time step by finding all objects that have the same time and compare them against one another.

Actually, this doesn't work for objects that don't have a time attribute (e.g. tracks). Right now remove_duplicate will consider such objects, but there is no way to sort them.

salu133445 commented 3 years ago

I think the high-level idea is to do it for each time step if an object has a time attribute, otherwise do it for all list items.