Merging objects - Githubissues

cifkao commented 3 years ago

This PR adds support for merging MusPy objects. One situation where this might be useful is when we want to merge events from two different tracks, maybe because they were in two different channels in the MIDI file, but logically they are the same track.

Changes:

Adding a Base.merge method which recursively merges an object into self. Works similarly to the MergeFrom method of protocol buffers. It also sorts and removes duplicates from lists by default. We can also just merge a single attribute if we want to.
Adding + and += operators that call merge (the former making a deep copy first).
Adding a __deepcopy__ implementation by converting to a dict and back (somewhat faster than the built-in way).
Fixing to_ordered_dict to create a deep copy of non-MusPy objects ~~(though I think this just means lists of primitive values, so they're shallow copies really)~~.
Making sort also sort primitive types.

Not sure if it's useful to be able to merge objects of different types, or if we should throw an exception in that case.

A problem with remove_duplicates (apart from #32): It's only defined for ComplexBase, so for example it doesn't work for metadata.creators. I had to add an exception in the tests because of this. That makes me wonder if sort and remove_duplicates should be moved to Base (since list attributes are not just a feature of ComplexBase).

salu133445 commented 3 years ago

Thank you for the pull request!

Some thoughts:

I think this should be reserved for ComplexBase (i.e., Music and Track). Merging two notes or two tempos doesn't make much sense to me.
It seems that non-list attributes are simply left unchanged. If so, it sounds more like a extend that works on corresponding list attributes between two objects.
Also, I am not quite sure about what override=False do and what the potential use cases are.
Using deepcopy in to_ordered_dict would slow down pretty_str and save. We could make it optional by adding a boolean deepcopy argument, which defaults to False (possibly?).
I don't think we should sort primitive types. In fact, the only list attributes that contains primitive type is Music.downbeats, which is not well supported at this point and we could probably figure out a better way to handle this.
Will x.merge(y, sort=True) produce the same result as x.merge(y).sort()? Similarly, will x.merge(y, remove_duplicates=True) produce the same result as x.merge(y).remove_duplicates()? If yes, I would prefer to make them more modularized if there is no other concerns other than memory issue.
For metadata.creators, I think the ordering does matter here , which is similar to the ordering of authors in a paper, so we probably don't want to sort them.
We could simply raise an error if the two objects are of different types.

cifkao commented 3 years ago

Everything you wrote makes sense to me.

I think this should be reserved for ComplexBase (i.e., Music and Track). Merging two notes or two tempos doesn't make much sense to me.

Yes, limiting this to list attributes does make sense. I was trying to mimic the behavior of protobuffers, but with the current schema I can't think of a good use case for this either. :slightly_smiling_face:

It seems that non-list attributes are simply left unchanged. If so, it sounds more like a extend that works on corresponding list attributes between two objects.

Also, I am not quite sure about what override=False do and what the potential use cases are.

Actually, the way it works now is that None values for optional non-list attributes get overridden by the other object. With override=True, all of them get overridden. But yeah, in reality this is probably not that useful.

I don't think we should sort primitive types. In fact, the only list attributes that contains primitive type is Music.downbeats, which is not well supported at this point and we could probably figure out a better way to handle this.

OK. The reason I cared about sorting numbers and strings was because of removing duplicates, but this will be done without sorting anyway, so I agree.

So since it looks like the only thing merge should do is extend all the list attributes (no sorting or removing duplicate items), maybe I should rename it to extend. And if a list is passed to it, it would call append on all items (in analogy with lists). And the same thing for the + and += operators. What do you think?

cifkao commented 3 years ago

Should this still (optionally) work recursively? E.g. when merging two Musics, should the metadata.creators also be merged?

salu133445 commented 3 years ago

Well. What if two music with different resolutions are to merge? Also, extending the time signatures doesn't sounds right to me. I feel like this only make sense for tracks within the same song, isn't it?

On the other hand, we should probably implement the method Base.extend similar to Base.append that automatically finds out the corresponding list to extend/append. Basically, what we are doing is to make Track behave like a list, where the actual operations apply directly to its list attributes.

Now, the key difference here will be the integrated supports for sort and remove_duplicates (which could also be done separately), and some copy/deepcopy issues.

cifkao commented 3 years ago

Well. What if two music with different resolutions are to merge? Also, extending the time signatures doesn't sounds right to me. I feel like this only make sense for tracks within the same song, isn't it?

Not necessarily. Let me explain the applications I have in mind.

First (most importantly), merging different tracks of the same song as you said. I imagine track1 + track2 or track1 += track2 should work (all the list attributes of track1 will be extended). And in the same vein, track1.extend(track2) should work the same way.

Second, merging two different music files. Let's say I have a music1 with a melody, tempos and time signatures, and I feed this to a 'harmonizer' algorithm that outputs a music2 with chords and a bass track (and nothing else). Now I want to merge these two, so I do music1 + music2 and get an object that has everything. Here it's my responsibility to make sure what I'm doing makes sense, i.e. I'm not combining incompatible or duplicate information.

Maybe the second example is a bit artificial. But in both cases it's a matter of convenience (not having to go through all the attributes and extend them one by one), so I thought it would be nice to have a general mechanism that can do this for any object.

So the logic would be:

ComplexBase.append finds which list to append to
ComplexBase.extend(x: list) finds which list to extend
ComplexBase.extend(x: ComplexBase) extends all lists (we can call this one merge or something else if you prefer)
As in lists, += does the same thing as extend, and + does the same but on a copy.

And for simplicity, it's probably best not to care about sorting or duplicates in either case and let the user do that themselves afterwards.

cifkao commented 3 years ago

Another application which is definitely not artificial is concatenating pieces of music. We can do this by time-shifting one of them and then merging it into the other one. In this case both of them can contain time signatures.

This is what note-seq does (using the MergeFrom method of protobuffers): https://github.com/magenta/note-seq/blob/484e854b76fb90328a8a42f78014ac3df9a92f50/note_seq/sequences_lib.py#L470

salu133445 commented 3 years ago

I see your point.

I would prefer to have this implemented in another method (maybe fill_missing or update_none?) as it's quite different from the concept of extend. We could always treat a class as a dictionary and implement an update method, which would be similar to self.__dict__.update(other.__dict__), but I don't know if this would be useful in practice. The tricky thing here is always whether to copy or deepcopy though.

Concating would be useful. Time shifting can be done by Base.adjust_time(lambda time: time + shift), so concating can be done by x.extend(y.adjust_time(lambda: time: time + x.get_end_time())). We could make shift a separate method.

I think we could simply leave non-list attributes unchanged, and implement ComplexBase.extend with ComplexBase input by extending each corresponding list. This would be easier to understand and document.

Sorry for having lots of questions on this. I once planned to implement something similar, but it's quite tricky to make it straightforward yet useful. Thank you for working on this!

cifkao commented 3 years ago

I think we could simply leave non-list attributes unchanged, and implement ComplexBase.extend with ComplexBase input by extending each corresponding list. This would be easier to understand and document.

I agree.

Sorry for having lots of questions on this. I once planned to implement something similar, but it's quite tricky to make it straightforward yet useful. Thank you for working on this!

Sure. Maybe I should have made my intended use cases clearer from the beginning. :slightly_smiling_face:

cifkao commented 3 years ago

I think this is ready. The copying is indeed tricky so let me know how you would like it to work.

salu133445 commented 3 years ago

I made the following changes.

Rename argument copy to deepcopy to avoid confusions
Change default value of deepcopy to False for consistency with list.extend
Avoid creating a new list when deepcopy=False
Implement __copy__ as well
Add methods ComplexBase.copy and ComplexBase.deepcopy

salu133445 commented 3 years ago

Well. It's not technically a view for sure. There's no other way to prevent creating a new list here as the items are OrderedDict now. Other than the OrderedDict objects themselves (and the lists that store them), the underlying data is the same. Let's put it this way - reduce the overhead as much as possible by default and provide a deepcopy arugument for supporting other usage.

Ondřej Cífka notifications@github.com 於 2020年12月31日週四下午8:03 寫道：

@cifkao commented on this pull request.

In muspy/base.py https://github.com/salu133445/muspy/pull/33#discussion_r550469002:

if attr in self._list_attributes:

List of non-MusPy objects

ordered_dict[attr] = deepcopy(value) if copy else list(value) else:

ordered_dict[attr] = value

Single non-MusPy object

ordered_dict[attr] = deepcopy(value) if copy else value

Here https://github.com/salu133445/muspy/blob/7fba527085140c2b07e22df182595bd6097bb4ae/muspy/base.py#L167-L169, a new list is created for each list attribute whose type is a subclass of Base. This new list is by no means a view of the original list; also all the newly created OrderedDicts are also not views of the original objects. Unless I am missing something.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/salu133445/muspy/pull/33#discussion_r550469002, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFK27V3KNF6HNBU477MOOJLSXRSALANCNFSM4VKHNKDA .

salu133445 / muspy

Merging objects #33

@cifkao commented on this pull request.

List of non-MusPy objects

Single non-MusPy object