magenta / mt3

MT3: Multi-Task Multitrack Music Transcription
Apache License 2.0

Can I get an interpretation of what the dense layer output with a shape of 1536 means? #40

Closed by levscaut 2 years ago

levscaut commented 2 years ago

I know the 0-127 MIDI instrument programs are included in these 1536 outputs, but where exactly is that region located? I'm asking because I want to apply a mask to the raw output to constrain the predicted instrument types before the softmax layer. I'd really appreciate any help with this!
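
For illustration, the masking idea described here can be sketched in NumPy. This is only a toy sketch, not MT3 code: `mask_programs`, `program_start`, and the toy sizes are hypothetical, and the real index of program 0 in the 1536-wide output has to be worked out from the vocabulary before anything like this is applied to the model.

```python
import numpy as np


def mask_programs(logits, program_start, allowed_programs, num_programs=128):
    """Set logits of disallowed program tokens to -inf; leave all others as-is.

    program_start is a hypothetical parameter: the output index of program 0.
    """
    masked = np.array(logits, dtype=np.float64, copy=True)
    blocked = np.ones(num_programs, dtype=bool)
    blocked[list(allowed_programs)] = False
    # Basic slicing returns a view, so writing to it updates `masked`.
    program_view = masked[..., program_start:program_start + num_programs]
    program_view[..., blocked] = -np.inf
    return masked


def softmax(x):
    """Numerically stable softmax over the last axis."""
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


# Toy example: 16 outputs, with 8 "program" slots starting at index 4.
toy_logits = np.zeros(16)
toy_masked = mask_programs(
    toy_logits, program_start=4, allowed_programs=[0, 2], num_programs=8
)
toy_probs = softmax(toy_masked)
```

Only the program slots are touched, so time shift, pitch, and other event types keep their original logits, and the blocked programs get exactly zero probability after the softmax.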

cghawthorne commented 2 years ago

You can see how the event ranges are assigned here: https://github.com/magenta/mt3/blob/main/mt3/vocabularies.py#L119

The first event range is reserved for time shifts, and we default to reserving 10 seconds' worth, at 100 steps/second, so the first 1000 events. In practice, this is more than we need, but we decided to err on the side of flexibility.
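
As a quick check of the arithmetic above, here is a minimal sketch of how a time offset could map to a shift step at that resolution. The constant and function names are illustrative, not taken from the MT3 code.

```python
STEPS_PER_SECOND = 100   # resolution quoted above
MAX_SHIFT_SECONDS = 10   # default reservation quoted above

# Number of time-shift events reserved at the start of the event ranges.
num_shift_events = MAX_SHIFT_SECONDS * STEPS_PER_SECOND


def seconds_to_steps(seconds, steps_per_second=STEPS_PER_SECOND):
    """Quantize a time offset in seconds to an integer number of steps."""
    return round(seconds * steps_per_second)


quarter_second = seconds_to_steps(0.25)
```

With these values `num_shift_events` is 1000, and a 0.25 second offset lands on step 25.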

levscaut commented 2 years ago

Yes, I have noticed these event ranges, but the overall range is 1388 for this vocabulary. Is there any way to map these 1388 events onto the dense layer output, which has a dimension of 1536?

iansimon commented 2 years ago

The dimension is always rounded up to a multiple of 128: https://github.com/magenta/mt3/blob/main/mt3/vocabularies.py#L282
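
A small sketch of the rounding may help reconcile the two numbers. Rounding 1388 on its own gives 1408, so reaching 1536 implies that additional ids are counted before rounding. The sketch below assumes 3 special tokens and 100 extra ids; both values are assumptions that should be checked against `vocabularies.py`, and `round_up_to_multiple` is an illustrative helper, not the library's function.

```python
import math


def round_up_to_multiple(size, multiple=128):
    """Round size up to the next multiple of `multiple`."""
    return multiple * math.ceil(size / multiple)


EVENT_CLASSES = 1388      # overall event range reported in this thread
NUM_SPECIAL_TOKENS = 3    # assumption: e.g. padding, end-of-sequence, unknown
NUM_EXTRA_IDS = 100       # assumption: extra ids appended to the vocabulary

events_only = round_up_to_multiple(EVENT_CLASSES)
full_vocab = round_up_to_multiple(
    NUM_SPECIAL_TOKENS + EVENT_CLASSES + NUM_EXTRA_IDS
)
print(events_only, full_vocab)
```

Under these assumptions the vocabulary has 1491 entries, which rounds up to 1536; the remaining 45 outputs would be unused padding. If the special tokens do occupy the lowest ids, event index `i` would sit at output index `i + NUM_SPECIAL_TOKENS`, which is worth verifying before building a mask.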