Closed levscaut closed 2 years ago
You can see how the event ranges are assigned here: https://github.com/magenta/mt3/blob/main/mt3/vocabularies.py#L119
The first event range is reserved for time shifts, and we default to reserving 10 seconds' worth, at 100 steps/second, so the first 1000 events. In practice, this is more than we need, but we decided to err on the side of flexibility.
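To make the layout concrete, here is a minimal sketch of how back-to-back event ranges end up with fixed index offsets. It is not the code from `vocabularies.py`: the helper `assign_ranges` and the second range (`next_range`, size 128) are hypothetical, and only the 10 seconds at 100 steps/second comes from the explanation above. The exact count and any offset from special tokens should be checked against the linked file.

```python
from typing import Dict, List, Tuple

STEPS_PER_SECOND = 100   # from the reply above
MAX_SHIFT_SECONDS = 10   # from the reply above


def assign_ranges(sizes: List[Tuple[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Lay out named ranges back to back; return inclusive (start, end) indices."""
    ranges: Dict[str, Tuple[int, int]] = {}
    offset = 0
    for name, size in sizes:
        ranges[name] = (offset, offset + size - 1)
        offset += size
    return ranges


num_shift_events = STEPS_PER_SECOND * MAX_SHIFT_SECONDS

# "next_range" is a placeholder for whatever range follows the time shifts.
layout = assign_ranges([("shift", num_shift_events), ("next_range", 128)])
print(layout)
```

Because the ranges are contiguous, the start of any range is simply the sum of the sizes of the ranges before it.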
Yes, I have noticed these event ranges, but the overall range for this vocabulary is 1388. Is there a way to map these 1388 events onto the dense layer output, which has a dimension of 1536?
The dimension is always rounded up to the next multiple of 128: https://github.com/magenta/mt3/blob/main/mt3/vocabularies.py#L282
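The rounding itself can be sketched as below. The helper name `round_up_to_multiple` is hypothetical and this is not copied from the linked file. Note that 1388 on its own would round up to 1408, so a 1536-wide output implies that the size being rounded lies between 1409 and 1536. Presumably the vocabulary counts some additional IDs on top of the 1388 events (for example special tokens or extra IDs), but that is an assumption to verify in `vocabularies.py`. The value 1491 below is only an illustrative size in that interval.

```python
import math


def round_up_to_multiple(n: int, multiple: int = 128) -> int:
    """Round n up to the next multiple of `multiple` (n itself if already one)."""
    return multiple * math.ceil(n / multiple)


events_only = round_up_to_multiple(1388)    # the 1388 events alone
illustrative = round_up_to_multiple(1491)   # hypothetical size with extra IDs

print(events_only, illustrative)
```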
I know the MIDI instrument programs 0-127 are included in this 1536, but where exactly is that region located? I'm asking because I want to apply a mask to the raw output, before the softmax layer, to constrain which instrument types can be predicted. I'd appreciate any help with this!
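For reference, the kind of mask described here could look like the following sketch in plain Python. It is an illustration, not MT3 code: `mask_programs`, `softmax`, and especially `PROGRAM_START = 1200` are hypothetical, and the real start index of the program range has to be read from the event ranges in `vocabularies.py` (including any offset for special tokens).

```python
import math
from typing import Iterable, List

NEG_INF = float("-inf")
PROGRAM_START = 1200  # hypothetical offset of the program range, NOT the real value


def mask_programs(logits: List[float], program_start: int,
                  allowed_programs: Iterable[int],
                  num_programs: int = 128) -> List[float]:
    """Return a copy of logits with disallowed program tokens set to -inf."""
    allowed = set(allowed_programs)
    masked = list(logits)
    for program in range(num_programs):
        if program not in allowed:
            masked[program_start + program] = NEG_INF
    return masked


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax; -inf logits receive probability 0."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.0] * 1536  # dummy raw output for one decoding step
masked = mask_programs(logits, PROGRAM_START, allowed_programs=[0, 40])
probs = softmax(masked)
```

Only the program tokens are touched, so time shift, pitch, and other event types keep their original logits and can still be predicted.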