Closed — yoyololicon closed this issue 1 year ago
MIDI encoding function interface after today's quick discussions with @Ningzhi-Wang @nicolaus625:

```python
def encode_segment_midi(midi: Any, segment_length: float = 5.12, output_size: int = 2048):
    """
    Returns:
        torch.Tensor or numpy.ndarray: (N, output_size)
    """
```
Need to change the signature to include both the MIDI filename and the wave filename, since the total time in the MIDI file and the wave file actually differ, and the wave file's total time is needed for alignment.

```python
def encode_segment_midi(midi: str, wave: str, frame_rate: int = 50,
                        segment_length: int = 256, output_size: int = 2048):
    """
    Inject the calculation of frame size and number of frames per segment
    outside the function for more flexible control.

    Args:
        frame_rate: number of frames per second.
        segment_length: number of frames in each segment.

    Returns:
        torch.Tensor or numpy.ndarray: (N, output_size)
    """
```
I don't recommend doing this because it mixes a lot of operations into one function. In my opinion, reading wave files and segmenting them isn't necessary here and can happen somewhere else. A simple function that converts MIDI events into token segments is enough. This also helps us apply it to different datasets more efficiently.
Finished the first implementation of the encoding function. Here is its signature:

```python
def tokenize(filename, frame_rate, segment_length, output_size, step_rate=100):
    """Convert MIDI notes into integer tokens.

    Notice that the total time in the MIDI and wave files may differ.
    Wave samples of times beyond the MIDI total time should be discarded.

    Args:
        filename: path to MIDI file.
        frame_rate: number of frames per second.
        segment_length: number of frames in each segment.
        output_size: number of tokens per segment.
        step_rate: number of MIDI frames per second.

    Returns:
        Tokenized and segmented tensor, and the start and end time of each segment.
        (torch.Tensor: (N, output_size), torch.Tensor: (N, 2))
    """
```
Please refer to the tokenize function in preprocessor/preprocessor.py for details.
This function is still relatively slow, so it needs further optimization.
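One possible direction for the optimization (a hedged sketch, not profiled against the actual implementation): replacing per-segment Python loops with vectorized NumPy, e.g. computing all segment boundaries at once:

```python
import math
import numpy as np

frame_rate, segment_length = 50, 256  # defaults from the signature above
total_frames = 350                    # illustrative value

# All (N, 2) segment boundaries in one shot, no Python loop.
n = math.ceil(total_frames / segment_length)
starts = np.arange(n) * segment_length
ends = np.minimum(starts + segment_length, total_frames)
bounds = np.stack([starts, ends], axis=1) / frame_rate  # (N, 2), seconds
```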
@Ningzhi-Wang could you open a PR and assign me as a reviewer? I'll look into the code later.
Don't forget to merge the newest commits from master.
Proposal from slack: https://qmul-rmri-2022.slack.com/archives/C043XFP0ZPG/p1668464934230699