yoyololicon / music-spectrogram-diffusion-pytorch


midi tokenization encoder #10

Closed yoyololicon closed 1 year ago

yoyololicon commented 1 year ago

Proposal from slack: https://qmul-rmri-2022.slack.com/archives/C043XFP0ZPG/p1668464934230699

yoyololicon commented 1 year ago

MIDI encoding function interface after today's quick discussion with @Ningzhi-Wang and @nicolaus625:

def encode_segment_midi(midi: Any, segment_length: float = 5.12, output_size: int = 2048):
    """
    Returns:
        torch.Tensor or numpy.ndarray: (N, output_size)
    """
Ningzhi-Wang commented 1 year ago

> MIDI encoding function interface after today's quick discussion with @Ningzhi-Wang and @nicolaus625:
>
>     def encode_segment_midi(midi: Any, segment_length: float = 5.12, output_size: int = 2048):
>         """
>         Returns:
>             torch.Tensor or numpy.ndarray: (N, output_size)
>         """

We need to change the signature to include both the MIDI filename and the wave filename: the total times in the MIDI and wave files actually differ, and we need to use the wave file's total time for alignment.


def encode_segment_midi(midi: str, wave: str, frame_rate: int = 50,
                        segment_length: int = 256, output_size: int = 2048):
    """
    Inject the calculation of the frame size and the number of frames
    per segment outside the function for more flexible control.
    Args:
        frame_rate: number of frames per second.
        segment_length: number of frames in each segment.
    Returns:
        torch.Tensor or numpy.ndarray: (N, output_size)
    """
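A quick sanity check of the frame arithmetic implied by these defaults (the helper names below are hypothetical, not part of the repository): at `frame_rate = 50`, a 256-frame segment spans 256 / 50 = 5.12 seconds, which matches the `segment_length = 5.12` in the earlier interface.

```python
import math

# Hypothetical helpers illustrating the frame math above; the names are
# illustrative and do not come from the repository.

def segment_seconds(frame_rate: int = 50, segment_length: int = 256) -> float:
    """Duration in seconds covered by one segment of `segment_length` frames."""
    return segment_length / frame_rate

def num_segments(duration_s: float, frame_rate: int = 50,
                 segment_length: int = 256) -> int:
    """Number of segments needed to cover `duration_s` seconds of audio."""
    total_frames = math.ceil(duration_s * frame_rate)
    return math.ceil(total_frames / segment_length)

print(segment_seconds())   # 5.12
print(num_segments(60.0))  # 3000 frames / 256 frames per segment -> 12
```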
yoyololicon commented 1 year ago

> We need to change the signature to include both the MIDI filename and the wave filename: the total times in the MIDI and wave files actually differ, and we need to use the wave file's total time for alignment.
>
>     def encode_segment_midi(midi: str, wave: str, frame_rate: int = 50,
>                             segment_length: int = 256, output_size: int = 2048):
>         """
>         Inject the calculation of the frame size and the number of frames
>         per segment outside the function for more flexible control.
>         Args:
>             frame_rate: number of frames per second.
>             segment_length: number of frames in each segment.
>         Returns:
>             torch.Tensor or numpy.ndarray: (N, output_size)
>         """

I don't recommend doing this because it mixes too many operations into one function. In my opinion, reading wave files and segmenting them isn't necessary here and can happen somewhere else. A simple function that converts MIDI events into token segments is enough. This also lets us apply it to different datasets more efficiently.
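The "simple function" idea can be sketched as a pure event-to-token mapping. Everything here is an assumption for illustration: the `(onset_time, pitch)` event tuples, the time-shift-plus-note-on token scheme, and the vocabulary offsets are not the project's actual encoding.

```python
# Sketch of a pure MIDI-event tokenizer with no file I/O or wave handling.
# Assumes events are (onset_time_seconds, pitch) tuples already extracted
# from the MIDI file elsewhere; the token scheme is illustrative only.

def events_to_tokens(events, segment_start, segment_end, time_bins=100):
    """Tokenize events whose onsets fall inside [segment_start, segment_end)."""
    tokens = []
    for onset, pitch in sorted(events):
        if not (segment_start <= onset < segment_end):
            continue
        # Quantize the onset relative to the segment start into time bins.
        shift = int((onset - segment_start) * time_bins
                    / (segment_end - segment_start))
        tokens.append(128 + shift)  # time-shift token, offset past 128 pitches
        tokens.append(pitch)        # note-on token
    return tokens

events = [(0.10, 60), (2.50, 64), (6.00, 67)]  # last event is outside the segment
print(events_to_tokens(events, 0.0, 5.12))     # [129, 60, 176, 64]
```

Because the function never touches the audio, the same tokenizer can be reused across datasets and the wave reading/segmenting can live in the dataset class.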

Ningzhi-Wang commented 1 year ago

Finished the first implementation of the encoding function. Here is its signature:

def tokenize(filename, frame_rate, segment_length, output_size, step_rate=100):
    """ Convert MIDI notes into integer tokens.
    Notice that the total time in the MIDI and wave files may differ.
    Wave samples of times beyond MIDI total time should be discarded.
    Args:
        filename: path to MIDI file.
        frame_rate: number of frames per second.
        segment_length: number of frames in each segment.
        output_size: number of tokens per segment.
        step_rate: number of MIDI frames per second.
    Returns:
        Tokenized and segmented tensor and start and end time of each segment.
        (torch.Tensor : (N, output_size), torch.Tensor: (N, 2))
    """

Please refer to the tokenize function in preprocessor/preprocessor.py for details. This function is still relatively slow, so it needs further optimization.
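The `(N, 2)` start/end times in the return value can be derived from the frame parameters alone. This is a hedged sketch of that bookkeeping, not the repository's implementation: segment `i` covers frames `[i * segment_length, (i + 1) * segment_length)`, converted to seconds via `frame_rate`, with `total_frames` coming from the wave file's duration per the alignment discussion above.

```python
import math

# Hypothetical helper computing per-segment (start, end) times in seconds.
# Segment i spans frames [i*segment_length, (i+1)*segment_length), and the
# final segment is clipped to the total frame count.

def segment_times(total_frames, frame_rate=50, segment_length=256):
    n = math.ceil(total_frames / segment_length)
    return [(i * segment_length / frame_rate,
             min((i + 1) * segment_length, total_frames) / frame_rate)
            for i in range(n)]

# 600 frames at 50 fps -> three segments, the last one shorter than 5.12 s.
print(segment_times(600))  # [(0.0, 5.12), (5.12, 10.24), (10.24, 12.0)]
```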

yoyololicon commented 1 year ago

@Ningzhi-Wang could you open a PR and assign me as a reviewer? I'll look into the code later. Don't forget to merge the latest commits from master.