spotify / basic-pitch-ts

A lightweight yet powerful audio-to-MIDI converter with pitch bend detection.
https://basicpitch.io
Apache License 2.0
212 stars · 15 forks

Any documentation available? #10

Open aexposit opened 1 year ago

aexposit commented 1 year ago

Hi, is there any documentation to help start contributing to this interesting project? My goal is to adapt it to real-time audio-to-MIDI. I would like to understand the application flow better, with a description of the functions and their input and output parameters, without having to interpret all the code. For example:

1) How are OUTPUT_TO_TENSOR_NAME.frames, OUTPUT_TO_TENSOR_NAME.onsets, and OUTPUT_TO_TENSOR_NAME.contours related to note-on, note-off, and pitch-bend messages?
2) What is melodiaTrick?
3) What is energy?
4) What does the outputToNotesPoly function do?

Also, do you have any performance numbers for the model.execute call, in ms (single batch)? Thanks a lot.

sherwyn33 commented 1 month ago

It's sort of documented in the code, but here is my understanding for questions 2, 3, and 4:

frames: A frame activation matrix describes segments of audio analyzed for frequency content over time. Each frame in the matrix represents a specific time slice and its frequency data. This is crucial for tracking how sounds change over time within an audio file.

onsets: An onset activation matrix identifies the specific points in time when new notes begin. Each value in the matrix indicates the likelihood of a note starting at that time and frequency. Detecting onsets accurately is vital for correctly identifying the start of notes.

onsetThresh: This threshold sets the minimum amplitude of an onset activation that must be reached to consider it an actual onset of a note. This helps in filtering out false positives and ensuring that only significant note beginnings are recognized.
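To make the onset matrix and onsetThresh concrete, here is a minimal sketch with a toy matrix and made-up numbers. The types and function names are hypothetical, not the library's actual API:

```typescript
// Hypothetical sketch: find candidate note starts in an onset
// activation matrix. Rows are time frames, columns are pitch bins,
// and values are likelihoods in [0, 1]; onsetThresh filters out
// weak activations so only significant note beginnings survive.
type OnsetCandidate = { frame: number; pitchBin: number; activation: number };

function pickOnsets(onsets: number[][], onsetThresh: number): OnsetCandidate[] {
  const result: OnsetCandidate[] = [];
  onsets.forEach((row, frame) =>
    row.forEach((activation, pitchBin) => {
      if (activation >= onsetThresh) {
        result.push({ frame, pitchBin, activation });
      }
    })
  );
  return result;
}

// Toy matrix: 3 time frames x 2 pitch bins
const demoOnsets = [
  [0.1, 0.9], // strong onset at bin 1, frame 0
  [0.2, 0.3], // nothing above threshold
  [0.7, 0.05], // strong onset at bin 0, frame 2
];
const picked = pickOnsets(demoOnsets, 0.5);
// picked contains the two activations >= 0.5
```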

frameThresh: This threshold is used to determine whether a note should continue. If the amplitude of a frame activation drops below this level, it indicates that the note has ended or is too soft to be considered as continuing.

minNoteLen: This defines the minimum length a note must have to be recognized. This is measured in frames, not time directly, helping to prevent the recognition of very short, possibly erroneous notes.
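How frameThresh and minNoteLen interact can be sketched for a single pitch bin: a note stays alive while the frame activation is at or above frameThresh, and is only kept if it spans at least minNoteLen frames. This is an illustrative toy, not the library's implementation:

```typescript
// Hypothetical sketch: segment one pitch bin's frame activations
// into notes. A note is open while activation >= frameThresh and
// discarded if shorter than minNoteLen frames.
type Note = { start: number; end: number }; // frame indices, end exclusive

function trackNotes(frameActs: number[], frameThresh: number, minNoteLen: number): Note[] {
  const notes: Note[] = [];
  let start = -1;
  for (let t = 0; t <= frameActs.length; t++) {
    const active = t < frameActs.length && frameActs[t] >= frameThresh;
    if (active && start < 0) start = t; // note begins
    if (!active && start >= 0) {
      // note ends; keep it only if long enough
      if (t - start >= minNoteLen) notes.push({ start, end: t });
      start = -1;
    }
  }
  return notes;
}

// One long note (frames 1..4) and one too-short blip (frame 6)
const acts = [0.1, 0.8, 0.9, 0.7, 0.6, 0.1, 0.9, 0.1];
const notes = trackNotes(acts, 0.5, 3);
// only the 4-frame note survives minNoteLen = 3
```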

inferOnsets: When this setting is true, the algorithm will automatically add onsets if there are large differences in frame amplitudes, suggesting a significant change in the audio that likely corresponds to a new note starting.
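A rough sketch of that onset-inference idea: a large jump in frame activation between consecutive frames is treated as an implied onset, even if the onset matrix missed it. The jump threshold here is invented for illustration and the real heuristic may differ:

```typescript
// Hypothetical sketch: infer onsets from sudden increases in frame
// activation. Any frame whose activation exceeds the previous
// frame's by at least jumpThresh is flagged as an inferred onset.
function inferOnsetFrames(frameActs: number[], jumpThresh: number): number[] {
  const inferred: number[] = [];
  for (let t = 1; t < frameActs.length; t++) {
    if (frameActs[t] - frameActs[t - 1] >= jumpThresh) inferred.push(t);
  }
  return inferred;
}

const ramp = [0.1, 0.15, 0.8, 0.85, 0.2, 0.9];
const inferredOnsets = inferOnsetFrames(ramp, 0.5);
// jumps at frames 2 and 5 are flagged
```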

maxFreq and minFreq: These settings define the frequency range within which notes can be recognized. Frequencies outside this range will be ignored, which can be useful for filtering out noise or other unwanted audio components.
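The frequency-range restriction is easy to sketch: drop any candidate pitch outside [minFreq, maxFreq]. The Hz-to-MIDI conversion below is the standard equal-temperament formula; the filtering function itself is hypothetical:

```typescript
// Standard conversion: A4 = 440 Hz = MIDI note 69.
function hzToMidi(hz: number): number {
  return 69 + 12 * Math.log2(hz / 440);
}

// Hypothetical sketch: keep only candidate pitches inside the
// [minFreq, maxFreq] band, discarding noise outside it.
function filterByFreq(pitchesHz: number[], minFreq: number, maxFreq: number): number[] {
  return pitchesHz.filter((hz) => hz >= minFreq && hz <= maxFreq);
}

// Keep only pitches between ~E2 (82 Hz) and ~E6 (1319 Hz)
const kept = filterByFreq([50, 110, 440, 2000], 82, 1319);
// 50 Hz and 2000 Hz are rejected
```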

melodiaTrick: This is a post-processing enhancement in which semitone bins near a peak in the frequency data are suppressed, presumably to clean up the data and avoid misinterpreting energy adjacent to an actual note peak as separate pitches.

energyTolerance: This parameter allows a certain number of frames to drop below the threshold (potentially zero amplitude) without terminating the note. This can help in maintaining the continuity of notes through brief drops in sound level.
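The effect of energyTolerance can be shown with a toy tracker for one pitch bin: a note survives up to energyTol consecutive frames below frameThresh before it is terminated, bridging brief dips in level. Again, a hypothetical sketch, not the library's code:

```typescript
// Hypothetical sketch: like simple note tracking, but a note is
// only closed after more than energyTol consecutive frames fall
// below frameThresh, so short dips do not split a note in two.
function trackWithTolerance(
  acts: number[],
  frameThresh: number,
  energyTol: number
): { start: number; end: number }[] {
  const notes: { start: number; end: number }[] = [];
  let start = -1; // frame where the current note began, -1 if none
  let gap = 0; // consecutive sub-threshold frames inside a note
  for (let t = 0; t < acts.length; t++) {
    if (acts[t] >= frameThresh) {
      if (start < 0) start = t;
      gap = 0;
    } else if (start >= 0) {
      gap++;
      if (gap > energyTol) {
        notes.push({ start, end: t - gap + 1 }); // end before the dip
        start = -1;
        gap = 0;
      }
    }
  }
  if (start >= 0) notes.push({ start, end: acts.length - gap });
  return notes;
}

// A one-frame dip at t=2 is bridged with energyTol = 1,
// so frames 0..4 come out as a single note.
const bridged = trackWithTolerance([0.8, 0.7, 0.1, 0.9, 0.8], 0.5, 1);
```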