Closed francqz31 closed 11 months ago
Yes, SOME is language-independent, but you still need .ds and .wav files to train.
@yqzhishen So SOME doesn't produce the MIDI duration sequence, just the MIDI sequence?
At inference time, SOME produces 3 outputs:
- note_seq, the MIDI pitch sequence
- note_rest, which indicates whether each note is a rest
- note_dur, the MIDI duration sequence in seconds
OK, fair enough. Thanks again. I will reopen the issue if the training scripts get released and I run into a problem with them. I will be waiting, since I want to train in English.
Hello again, Mr. yqzhishen. I ran python infer.py --model CKPT_PATH --wav WAV_PATH and got this output:
"accumulate_grad_batches: 1, audio_sample_rate: 44100, binarization_args: {'num_workers': 0, 'shuffle': True}, binarizer_cls: preprocessing.MIDIExtractionBinarizer, binary_data_dir: data/some_ds_fixmel_spk3_aug8/binary, clip_grad_norm: 1, dataloader_prefetch_factor: 2, ddp_backend: nccl, ds_workers: 4, finetune_ckpt_path: None, finetune_enabled: False, finetune_ignored_params: [], finetune_strict_shapes: True, fmax: 8000, fmin: 40, freezing_enabled: False, frozen_params: [], hop_size: 512, log_interval: 100, lr_scheduler_args: {'min_lr': 1e-05, 'scheduler_cls': 'lr_scheduler.scheduler.WarmupLR', 'warmup_steps': 5000}, max_batch_frames: 80000, max_batch_size: 8, max_updates: 10000000, max_val_batch_frames: 10000, max_val_batch_size: 1, midi_extractor_args: {'attention_drop': 0.1, 'attention_heads': 8, 'attention_heads_dim': 64, 'conv_drop': 0.1, 'dim': 512, 'ffn_latent_drop': 0.1, 'ffn_out_drop': 0.1, 'kernel_size': 31, 'lay': 8, 'use_lay_skip': True}, midi_max: 128, midi_min: 0, midi_num_bins: 256, midi_prob_deviation: 0.5, midi_shift_proportion: 0.0, midi_shift_range: [-6, 6], model_cls: modules.model.Gmidi_conform.midi_conforms, num_ckpt_keep: 5, num_sanity_val_steps: 1, num_valid_plots: 300, optimizer_args: {'beta1': 0.9, 'beta2': 0.98, 'lr': 0.0001, 'optimizer_cls': 'torch.optim.AdamW', 'weight_decay': 0}, pe: rmvpe, pe_ckpt: pretrained/rmvpe/model.pt, permanent_ckpt_interval: 40000, permanent_ckpt_start: 200000, pl_trainer_accelerator: auto, pl_trainer_devices: auto, pl_trainer_num_nodes: 1, pl_trainer_precision: 32-true, pl_trainer_strategy: auto, raw_data_dir: [], rest_threshold: 0.1, sampler_frame_count_grid: 6, seed: 114514, sort_by_len: True, task_cls: training.MIDIExtractionTask, test_prefixes: None, train_set_name: train, units_dim: 80, units_encoder: mel, units_encoder_ckpt: pretrained/contentvec/checkpoint_best_legacy_500.pt, use_buond_loss: True, use_midi_loss: True, val_check_interval: 4000, valid_set_name: valid, win_size: 2048 | load 'model' from '/content/SOME/model_steps_64000_simplified.ckpt'. 100% 1/1 [00:01<00:00, 1.84s/it] MIDI file saved at: '/content/SOME/202.mid'"
It converted the singing WAV file into MIDI. Now, how can I get the MIDI sequence and the MIDI duration sequence from this MIDI file? What should I do?
Thanks in advance!
You can use any editor or package that supports importing or parsing the MIDI file format. But if you are able to read the code, you can also get the raw outputs in infer.py before the MIDI file is saved.
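For the first option, a MIDI library such as mido or pretty_midi is usually the simplest route. As a dependency-free illustration, here is a minimal sketch that reads note pitches and durations straight from the bytes of a standard MIDI file. It assumes a single tempo for the whole file, a ticks-per-quarter-note time division (not SMPTE), and non-overlapping notes of the same pitch; the function names `read_vlq` and `midi_notes` are made up for this example and are not part of SOME.

```python
import struct


def read_vlq(data, pos):
    """Read a MIDI variable-length quantity; return (value, next position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte < 0x80:
            return value, pos


def midi_notes(data):
    """Return sorted (start_sec, dur_sec, pitch) tuples from MIDI file bytes.

    Assumption: one tempo for the whole file (the last set_tempo seen wins).
    """
    if data[:4] != b"MThd":
        raise ValueError("not a standard MIDI file")
    header_len, _fmt, _ntrks, division = struct.unpack(">IHHH", data[4:14])
    pos = 8 + header_len
    tempo = 500000  # MIDI default: microseconds per quarter note (120 BPM)
    raw = []        # (start_tick, end_tick, pitch)
    while pos < len(data):
        chunk_id = data[pos:pos + 4]
        (length,) = struct.unpack(">I", data[pos + 4:pos + 8])
        pos += 8
        end = pos + length
        if chunk_id == b"MTrk":
            tick, status, active = 0, None, {}
            while pos < end:
                delta, pos = read_vlq(data, pos)
                tick += delta
                if data[pos] >= 0x80:      # new status byte, else running status
                    status = data[pos]
                    pos += 1
                if status == 0xFF:         # meta event: type, length, payload
                    meta_type = data[pos]
                    size, pos = read_vlq(data, pos + 1)
                    if meta_type == 0x51:  # set_tempo
                        tempo = int.from_bytes(data[pos:pos + size], "big")
                    pos += size
                elif status in (0xF0, 0xF7):  # sysex: skip payload
                    size, pos = read_vlq(data, pos)
                    pos += size
                else:                      # channel message
                    kind = status & 0xF0
                    n_data = 1 if kind in (0xC0, 0xD0) else 2
                    args = data[pos:pos + n_data]
                    pos += n_data
                    if kind == 0x90 and args[1] > 0:       # note on
                        active[args[0]] = tick
                    elif kind in (0x80, 0x90) and args[0] in active:  # note off
                        raw.append((active.pop(args[0]), tick, args[0]))
        pos = end
    sec_per_tick = tempo / 1e6 / division
    return sorted((s * sec_per_tick, (e - s) * sec_per_tick, p) for s, e, p in raw)
```

Calling `midi_notes(open(path, "rb").read())` on the saved .mid file would then give one tuple per note. Gaps between the end of one note and the start of the next correspond to rests.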
Well, the thing is that unfortunately I have no idea how to do either of these two things. How can I get the raw outputs before the MIDI file is saved?
What are you using the MIDI file for?
Here, midis is the raw output:
https://github.com/openvpi/SOME/blob/e0ca1ed9b2e71bfeb2176bc51f9fa0469f3ea0de/infer.py#L37
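Once the per-note values are pulled out of the raw outputs, turning them into text sequences is straightforward. The sketch below takes the three outputs named earlier in this thread (note_seq, note_rest, note_dur) as plain sequences of equal length. How they are packed inside midis is not shown in this thread, so the unpacking step is left out, and the helper names `midi_to_name` and `to_sequences` are hypothetical. The note naming follows the common MIDI 60 = C4 convention, and the six-decimal duration format is an assumption.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_name(pitch):
    """Round a (possibly fractional) MIDI pitch and name it, with MIDI 60 = C4."""
    number = int(round(pitch))
    return f"{NOTE_NAMES[number % 12]}{number // 12 - 1}"


def to_sequences(note_seq, note_rest, note_dur):
    """Return (note string, duration string), one space-separated entry per note."""
    if not (len(note_seq) == len(note_rest) == len(note_dur)):
        raise ValueError("all three outputs must have one entry per note")
    names = [
        "rest" if is_rest else midi_to_name(pitch)
        for pitch, is_rest in zip(note_seq, note_rest)
    ]
    durs = [f"{float(d):.6f}" for d in note_dur]
    return " ".join(names), " ".join(durs)
```

For example, `to_sequences([60.2, 0.0, 69.0], [False, True, False], [0.5, 0.25, 1.0])` returns `("C4 rest A4", "0.500000 0.250000 1.000000")`.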
I'm using the MIDI file to build a dataset like opencpop, but in English. I already have a way to get the phoneme sequence and duration, and now I'm looking to get the MIDI sequence and duration from SOME.
Maybe this script can help, but it is not well-documented. You need to put it in your SOME directory, edit the parameters and options in the file, and run it.
Oh, thanks so much. I edited all the parameters (input_csv, out_csv, wav_folder, model_path) and got this: csv_datas:1 success: 0. My result.csv looks just like my transcriptions.csv (I put the English transcription of my wav file in transcriptions.csv).
@yqzhishen Hello, it's me again. I just noticed this project. As we discussed before, can it be used to get the "MIDI sequence | MIDI duration sequence", especially for English?
Like the ones in opencpop, as you said before: "filename | lyrics | phoneme sequence | MIDI sequence | MIDI duration sequence | phoneme duration sequence | is slur sequence"
I'm not really looking for something to get me the phoneme sequence, the phoneme duration sequence, or the is slur sequence right now, just an accurate MIDI sequence and MIDI duration sequence. That's what I need! So can SOME do that?
Thanks in advance!
https://github.com/openvpi/DiffSinger/issues/29
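To make the record layout quoted above concrete, here is a small sketch that joins the seven fields into one line. The field order comes from the quoted format. The `|` separator without surrounding spaces, the six-decimal durations, the 0/1 encoding of the slur flags, and the function name `build_record` are all assumptions for illustration.

```python
def build_record(name, lyrics, phonemes, notes, note_durs, ph_durs, slurs, sep="|"):
    """Join one utterance into a single line:
    filename | lyrics | phonemes | notes | note durations | phoneme durations | slurs
    """
    fields = [
        name,
        lyrics,
        " ".join(phonemes),
        " ".join(notes),
        " ".join(f"{d:.6f}" for d in note_durs),
        " ".join(f"{d:.6f}" for d in ph_durs),
        " ".join(str(int(s)) for s in slurs),
    ]
    return sep.join(fields)


line = build_record(
    "0001", "hello",
    ["hh", "ah", "l", "ow"],
    ["C4", "C4", "D4", "D4"],
    [0.3, 0.3, 0.5, 0.5],
    [0.1, 0.2, 0.15, 0.35],
    [0, 0, 0, 0],
)
print(line)
```

The example values are invented. In a real dataset the note and note-duration fields would come from SOME's outputs, and the phoneme fields from a separate alignment step.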