qiuqiao / SOFA

SOFA: Singing-Oriented Forced Aligner
MIT License

SOFA inference #29

Open Arseny5 opened 1 month ago

Arseny5 commented 1 month ago


Dear developers, thank you so much for your aligner SOFA. Do you have any inference scripts for SOFA?

qiuqiao commented 1 month ago

Thank you for reaching out and showing interest in SOFA!

The infer.py script is used for inference tasks, and its usage is documented in the README.md file.

Could you please provide more details on what you're trying to achieve or what information you're missing? This will help me better address your needs.

Arseny5 commented 1 month ago

I don't understand what input data is needed to run inference correctly. I saw that we use the --folder segments_path flag, where segments_path is the path to the data we want to run SOFA on. According to the repository's README, the data should consist of wav and lab files.

However, my problem is that I only have wav files. I don't have any lab files, and I don't quite understand how to get them. I also don't understand exactly what a lab file should look like. As far as I know, the pipeline is: get the text transcription from the wav, use g2p to convert it to phonemes, write those phonemes to the lab file, and then run SOFA.

Could you please share at least a few samples of what the wav + lab files should ideally look like, or share scripts for generating these lab files from wav files so that I can then run SOFA inference on them?

qiuqiao commented 1 month ago

It appears that the description in the README.md is not clear enough, which led to a misunderstanding of the inference process. In fact, the g2p conversion is part of the SOFA inference process, so each lab file should contain the text transcription of its corresponding wav file, not phonemes.

I will update the README.md and add a few examples of wav + lab files later.
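
Until those examples are available, here is a minimal sketch of how such lab files could be prepared from transcriptions you already have. It is not part of SOFA: the helper name `write_lab_files`, the `transcripts` mapping, and the assumption that each lab file shares the base name of its wav file and holds only the raw text are all illustrative, so please check them against the README.

```python
from pathlib import Path


def write_lab_files(folder, transcripts, encoding="utf-8"):
    """Write one .lab file per wav file, containing its plain-text transcription.

    `transcripts` maps a wav base name (without extension) to its text.
    Returns the names of wav files for which no transcription was supplied.
    """
    folder = Path(folder)
    missing = []
    for wav in sorted(folder.glob("*.wav")):
        text = transcripts.get(wav.stem)
        if text is None:
            # No transcription available; report it instead of guessing.
            missing.append(wav.name)
            continue
        # Assumption: the lab file sits next to the wav, shares its base name,
        # and contains only the transcription text (no phonemes).
        wav.with_suffix(".lab").write_text(text.strip(), encoding=encoding)
    return missing


if __name__ == "__main__":
    # Hypothetical folder and transcriptions, for illustration only.
    leftover = write_lab_files("segments", {"song_001": "hello world"})
    print("wav files still missing a transcription:", leftover)
```

The resulting folder is what would then be passed through the --folder flag mentioned above.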

Arseny5 commented 1 month ago

Oh, that's very useful for me right now. Thank you very much!

Arseny5 commented 1 month ago

Also, do you perhaps have a script that generates an initial lab file containing the transcription of a wav file?