titinko / utsu

Vocal synthesis frontend

Idea - NNSVS Support #69

Open SeleDreams opened 3 years ago

SeleDreams commented 3 years ago

This is an idea I'm thinking about working on, but I'm not sure how much interest there is. I'd like to integrate a second engine into UTSU: NNSVS, an open source engine for AI-based vocal synthesis (https://github.com/r9y9/nnsvs). It doesn't support the same type of voicebanks, since its voicebanks have to be AI-trained, but I feel that having one editor for multiple types of voicebanks would be great, especially for higher-quality AI-trained voicebanks.

NNSVS currently doesn't have a compatible editor, so I felt this would also help bring more people to both UTSU and NNSVS, which helps the open source community.

SeleDreams commented 3 years ago

It would allow UTSU to become a kind of "AI UTAU" for the community, where anyone could create AI vocals and make music with them.

But before working on it, I wanted to know whether the creator of UTSU and the other contributors want to see voicebank types other than UTAU supported, since it's possible you only want UTAU voicebanks.

LucasCTN commented 3 years ago

I think this is a great idea! Would it generate a voicebank for UTSU's resampler, or does it also work as a whole resampler?

adlez27 commented 3 years ago

This might be helpful: https://note.com/crazy_utau/n/n45db22b33d2c

SeleDreams commented 3 years ago

> This might be helpful: https://note.com/crazy_utau/n/n45db22b33d2c

ENUNU is separate because it relies on the UTAU plugin API. From what I remember, it doesn't actually use the same system as UTAU at all and relies only on the data it can access via plugins. That's why its git repository lists the things it couldn't do due to the limitations of the plugin API.

> I think this is a great idea! Would it generate a voicebank for UTSU's resampler, or does it also work as a whole resampler?

It works as its own thing, because AI synthesis works differently.

LucasCTN commented 3 years ago

Reading about the software, it looks like the task would then be to make UTSU able to export to MusicXML so NNSVS can interpret it, right? And maybe bundle it in if the license permits. This could be the open source version of Synthesizer V AI, which makes me very excited!

SeleDreams commented 3 years ago

> Reading about the software, it looks like the task would then be to make UTSU able to export to MusicXML so NNSVS can interpret it, right? And maybe bundle it in if the license permits. This could be the open source version of Synthesizer V AI, which makes me very excited!

It technically could, but I think some small edits would be enough for it to get the data directly from UST files. All it needs is the data; which files the data comes from isn't as important, as long as there are deserializers.

And yes, this is one of the goals.
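To illustrate the deserializer idea above, here is a minimal sketch of reading note data from a UST file in Python. It assumes the usual INI-like UST layout (`[#SETTING]` with `Tempo=`, numbered `[#0000]` note sections with `Length=`, `Lyric=` and `NoteNum=`, and 480 ticks per quarter note). The `Note` class and `parse_ust` function are hypothetical names for this example, not part of UTSU or NNSVS.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Note:
    length_ticks: int  # duration in UST ticks (480 per quarter note)
    lyric: str         # "R" conventionally marks a rest
    midi_pitch: int    # MIDI note number, e.g. 60 for C4


# Note sections look like [#0000], [#0001], ...
NOTE_HEADER = re.compile(r"^\[#\d+\]$")


def parse_ust(text: str) -> Tuple[float, List[Note]]:
    """Return (tempo, notes) from the text of a UST file (sketch only)."""
    tempo = 120.0
    notes: List[Note] = []
    section: Optional[str] = None
    fields: Dict[str, str] = {}

    def flush() -> None:
        # Store the section collected so far if it was a numbered note.
        if section is not None and NOTE_HEADER.match(section):
            notes.append(
                Note(
                    length_ticks=int(fields["Length"]),
                    lyric=fields.get("Lyric", "R"),
                    midi_pitch=int(fields.get("NoteNum", "60")),
                )
            )

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[#"):
            flush()
            section, fields = line, {}
        elif "=" in line:
            key, value = line.split("=", 1)
            if section == "[#SETTING]" and key == "Tempo":
                tempo = float(value)
            fields[key] = value
    flush()
    return tempo, notes
```

Real UST files carry many more fields (envelopes, pitch bends, flags) and may use a non-UTF-8 encoding, so this only covers the data a score-level exporter would need first.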

SeleDreams commented 3 years ago

License-wise, NNSVS is MIT, so there are no problems with bundling it; from what I can see, UTSU is under an MIT-compatible license.

titinko commented 3 years ago

Times like this remind me how much UTSU needs its own plugin framework.

If I understand correctly, integrating NNSVS with UTSU would have three parts to it.

Rendering songs in NNSVS: The easiest way is to write code converting the internal Song object into an NNSVS-readable file, then run NNSVS on that in the background.

Using NNSVS voicebanks: I could see UTSU's song editor being tweaked so that it pretends that NNSVS voicebanks are regular UTAU voicebanks on the frontend, but in the backend only renders them with NNSVS.

Creating NNSVS voicebanks: Since the format is completely different from UTAU's voicebanks, you'd have to write an entirely new voicebank editor UI.
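The second part (the frontend treating every voicebank the same while the backend picks the engine) could be sketched roughly as below. This is only an illustration under assumptions: `Voicebank`, its `kind` tag, and both render functions are hypothetical placeholders that return a description instead of invoking UTSU's resampler or NNSVS. Python is used just to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Voicebank:
    name: str
    kind: str  # hypothetical tag: "utau" or "nnsvs"


def render_with_utau(voicebank: Voicebank, notes: List[str]) -> str:
    # Placeholder: a real version would call the resampler/wavtool pipeline.
    return f"utau render of {len(notes)} notes with {voicebank.name}"


def render_with_nnsvs(voicebank: Voicebank, notes: List[str]) -> str:
    # Placeholder: a real version would export the song and run NNSVS on it.
    return f"nnsvs render of {len(notes)} notes with {voicebank.name}"


# The editor only ever calls render(); the engine is chosen from the table.
RENDERERS: Dict[str, Callable[[Voicebank, List[str]], str]] = {
    "utau": render_with_utau,
    "nnsvs": render_with_nnsvs,
}


def render(voicebank: Voicebank, notes: List[str]) -> str:
    try:
        renderer = RENDERERS[voicebank.kind]
    except KeyError:
        raise ValueError(f"unsupported voicebank type: {voicebank.kind}")
    return renderer(voicebank, notes)
```

With a table like this, adding another engine later would mean registering one more entry rather than touching the editor code, which is close to what a plugin framework would provide.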

SeleDreams commented 3 years ago

> Times like this remind me how much UTSU needs its own plugin framework.
>
> If I understand correctly, integrating NNSVS with UTSU would have three parts to it.
>
> Rendering songs in NNSVS: The easiest way is to write code converting the internal Song object into an NNSVS-readable file, then run NNSVS on that in the background.
>
> Using NNSVS voicebanks: I could see UTSU's song editor being tweaked so that it pretends that NNSVS voicebanks are regular UTAU voicebanks on the frontend, but in the backend only renders them with NNSVS.
>
> Creating NNSVS voicebanks: Since the format is completely different from UTAU's voicebanks, you'd have to write an entirely new voicebank editor UI.

Yes, I imagine the voicebank creation side would come last since it's not the priority: usage would come first and voicebank creation second, because creating AI voicebanks is much more complex to begin with due to the AI training phase.

LucasCTN commented 3 years ago

I want to try making a serializer from the Song object to NNSVS. It looks like the score needs to be converted to HTS full-context label files (using Sinsy), and NNSVS includes a MusicXML-to-label-file step for this.

If I'm correct, does anyone know of a document specifying the data in HTS label files?

(From the little research I did, it looks easier to use MusicXML as a middleman and let pysinsy do the final conversion to HTS labels.)
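As a rough illustration of the MusicXML middleman approach, the sketch below writes a list of notes to a bare-bones MusicXML string using only the Python standard library. The function name `song_to_musicxml` and the `(length_ticks, lyric, midi_pitch)` tuple format are assumptions made for this example and do not reflect UTSU's actual Song object. For brevity every note goes into a single measure; a real exporter would split notes across measures and handle ties, and the conversion of the resulting file to HTS labels with pysinsy is not shown.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# (step, alter) for each pitch class, spelling black keys as sharps.
PITCH_CLASSES = [
    ("C", 0), ("C", 1), ("D", 0), ("D", 1), ("E", 0), ("F", 0),
    ("F", 1), ("G", 0), ("G", 1), ("A", 0), ("A", 1), ("B", 0),
]

# (length_ticks, lyric, midi_pitch); midi_pitch is None for a rest.
NoteTuple = Tuple[int, str, Optional[int]]


def song_to_musicxml(notes: List[NoteTuple], tempo: float = 120.0,
                     divisions: int = 480) -> str:
    """Serialize notes to a minimal MusicXML string (sketch only)."""
    score = ET.Element("score-partwise", version="3.1")
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Vocal"

    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")
    attributes = ET.SubElement(measure, "attributes")
    # divisions = ticks per quarter note, so tick lengths map 1:1 to durations.
    ET.SubElement(attributes, "divisions").text = str(divisions)
    ET.SubElement(measure, "sound", tempo=str(tempo))

    for length, lyric, midi in notes:
        note = ET.SubElement(measure, "note")
        if midi is None:
            ET.SubElement(note, "rest")
        else:
            step, alter = PITCH_CLASSES[midi % 12]
            pitch = ET.SubElement(note, "pitch")
            ET.SubElement(pitch, "step").text = step
            if alter:
                ET.SubElement(pitch, "alter").text = str(alter)
            # MIDI note 60 is C4, so the octave is midi // 12 - 1.
            ET.SubElement(pitch, "octave").text = str(midi // 12 - 1)
        ET.SubElement(note, "duration").text = str(length)
        if midi is not None:
            lyric_el = ET.SubElement(note, "lyric")
            ET.SubElement(lyric_el, "text").text = lyric

    return ET.tostring(score, encoding="unicode")
```

Setting `divisions` to 480 matches the tick resolution used by UST files, so lengths could be copied over without rescaling.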