neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality
Apache License 2.0
12.84k stars 1.77k forks

Finetuning for better results? #182

Closed Maki9009 closed 1 year ago

Maki9009 commented 1 year ago

So I assume the Colab isn't actually finetuning. This makes sense, because for some voices I can get close enough that it sounds good. But take my voice, for example: I have no idea why, but it keeps making me sound like a posh British man.

Any way to improve the cloning of some voices?

stargan2vc commented 1 year ago

The scripts won't be released sadly

indieshack commented 1 year ago

+1 for this. I have an actor I'm good friends with, now in his 90s, who recorded narration for a documentary project of mine some years ago. The project content has changed somewhat and I'd like to clone his voice as it was when he recorded the original narration (with his permission of course; he's still very mentally alert and finds the tortoise project fascinating). I get the concern about allowing high-accuracy cloning, but I think the need outweighs the downsides. I'm quite sure technology like this is already being used by state agencies (probably including our own).

neonbjb commented 1 year ago

I partnered with a company to make this fine-tuning technology available for exactly what you are describing, @indieshack: https://play.ht/pricing/

It's not currently cheap to clone voices, but that's because it is quite costly to rent the amount of compute required to perform the initial fine-tuning process. I know play.ht is really interested in supporting use cases like yours, so you might consider contacting them directly and maybe they will set up some sort of one-time fee for you.

deviandice commented 1 year ago

I'm confused. You've stated many times you can't release the training & fine tuning code because it's unethical, but it was an ethical choice to release the training code to a specific company who makes money from it and gives you a cut?

Randy-H0 commented 1 year ago

That's what happens with greed. I was hoping for this to be open sourced and stuff. This just doesn't make sense. Creator is worried about unethical purposes yet they sell it all to a company!? Like come on, "supporting use cases like yours" or more interested in the money you provide? I feel like it's the latter one.

You can't just come in here, then tell us you won't be sharing the code, and then sell it to a company and advertise said company here

Randy-H0 commented 1 year ago

If you really felt like you wanted to support use cases like indieshack provided, then support them instead of slapping a paywall behind everything

indieshack commented 1 year ago

I partnered with a company to make this fine-tuning technology available for exactly what you are describing, @indieshack: https://play.ht/pricing/

It's not currently cheap to clone voices, but that's because it is quite costly to rent the amount of compute required to perform the initial fine-tuning process. I know play.ht is really interested in supporting use cases like yours, so you might consider contacting them directly and maybe they will set up some sort of one-time fee for you.

Thanks - took a look, all I can see are pre-cooked voices, I want to actually clone my friend's voice. BTW, in respect to other comments my view is that it's perfectly OK to make a few $$ out of programming. I think it's a great project with lots of potential.

altryne commented 1 year ago

play.ht is actually a cool service, and I assume @neonbjb chose them as they have a strong "personal" verification (they require you to record a consent using your own voice before cloning)

I'd also love for someone like @Uberduck to have access to this, @neonbjb did you consider licensing this to other places?

Randy-H0 commented 1 year ago

play.ht is actually a cool service, and I assume @neonbjb chose them as they have a strong "personal" verification (they require you to record a consent using your own voice before cloning)

I'd also love for someone like @Uberduck to have access to this, @neonbjb did you consider licensing this to other places?

Uberduck uses open source stuff: 99.99% of its models are made by the community using open source techniques. Switching to tortoise would mean the training code would need to be open sourced too, since the community, which is responsible for 99.99% of models, uses Colab notebooks.

neonbjb commented 1 year ago

I've withheld posting here for a bit to try and temper my anger at the community.

but it was an ethical choice to release the training code to a specific company who makes money from it and gives you a cut?

Yes. When people in my culture want to do something that is ethically risky and requires more work than any one person can invest, they generally do so behind a corporation. It's a mechanism to allow people to work together for shared profits and to shield individuals from personal risk. Ack that not all cultures and value systems work this way.

The people who run play.ht genuinely care about their users and want to make this kind of TTS available to everyone. If you truly believe that it is my moral imperative to release fine-tuning to the world, then I think this is the correct way to do it. Most people who would actually use such a system do not have access to the GPUs or knowhow required to fine-tune these models. By partnering with a company who can offer this as a service, I am able to allow considerably more people access to this feature than if I simply open sourced it. The exception to the former statement would be if, after open sourcing it, someone else started a company that did the same thing play.ht is doing.

That's what happens with greed. I was hoping for this to be open sourced and stuff. This just doesn't make sense. Creator is worried about unethical purposes yet they sell it all to a company!? Like come on, "supporting use cases like yours" or more interested in the money you provide? I feel like it's the latter one.

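Give me a break, man. I didn't "sell it all to a company". I have made very little money from my arrangement with play.ht; basically just contract labor rates. I did get a small amount of stock, which is how I get to share in their success. Regardless, this isn't about money for me at all:

  • I built Tortoise because I'm a nerd who likes to program and I have a particular interest in machine learning.
  • I open sourced it (despite the obvious commercial potential) because I have no interest in running a business and because money isn't really a huge motivating factor for me.
  • I withdrew from active development on it because I am extremely concerned with how it could be abused, and I do not want my name to be associated with the abuse that will happen if I fully release and document fine-tuning.
  • I partnered with play.ht because the community (here and through personal communication) convinced me that there was a need for this and partnering with a company that could control the technology seemed like a good avenue for making this happen.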

Your response to this is totally out of line. I am a long time contributor to open source (and open content in general on the web), and will continue to contribute. What have you contributed? How would you feel if I commented on your projects saying "you didn't give enough away, please give more"? Or "your ethical views are wrong and you don't deserve to make decisions about your creations"?

Randy-H0 commented 1 year ago

I've withheld posting here for a bit to try and temper my anger at the community.

but it was an ethical choice to release the training code to a specific company who makes money from it and gives you a cut?

Yes. When people in my culture want to do something that is ethically risky and requires more work than any one person can invest, they generally do so behind a corporation. It's a mechanism to allow people to work together for shared profits and to shield individuals from personal risk. Ack that not all cultures and value systems work this way.

The people who run play.ht genuinely care about their users and want to make this kind of TTS available to everyone. If you truly believe that it is my moral imperative to release fine-tuning to the world, then I think this is the correct way to do it. Most people who would actually use such a system do not have access to the GPUs or knowhow required to fine-tune these models. By partnering with a company who can offer this as a service, I am able to allow considerably more people access to this feature than if I simply open sourced it. The exception to the former statement would be if, after open sourcing it, someone else started a company that did the same thing play.ht is doing.

That's what happens with greed. I was hoping for this to be open sourced and stuff. This just doesn't make sense. Creator is worried about unethical purposes yet they sell it all to a company!? Like come on, "supporting use cases like yours" or more interested in the money you provide? I feel like it's the latter one.

Give me a break, man. I didn't "sell it all to a company". I have made very little money from my arrangement with play.ht; basically just contract labor rates. I did get a small amount of stock, which is how I get to share in their success. Regardless, this isn't about money for me at all:

  • I built Tortoise because I'm a nerd who likes to program and I have a particular interest in machine learning.
  • I open sourced it (despite the obvious commercial potential) because I have no interest in running a business and because money isn't really a huge motivating factor for me.
  • I withdrew from active development on it because I am extremely concerned with how it could be abused, and I do not want my name to be associated with the abuse that will happen if I fully release and document fine-tuning.
  • I partnered with play.ht because the community (here and through personal communication) convinced me that there was a need for this and partnering with a company that could control the technology seemed like a good avenue for making this happen.

Your response to this is totally out of line. I am a long time contributor to open source (and open content in general on the web), and will continue to contribute. What have you contributed? How would you feel if I commented on your projects saying "you didn't give enough away, please give more"? Or "your ethical views are wrong and you don't deserve to make decisions about your creations"?

The point of open sourcing is that other people can use your creations without having to build them themselves. They get a gift from you and they're happy you gave it to them. The version of Tortoise that is open source is not bad but not usable either. Look at Nvidia, they made tacotron2 open source, Stability made Stable Diffusion open source, OpenAI made Jukebox open source; the world is improving in front of us. But OpenAI put GPT-3 behind a paywall, and they also put DALL-E behind a paywall. Did people like that, even though there's literally a free trial? No! They gave backlash and the community also then invented other things like stability. This will happen to tortoise, not to spite you. Tortoise could've been used by a lot more people, but when it's put behind a paywall of 45 USD per month, no one's really going to pay that.

I haven't contributed much to the open source community because I can't code, I don't have a good GPU to train stuff and I also don't have the time to do so. I've tried my best to make a model for uberduck called CRUST; it's a multispeaker model trained on 20 hours and 168 speakers. It works well enough for me and I open sourced it for anyone to find. People who use it can train on as little as 30 seconds of data with reasonable results if the voice is "generic". Everyone can use it and that makes me happy. It also sparked another model of this type that's also open sourced.

The point is, make people happy without a paywall; way more people can enjoy it then.

iamkhalidbashir commented 1 year ago

Subscribing just for the drama… ;)

iamkhalidbashir commented 1 year ago

@neonbjb regardless of your choice, we'd love for you to open source the model :) But it would be waaaaay better if you open-sourced the training code instead of the model 😁 Anyway, thanks.

iamkhalidbashir commented 1 year ago

Also, I think VALL-E from Microsoft uses the same concept that this repo uses? Let's hope they release the codebase; I heard it will be at the end of Jan.

Randy-H0 commented 1 year ago

Also, I think VALL-E from Microsoft uses the same concept that this repo uses? Let's hope they release the codebase; I heard it will be at the end of Jan.

Finally open source realistic voice cloning

deviandice commented 1 year ago

If you truly believe that it is my moral imperative to release fine-tuning to the world, then I think this is the correct way to do it.

You think it's a moral imperative not to release it; I just think it's not a good argument if money is involved. It's like listening to a John Deere spokesperson saying right to repair is bad because a few people might get hurt, while at the same time charging through the roof to do it for you.

Most people who would actually use such a system do not have access to the GPUs or knowhow required to fine-tune these models. By partnering with a company who can offer this as a service, I am able to allow considerably more people access to this feature than if I simply open sourced it.

I'd say that's fine if this was a reddit thread, but this is a GitHub repository. The people who are coming here, to your repo, more than likely have the hardware and are technically minded enough to want to run the code themselves. I saved up to get a 3090 so I could run this stuff and I'm sure plenty of others chose to do this too. People are genuinely interested in doing this themselves; you only need to look at other AI communities like Stable Diffusion & Coqui to see how wrong that is. It's disappointing.

altryne commented 1 year ago

I've withheld posting here for a bit to try and temper my anger at the community.

@neonbjb Please don't get discouraged due to a few trolls who may not even use any of the things you post eventually.

This is obviously a troll trying to elicit reaction:

Finally open source realistic voice cloning

Given this is in the context of a closed source model that has not released any code or papers yet.

They gave backlash and the community also then invented other things like stability.

And this is just.. plain wrong. The community didn't invent "things like stability"; StabilityAI is a for-profit company that released their model as open source, but just recently raised 101 million dollars in seed investment. That doesn't "just" happen as a "community".

Randy-H0 commented 1 year ago

I've withheld posting here for a bit to try and temper my anger at the community.

@neonbjb Please don't get discouraged due to a few trolls who may not even use any of the things you post eventually.

This is obviously a troll trying to elicit reaction:

Finally open source realistic voice cloning

Given this is in the context of a closed source model that has not released any code or papers yet.

They gave backlash and the community also then invented other things like stability.

And this is just.. plain wrong. The community didn't invent "things like stability"; StabilityAI is a for-profit company that released their model as open source, but just recently raised 101 million dollars in seed investment. That doesn't "just" happen as a "community".

Thank you for joining in on the fun. I mean you can be mad and downvote this comment but that won't change much :/

deviandice commented 1 year ago

Please don't get discouraged due to a few trolls who may not even use any of the things you post eventually. This is obviously a troll trying to elicit reaction.

Outside of name calling, you're correct. I came here looking for tools without paywalls that I could use and fine-tune to help test and create some highly tuneable game development & animation tools, equally without paywalls. I'll use whatever is released first: an alternative model with similar quality, or tortoise with training code. The only reaction I wanted was for that justification to be put to rest for the benefit of myself and everyone else.

Maruiel commented 1 year ago

"You built my house for free, but how come it's not painted?? You're so selfish, it's basically useless to me right now"

deviandice commented 1 year ago

@Maruiel The house is already painted, but it's a dull grey. I want to paint it myself, I have the paint, but I'm not allowed a brush. The brush owner gave those guys over there all the brushes but they want $100 to paint the house.

jnordberg commented 1 year ago

The [free] house is already painted, but it's a dull grey. I want to paint it myself, I have the paint, but I'm not allowed a brush. The brush owner gave those guys over there all the brushes but they want $100 to paint the house

Here's where some people get creative and some people complain

Maki9009 commented 1 year ago

good morning everyone.. hows life

iamkhalidbashir commented 1 year ago

Life good, but my GPUs are sitting idle :( and that makes me sad....

Bigjuergo commented 1 year ago

my voice does not sound a lot like me when i trained it with 8x 10sec wav samples. what do i have to do to make my voice sound more like me? use more 10sec samples or longer samples? thank you!

Maki9009 commented 1 year ago

my voice does not sound a lot like me when i trained it with 8x 10sec wav samples.

what do i have to do to make my voice sound more like me?

use more 10sec samples or longer samples?

thank you!

Well, this is technically the reason why people want the finetuning code open sourced... idk why but men sound very British.. with women you can maybe get it close enough but still not perfect.

Randy-H0 commented 1 year ago

my voice does not sound a lot like me when i trained it with 8x 10sec wav samples. what do i have to do to make my voice sound more like me? use more 10sec samples or longer samples? thank you!

This is because the model runs zero-shot. The problem with this is that it only gets the most "generic" voices according to tortoise itself and the training data. To fix this you need to train tortoise, buuuutt we can't because the creator only released their training code and model to a company which charges you 45 USD per month instead of open sourcing it.
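For anyone who wants to experiment with "more 10sec samples" in the meantime, here is a minimal sketch that cuts one longer WAV recording into consecutive clips of roughly ten seconds using only Python's standard `wave` module. The function name `split_wav`, its parameters, and the output file naming are hypothetical and not part of Tortoise, and nothing in this thread confirms that more clips actually improve the result.

```python
import wave


def split_wav(src_path, out_prefix, clip_seconds=10.0):
    """Split a WAV file into consecutive clips of about clip_seconds each.

    Hypothetical helper, not part of Tortoise. Returns the list of
    written file paths. The last clip may be shorter than the others.
    """
    out_paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Number of audio frames that make up one clip.
        frames_per_clip = int(src.getframerate() * clip_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_clip)
            if not frames:
                break  # reached the end of the recording
            out_path = f"{out_prefix}_{index:02d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Calling `split_wav("my_voice.wav", "my_voice_clip")` (file names assumed) would write `my_voice_clip_00.wav`, `my_voice_clip_01.wav`, and so on next to each other, which can then be used as reference clips.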

iamkhalidbashir commented 1 year ago

my voice does not sound a lot like me when i trained it with 8x 10sec wav samples. what do i have to do to make my voice sound more like me? use more 10sec samples or longer samples? thank you!

This is because the model runs zero-shot. The problem with this is that it only gets the most "generic" voices according to tortoise itself and the training data. To fix this you need to train tortoise, buuuutt we can't because the creator only released their training code and model to a company which charges you 45 USD per month instead of open sourcing it.

Randy, I want the open source code for training as much as you do, but please stop bitching about morals here. Nothing immoral or wrong has been done by the author. It's his work; he has the right to charge you or give it to you for free. So please don't make this Twitter; it's GitHub.

iamkhalidbashir commented 1 year ago

No one has contributed to his training code; it's his sole creation and he has free will to do whatever he wants with it.

Randy-H0 commented 1 year ago

Randy, I want the open source code for training as much as you do, but please stop bitching about morals here. Nothing immoral or wrong has been done by the author. It's his work; he has the right to charge you or give it to you for free. So please don't make this Twitter; it's GitHub.

No one has contributed to his training code; it's his sole creation and he has free will to do whatever he wants with it.

I only stated facts and none of my opinions in that comment, what are you getting at?

Maki9009 commented 1 year ago

I'd rather do a direct payment to the author to finetune my voice rather than a company... but not for $45 😂. I'd also want at least a test sample that actually sounds like me beforehand.

Randy-H0 commented 1 year ago

I'd rather do a direct payment to the author to finetune my voice rather than a company... but not for $45 😂. I'd also want at least a test sample that actually sounds like me beforehand.

I understand that and I wish it wasn't that expensive, but things are how they are now. The most we can do is wait for VALL-E, which hopefully will be open sourced in the near future.

Conclusion to your thread: No, this is the best result you can get with tortoise. The training code/model has been sent to a company and now is put under a paywall. We're going to have to wait for VALL-E to come out for actual open source voice cloning with training.

Maki9009 commented 1 year ago

I don't think VALL-E is coming out open sourced... Microsoft owns like half of OpenAI. They won't allow it.

Randy-H0 commented 1 year ago

I don't think VALL-E is coming out open sourced... Microsoft owns like half of OpenAI. They won't allow it.

Well... Yeah I can't really argue with that, OpenAi doesn't want to release anything, but they did release point-e so who knows?

Maki9009 commented 1 year ago

Yeah point E is meh and I don't think they released the code to fine-tune it.

Randy-H0 commented 1 year ago

Yeah point E is meh and I don't think they released the code to fine-tune it.

🤷‍♂️

Bigjuergo commented 1 year ago

the author does a finetune for 45.- dollar for one voice?

Randy-H0 commented 1 year ago

the author does a finetune for 45.- dollar for one voice?

Not the author specifically, but the website who now has the finetuning model charges you 45 USD per month for it yeah

Randy-H0 commented 1 year ago

For one voice

Bigjuergo commented 1 year ago

per month is a little bit expensive!!! we should make an opensource crowdfund project to make this available for everyone.

Randy-H0 commented 1 year ago

per month is a little bit expensive!!! we should make an opensource crowdfund project to make this available for everyone.

LightAI is trying to do that... But there's like no progress. I have no experience in coding, let alone machine learning. But if you know people who have then maybe?

indieshack commented 1 year ago

I partnered with a company to make this fine-tuning technology available for exactly what you are describing, @indieshack: https://play.ht/pricing/

It's not currently cheap to clone voices, but that's because it is quite costly to rent the amount of compute required to perform the initial fine-tuning process. I know play.ht is really interested in supporting use cases like yours, so you might consider contacting them directly and maybe they will set up some sort of one-time fee for you.

Maybe I'm missing something, but I don't see the option to have the system learn your own custom voice, just a selection of voices (or wider selection if you pay more). Can you point me to where it provides that service please.

I-Have-No-Idea-What-IAmDoing commented 1 year ago

Should this be a discussion?

great drama btw

deviandice commented 1 year ago

Can you point me to where it provides that service please.

There is no service. As you were told: "consider contacting them directly and maybe they will set up some sort of one-time fee". This is your only option, as this helps to "shield individuals from personal risk"; the creator also thinks you "do not have access to the GPUs or knowhow required to fine-tune these models".

ghost commented 1 year ago

The people who run play.ht genuinely care about their users and want to make this kind of TTS available to everyone. If you truly believe that it is my moral imperative to release fine-tuning to the world, then I think this is the correct way to do it. Most people who would actually use such a system do not have access to the GPUs or knowhow required to fine-tune these models. By partnering with a company who can offer this as a service, I am able to allow considerably more people access to this feature than if I simply open sourced it. The exception to the former statement would be if, after open sourcing it, someone else started a company that did the same thing play.ht is doing.

I'm sorry, but fine-tuning Stable Diffusion has become so easy that anyone with a web browser and a gaming GPU can do it with a double click (via LoRA, Dreambooth, and Hypernetworks, through built-in support and community-made extensions). If tortoise was open sourced entirely, the community would rapidly develop UX and tools to do the same, and people could probably learn it from a 5-minute YouTube tutorial. I don't find this a compelling argument, but it's your code regardless.

ghost commented 1 year ago

I'd rather do a direct payment to the author to finetune my voice rather than a company... but not for $45 😂. I'd also want at least a test sample that actually sounds like me beforehand.

I understand that and I wish it wasn't that expensive, but things are how they are now. The most we can do is wait for VALL-E, which hopefully will be open sourced in the near future.

Conclusion to your thread: No, this is the best result you can get with tortoise. The training code/model has been sent to a company and now is put under a paywall. We're going to have to wait for VALL-E to come out for actual open source voice cloning with training.

It's very likely he signed a contractual obligation with Play.ht to not release the necessary data to train the model at this point, so, I don't think any of this conversation could move the needle anyways now

JigenD commented 1 year ago

@Maruiel The house is already painted, but it's a dull grey. I want to paint it myself, I have the paint, but I'm not allowed a brush. The brush owner gave those guys over there all the brushes but they want $100 to paint the house.

The guy who painted your house said he didn't want to let you paint it because he thought you were a criminal and would use the paint for huffing. But then he lets someone else sell his paint for money.

The whole AI industry has to stop resorting to calling 'money' 'ethics'.

jnordberg commented 1 year ago

That's the worst analogy I've heard this year. The author has stated his reasons: he doesn't want his name on a thing that can be used maliciously, and he partnered with a company that has the resources to ensure it is used for good. And perhaps more importantly, they are on the hook if something happens.

You don't have to agree with that but at least try to understand. If his goal was to make money this discussion wouldn't happen because the code and model checkpoints would never have been released in the first place.

All you are doing here is discouraging him and other researchers from releasing more cool stuff in the future.

a-ggghost commented 1 year ago

@Maruiel The house is already painted, but it's a dull grey. I want to paint it myself, I have the paint, but I'm not allowed a brush. The brush owner gave those guys over there all the brushes but they want $100 to paint the house.

The guy who painted your house said he didn't want to let you paint it because he thought you were a criminal and would use the paint for huffing. But then he lets someone else sell his paint for money.

The whole AI industry has to stop resorting to calling 'money' 'ethics'.

Yes, when it's gatekept by capital, that's ethical! It's only those filthy, no-good poors who do crime. That's why we criminalize poverty and spend exorbitantly to suppress anything that threatens capital! None of the immoral things we do on the backs of the poors are crimes. It would be absurd for us to make those things illegal for ourselves! We are very ethical, you see.

Randy-H0 commented 1 year ago

Yes, when it's gatekept by capital, that's ethical! It's only those filthy, no-good poors who do crime. That's why we criminalize poverty and spend exorbitantly to suppress anything that threatens capital! None of the immoral things we do on the backs of the poors are crimes. It would be absurd for us to make those things illegal for ourselves! We are very ethical, you see.

This analogy is getting out of hand.

Here are a few bullet points:

  1. Open sourcing this won't end the world, it would be an advancement in technology and AI research
  2. People could modify the code to their needs and train a super realistic model for something like explanation videos
  3. This "AI will go in bad hands" thing is a dumb thing to worry about, there's a voice clone detector in this repo itself that you can use to detect tortoise voice clones.

Considering all these points, there's not a single reason why the training code shouldn't have been released: we can detect voice clones with the detector, and we can also benefit from generating audiobooks and instruction videos.

This goes to show that the "ethics" thing we're all worried about has been eliminated from the start. There's only one thing holding this back, and it's the creator themselves worrying about "ethics" even though they know that there's a bloody detector in the repo.