neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality
Apache License 2.0
12.9k stars 1.78k forks

Finetuning for better results? #182

Closed: Maki9009 closed this issue 1 year ago

Maki9009 commented 1 year ago

So I assume the Colab isn't actually fine-tuning. This makes sense, because for some voices I can get close enough that it sounds good. But take my voice, say: I have no idea why, but it keeps making me sound like a posh British man.

Any way to improve the cloning of some voices?

Randy-H0 commented 1 year ago

[Screenshot] 100% accuracy on the results and voices folder. 100%!!! (Yes, there are false positives, but I'm sure that's not common.)

jnordberg commented 1 year ago

Jesus christ.... I wish GitHub would charge everyone $5 per comment

Randy-H0 commented 1 year ago

> Jesus christ.... I wish GitHub would charge everyone $5 per comment

Psst, I know a secret: unsubscribe from this thread.

a-ggghost commented 1 year ago

> That's the worst analogy I've heard this year. The author has stated his reasons, he doesn't want his name on a thing that can be used maliciously and partnered with a company that has resources to assure it is used for good. And perhaps more importantly they are on the hook if something happens.
>
> You don't have to agree with that but at least try to understand. If his goal was to make money this discussion wouldn't happen because the code and model checkpoints would never have been released in the first place.
>
> All you are doing here is discouraging him and other researchers from releasing more cool stuff in the future.

Try to understand what? Primitive accumulation? Enclosure? Your whole presentation of "good" and "malicious" here hinges on some classist nonsense wherein access to resources is a barometer for righteousness. Keeping proprietary things more proprietarier is just threatening us with the status quo. IP is a travesty. That should be consensus by now especially after the past two decades.

Whatever small amount of profit there is for the author personally is barely even relevant. The issue is y'all leaning on this myth of capitalist meritocracy to try to post-hoc dream a little dream of "ethics" that might justify this decision.

If Nestle wants to use the voice of one of their child labor contract workers to soften their image, it's all on the up-and-up, but if a kid got the bright idea to use the tech to defame some rich imperialist snot? JAIL! Respectively very good and very malicious, indeed. Much ethics.

deviandice commented 1 year ago

> Jesus christ.... I wish GitHub would charge everyone $5 per comment

I'll take your $5 if you want to spend it that badly

neonbjb commented 1 year ago

Open sourcing this

Every single line of source code that is/was used to train and perform inference on Tortoise has been open sourced. What I have not done is document how to fine-tune the models. I also have not released the dataset; primarily because I do not want to get sued.

If the folks in this thread spent as much time just trying to actually learn how these things worked as they did complaining about how I didn't provide a golden path for them to build their business on top of, they would have already figured this problem out. I regularly speak with (and help!) others who have done so.

This is more the intent behind my ethical stance that you so deride: It's not that I want to hold fine-tuning back from the world. It's that I do not want my name attached to an open source project like https://github.com/iperov/DeepFaceLab which makes it extremely easy to do really shady shit.

Randy-H0 commented 1 year ago

> Open sourcing this
>
> Every single line of source code that is/was used to train and perform inference on Tortoise has been open sourced. What I have not done is document how to fine-tune the models. I also have not released the dataset; primarily because I do not want to get sued.
>
> If the folks in this thread spent as much time just trying to actually learn how these things worked as they did complaining about how I didn't provide a golden path for them to build their business on top of, they would have already figured this problem out. I regularly speak with (and help!) others who have done so.
>
> This is more the intent behind my ethical stance that you so deride: It's not that I want to hold fine-tuning back from the world. It's that I do not want my name attached with a open source project like https://github.com/iperov/DeepFaceLab which makes it extremely easy to do really shady shit.

It's not your fault if someone uses your project for bad intentions. Is it possible to fine-tune the model on unseen data? If so, could you give us instructions on how we can do so?

deviandice commented 1 year ago

> If the folks in this thread spent as much time just trying to actually learn how these things worked as they did complaining about how I didn't provide a golden path for them to build their business on top of

My best friend died 6 months ago of cancer and I want to hear his voice again. My girlfriend's father-in-law was murdered and her family would like to hear his voice again. In this case I want to fine-tune it so it sounds real. What part of that has anything to do with business?

I understand that you made this choice out of self-preservation, but currently your name is needlessly being tarnished by falsely claiming it as an ethical choice whilst dismissing any criticism of it. The way you've responded lacks empathy and perspective, and is full of negative assumptions about others. People are more upset about your attitude towards them than about the issue itself. Is that really what you want?

Maki9009 commented 1 year ago

> Jesus christ.... I wish GitHub would charge everyone $5 per comment

If they paid me $5 for starting this thread... damn, I could have made over $200, and then I could have paid for play.ht. Tbh I just don't want to sound like a posh British man lol, but the model already sounds good with women who aren't even British... But what can we do? There are ethics, life, and cheese, and in the end it all becomes open source... except nukes, but ChatGPT can teach you that... well, it used to be able to teach you that. Either way, A.I. will be used for everything, so don't murder each other on this thread... I get all the emails... and I literally posted this in October. I didn't expect a battlefield.

Randy-H0 commented 1 year ago

Maybe Uberduck is interested in buying this

ghost commented 1 year ago

> My best friend died 6 months ago of cancer and I want to hear his voice again. My girlfriends father in law was murdered and her family would like to hear his voice again. In this case I want to fine tune it so it sounds real. What part of that has anything to do with business?
>
> I understand that you made this choice out of self-preservation, but currently your name is needlessly being tarnished by claiming it falsely as an ethical choice whilst disagreeing any criticism of this. The way you've responded lacks empathy, perspective and is full of negative assumptions of others. People are more upset about your attitude towards them than the issue itself, that really what you want?

Honestly, this is a good reason to never allow this to be fine-tuned X_X

Randy-H0 commented 1 year ago

> Honestly this is a good reason to never allow this to be fine tuned X_X

Are you mental!?

deviandice commented 1 year ago

> Are you mental!?

Mental illness is no joke. They're probably autistic, don't worry about it.

devilismyfriend commented 1 year ago

Damn the entitlement in this comment section lol

Dude made a cool solution, say thanks he even shared parts of it and stfu.

SAI, MS and more OS devs will surpass Tortoise anyway and you'll get your anime waifu voice simulator, chill.

Randy-H0 commented 1 year ago

> Damn the entitlement in this comment section lol
>
> Dude made a cool solution, say thanks he even shared parts of it and stfu.
>
> SAI, MS and more OS devs will surpass Tortoise anyway and you'll get your anime waifu voice simulator, chill.

That's not even the point. You're on GitHub where most things here are open source. And this dude is advertising his trainable model online for 45USD per month!

devilismyfriend commented 1 year ago

> That's not even the point. You're on GitHub where most things here are open source. And this dude is advertising his trainable model online for 45USD per month!

First of all, it's not 45USD a month to fine-tune a model, it's not even in their features for the tier, they do offer a fine-tuning option but it's via chat and I assume comes with an increase in costs.

Second, the compute requirement to fine-tune this model isn't readily available, even if he did release it you won't be able to use it on consumer hardware.

Third, it's his code, be thankful he shared what he did as it can lead proper devs to new and better models, ones that could one day be fine-tuned on consumer hardware.

I know it sucks but it is what it is, what you are doing tho is acting like an entitled child, if I were him I'd take the repo offline just to spite you.

drgrib commented 1 year ago

I don't think he has any obligation to make anything free that he doesn't want to. If he wants to offer this as a service for $50 a month, that is his right. He never had to take the time to put this on Github at all if he didn't want to and I'm grateful that he did.

That said, I would never pay that and I don't think most people would. It is an insane amount of money to pay for what, for most of us, is probably just a hobby or a side project. It is also ridiculous for many of us because we already have the computing power of an Nvidia graphics card on our personal machines and it sounds like most of the cost is for compute power. It would be nice if there were an option that didn't involve paying for compute power that we don't need.

> If the folks in this thread spent as much time just trying to actually learn how these things worked as they did complaining about how I didn't provide a golden path for them to build their business on top of, they would have already figured this problem out.

I don't know how fair of an assessment this is. I have a PhD in Computer Science and work as a software engineer but I didn't specialize in machine learning. I emailed someone in the issues sections of your repo about how to fine tune the results and they told me I would have to train my own DVAE (a process I can literally only find academic papers for, not something straightforward like a Github repo) and that I would need 40k hours of audio data. That sounds crazy to me.

For me it is just frustrating to have this working so close to perfect but then glitching out in extremely strange and jarring ways in the middle of otherwise perfect audio because it isn't fine-tuned.

Randy-H0 commented 1 year ago

> First of all, it's not 45USD a month to fine-tune a model, it's not even in their features for the tier, they do offer a fine-tuning option but it's via chat and I assume comes with an increase in costs.
>
> Second, the compute requirement to fine-tune this model isn't readily available, even if he did release it you won't be able to use it on consumer hardware.
>
> Third, it's his code, be thankful he shared what he did as it can lead proper devs to new and better models, ones that could one day be fine-tuned on consumer hardware.
>
> I know it sucks but it is what it is, what you are doing tho is acting like an entitled child, if I were him I'd take the repo offline just to spite you.

There's a solution to that finetuning problem. It's called using Google colab. Obviously there are more people using Google colab than using their own machines to do machine learning things. Everyone's using dreambooth on Google colab, most people are using cloud computing for stable diffusion, have you ever considered thinking about that?

Randy-H0 commented 1 year ago

> I'm mostly on your side, it's his choice what he does with the source code and stuff, but to 99% of people, charging that much monthly is just too expensive. You'd pay for hosting and finetuning but you'd likely be able to fine-tune it for yourself for under 50 dollars, without it costing you that monthly.

Thanks for being neutral

devilismyfriend commented 1 year ago

> There's a solution to that finetuning problem. It's called using Google colab. Obviously there are more people using Google colab than using their own machines to do machine learning things. Everyone's using dreambooth on Google colab, most people are using cloud computing for stable diffusion, have you ever considered thinking about that?

You need proper hardware to fine-tune this model as per the creator, you can't do that on colab with your free tier so it's irrelevant, fine-tuning this model can likely cost around 500$ at the very least in terms of compute, probably more.

Randy-H0 commented 1 year ago

> You need proper hardware to fine-tune this model as per the creator, you can't do that on colab with your free tier so it's irrelevant, fine-tuning this model can likely cost around 500$ at the very least in terms of compute.

And that's why it costs 45USD monthly, so people can be subscribed for a whole year just for one voice clone to be paid back for

devilismyfriend commented 1 year ago

> And that's why it costs 45USD monthly, so people can be subscribed for a whole year just for one voice clone to be paid back for

idk what are you on about, the 45 tier doesn't include voice cloning, it's just to use their already-made voices, to fine-tune with them you need to talk to them directly and probably pay a hefty fee for the compute.

Randy-H0 commented 1 year ago

> idk what are you on about, the 45 tier doesn't include voice cloning, it's just to use their already-made voices, to fine-tune with them you need to talk to them directly and probably pay a hefty fee for the compute.

Let me do some calculations. If it cost $5 per hour for 4x A100s, it would have to take 100 hours on those 4 A100s to fine-tune one voice for the bill to reach $500. I don't believe that.
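A rough sketch of that arithmetic, using only the figures thrown around in this thread ($500 minimum compute, $5 per hour for 4x A100s, 45 USD per month). None of these are measured prices, and the function names are made up for illustration:

```python
import math

# Figures quoted by commenters in this thread; guesses, not measured prices.
COMPUTE_BUDGET_USD = 500        # claimed minimum compute cost of one fine-tune
RATE_USD_PER_HOUR = 5           # hypothetical hourly price for 4x A100
SUBSCRIPTION_USD_PER_MONTH = 45


def implied_gpu_hours(budget_usd: float, rate_usd_per_hour: float) -> float:
    """Hours of rented compute that the budget would buy at the given rate."""
    return budget_usd / rate_usd_per_hour


def months_to_recoup(budget_usd: float, monthly_fee_usd: float) -> int:
    """Whole months of subscription fees needed to cover the budget."""
    return math.ceil(budget_usd / monthly_fee_usd)


if __name__ == "__main__":
    print(implied_gpu_hours(COMPUTE_BUDGET_USD, RATE_USD_PER_HOUR))          # 100.0
    print(months_to_recoup(COMPUTE_BUDGET_USD, SUBSCRIPTION_USD_PER_MONTH))  # 12
```

So the $500 claim only holds if a single fine-tune really needs about 100 hours on that rig, and it would take roughly a year of the 45 USD tier to pay it back.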

devilismyfriend commented 1 year ago

> Let me do some calculations. If it costed 5$ for 4x A100s per hour, it'd have to take 100 hours of 4 A100s to fine-tune one voice, don't believe that.

The author stated it took him almost a year of training on 8x 3090s to get the model to where it is now. To be clear, the biggest problem with this model is a general lack of knowledge and the need for better training of the other parts. ElevenLabs likely just fine-tuned Tortoise further. When you upload samples you're not fine-tuning the model; it's simply doing latent conditioning. Their model just knows more voices.

Randy-H0 commented 1 year ago

> the author stated it took him almost a year of training on 8x3090s to get to where the model is now, to be clear, the biggest problem with this model is a general lack of knowledge and better training for the other parts, ElevenLabs likely just fine-tuned Tortoise further, when you upload samples you're not fine-tuning the model, it's simply doing latent conditioning, their models just knows more voices.

Since when did we start talking about ElevenLabs? They're using a completely different thing; they're not associated with this. But imagine you know nothing about voices, humans, how they talk, or their emotions, so you study them for a long while. Now you're able to recreate voices from about 10 seconds of audio, but not that well. Then you receive 5 minutes of voice data, and it takes you about 5 hours to be able to speak like that person.

Training a base model is NOT like training a model on top of a base model!

bmc84 commented 1 year ago

> You can't just come in here, then tell us you won't be sharing the code, and then sell it to a company and advertise said company here

The owner of this repo built this from scratch using his own rig. He owes you nothing. He doesn't owe anyone anything here. If he wants to "come in here, tell us he won't be sharing the code, and sell it to a company" - he absolutely can & it's none of your business.

If I had made this & saw replies like yours, I'd be absolutely livid.

Perhaps instead of commenting here so often, you could actually be doing something useful with your time. Maybe learn how to make something like this, instead of whinging. Also I won't see any replies to this, so don't bother.