neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality
Apache License 2.0
13.14k stars 1.81k forks

Use in games? #297

Open chaosdrop opened 1 year ago

chaosdrop commented 1 year ago

Does anyone know whether it would be reasonably legal to use output from tortoise in video game dialog?

neonbjb commented 1 year ago

Depends on if you buy the fair use argument or not. I think a precedent will probably be set in court soon enough.

neonbjb commented 1 year ago

Also obligatory: I am not a lawyer.

sbersier commented 1 year ago

For game dialog, I would rather use randomly generated voices. I made a script that generates a random voice .pth file based on a seed (an integer). Note for those who might say: "random voices already exist!" Yes, they do. But what if you want to change the seed while keeping the voice?

So, it will generate a file named: rand_voice_<seed>.pth NOTE: it is supposed to generate voices for CUDA. If you want to generate voices for CPU then you'll have to change: device='cuda' to: device='cpu'

Then, just move the resulting .pth file to a sub-folder named random_voice_<seed> (for example) in your voices folder. Make sure that there is just this file (or possibly sub-folders) but nothing else. Save this script (and name it generate_random_voice.py) into the tortoise-tts folder.

Usage: python generate_random_voice.py <seed>. For example:

python generate_random_voice.py 42

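#
# script: generate_random_voice.py
#
# Usage: python generate_random_voice.py <int:seed>
#

import sys
import torch

from tortoise.api import TextToSpeech

# Change to 'cpu' to generate a voice for CPU.
device = 'cuda'

# torch.manual_seed() needs an integer, so convert the command-line argument.
seed = int(sys.argv[1])

tts = TextToSpeech()

# Seed the RNG so that a given seed reproduces the same voice.
torch.manual_seed(seed)
l1, l2 = tts.get_random_conditioning_latents()
v = (l1.to(device), l2.to(device))
name = 'rand_voice_' + str(seed) + '.pth'
torch.save(v, name)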

Note: In my opinion (I'm still discovering the whole thing...) the randomly generated voices are not as good as voices generated from audio clips of real speakers. But you'll see.
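The "move the .pth file into its own sub-folder" step could also be scripted. Below is a minimal sketch using only the standard library. `install_voice` and its parameters are hypothetical names, not part of tortoise, and the location of the voices folder (for example `tortoise/voices` inside the repository) is an assumption to adjust to your own checkout.

```python
import shutil
from pathlib import Path


def install_voice(pth_file, voices_dir, voice_name=None):
    """Move a generated .pth voice into its own sub-folder of voices_dir.

    Hypothetical helper, not part of tortoise. Returns the new file path.
    """
    pth_file = Path(pth_file)
    if pth_file.suffix != ".pth":
        raise ValueError(f"expected a .pth file, got {pth_file.name}")

    # Default the folder name to the file name without its extension,
    # e.g. rand_voice_42.pth -> <voices_dir>/rand_voice_42/
    target_dir = Path(voices_dir) / (voice_name or pth_file.stem)
    target_dir.mkdir(parents=True, exist_ok=True)

    # The folder should hold just this one file (sub-folders are tolerated).
    stray = [p.name for p in target_dir.iterdir()
             if p.is_file() and p.name != pth_file.name]
    if stray:
        raise FileExistsError(f"{target_dir} already contains: {stray}")

    destination = target_dir / pth_file.name
    shutil.move(str(pth_file), str(destination))
    return destination
```

For example, `install_voice('rand_voice_42.pth', 'tortoise/voices')` would leave the file at `tortoise/voices/rand_voice_42/rand_voice_42.pth`, after which `rand_voice_42` should be usable as a voice name.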

chaosdrop commented 1 year ago

Yeah, I was thinking for the voice clips I'd actually get consent from family/friends, or for one-off character lines just use random. Basically, only use things other than mimicking existing voices, to avoid anything along the lines of slander/defamation lawsuits. Even if things fall under public domain, it wouldn't surprise me one bit if there were plenty of smaller cases lost to defamation claims if you did mimicking.

I was mainly wondering on the side of the actual model/software/output itself if there might be any legal thing to be aware of.

Seems like almost everything attached to tortoise-tts falls under the Apache License 2.0 or public domain datasets, except for the included voice clips based on real people's voices.

chaosdrop commented 1 year ago

Thanks for this info, it's actually quite handy to know.

sbersier commented 1 year ago

@chaosdrop Please take note of the edits I made...

chaosdrop commented 1 year ago

@sbersier I'm still super new to Python; I only know a little bit from helping my nephew. But I do wonder: since tortoise can combine voices together to get an average, do you think it would be possible to combine a public domain voice clip with a random voice to generate a unique voice .pth?

It might also be a good way to "target" a genre of voice. If you had a public domain female voice averaged with a random one, it seems like you'd be more likely to get a female-sounding random voice, and if you mixed it with a deep male voice, more likely to get a deeper male voice, etc.
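A rough, untested sketch of what such a blend could look like: it takes a weighted average of the latents computed from real clips and the latents from a seeded random voice. `blend_latents`, `make_blended_voice` and the `blend_voice_<seed>.pth` file name are hypothetical names made up for illustration. The sketch assumes that `get_conditioning_latents` and `get_random_conditioning_latents` return tensor pairs of matching shapes, and whether the result actually sounds like a usable voice is an open question.

```python
def blend_latents(a, b, weight=0.5):
    """Linearly interpolate two (autoregressive, diffusion) latent pairs.

    weight=0.0 reproduces a, weight=1.0 reproduces b, 0.5 is a plain average.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return tuple((1.0 - weight) * x + weight * y for x, y in zip(a, b))


def make_blended_voice(clip_paths, seed, weight=0.5, device="cuda"):
    """Blend latents from real clips with seeded random latents and save them."""
    # Imported here so blend_latents stays usable without tortoise installed.
    import torch
    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_audio

    tts = TextToSpeech()

    # Latents computed from the reference clips (loaded at 22.05 kHz).
    clips = [load_audio(path, 22050) for path in clip_paths]
    real = tuple(t.to(device) for t in tts.get_conditioning_latents(clips))

    # Seeded random latents, as in generate_random_voice.py.
    torch.manual_seed(int(seed))
    rand = tuple(t.to(device) for t in tts.get_random_conditioning_latents())

    # Assumption: both pairs have matching tensor shapes.
    voice = blend_latents(real, rand, weight)
    name = "blend_voice_" + str(seed) + ".pth"
    torch.save(voice, name)
    return name
```

A lower `weight` keeps the result closer to the real clips, and a higher one closer to the random voice, so it may be worth trying a few values and seeds.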

chaosdrop commented 1 year ago

That's an interesting one to think about, since LibriVox does have this on their website:

However They Wish?

What does "however they wish" mean, exactly? People may use our recordings to
profit; they may remix them into other projects; they do not need to give
credit to the individual reader/writer/creator or to LibriVox. Anyone may do
all kinds of things with LibriVox recordings. Some we might "approve of," and
other things we might prefer them not do - but Public Domain means that just
about anyone can do what they like with the recordings.

Since they already acknowledge you can remix the readings into other projects and profit from them... they have already more or less given a pretty good acknowledgment that you could dice up books and use the sentences as game dialog. It's just a bizarre idea I never even considered... cutting up audiobooks to make game dialog kind of reminds me of old anime like Robotech where they cut up several animes and then recombined them into a new product.

Suppose as a volunteer that would be a good thing to be aware of, if you produce public domain audio people can resell it in ways you wouldn't have considered before, whereas if you volunteered for the NLS on the other hand they keep all the copyrights and the content is provided just for the disabled/impaired.

On 2/8/2023 at 2:18 PM, wrote:

"Do you think it would be possible to combine a public domain voice clip with a random voice to generate a unique voice .pth"

If you consider the LibriVox volunteers, for example, I think the license is clear enough: "The LibriVox recordings are in the public domain." And just that!... Does it mean you can clone LibriVox speakers' voices? It is not said. I'm not even sure that they realize what is going on... From what I understand, the original idea was to bring culture to those who can't read (for any reason: being blind, not knowing how to read, ...) or can't buy expensive books. The idea is different, isn't it? So, would you consider your use a fair use of the volunteers' work?

sbersier commented 1 year ago

"Public Domain means that just about anyone can do what they like with the recordings"

With the recordings, yes. But not necessarily with their voices... That's different. You can do whatever you want with the recordings, but it doesn't imply that you are allowed to use their voices for anything (and by anything... I really mean anything! Like... racist, homophobic, antisemitic or anti-Swedish content). Personally, I would like institutions like LibriVox to make a statement, in agreement with their members, about their position on this. Because the question is: Did their members sign up for that?

chaosdrop commented 1 year ago

Really, this is a rather old issue and not new to TTS, because it would fall under the same thing as making a 3D model that looks like a person without consent.

Right of Publicity has proven to be a rather sticky subject. In most cases I can find, you can really only win a Right of Publicity case if your likeness has celebrity value and you are a "natural living person".

Of course, in most of the cases they won, it was very obviously the person's fame being taken advantage of. If you trained a voice off of data, and especially if you mixed it with another random voice so that it didn't have a truly recognizable likeness of the person, I doubt anyone other than super famous people could win a Right of Publicity case. That's equivalent to making a 3D model that looks like a famous person but changing their hair color, eye color, name, and personality, and using a different voice. Yes, they have 60% in common with that person, but no reasonable person is going to mistake them for the original.

sbersier commented 1 year ago

I am not a lawyer... We'll see...

chaosdrop commented 1 year ago

I don't know. The more I think about 3D models, cartoon characters, and just the art world in general, the more I think it has already largely been decided. Artists have been using real-world photos of people to come up with original characters since probably the dawn of art, and you only run into legal issues when it's obvious that you are trying to profit off another person's fame or cause harm to that person's reputation. E.g., the places where celebrities win Right of Publicity cases and defamation cases.

AI art is more iffy because the AIs tend to copy too much of the original copyrighted artwork. AI voices alone would have a hard time doing anything like that by themselves, but if you mixed them with AI script writing, you could have some opportunities for major issues, as the right voice saying the wrong trademarked or copyrighted phrase could land you in big trouble.

Of course, it doesn't even have to be an AI. Real people have gotten in trouble for sounding too much like a famous person and saying the wrong trademarked thing in a position where they profited off the fame of that person.

Anafeyka commented 1 year ago

I can tell you about the small case of MegaFon. Their advertising campaign consisted of several commercials with different subjects, something like a mini commercial series. One of the characters was Bruce Willis. The idea was that they picked a similar-looking actor and then applied a Bruce Willis deepfake to that actor. But they bought the rights to use the actor's image. For several uses, the amount of the deal was in the area of 1-2 million dollars (the exact figure is confidential, but based on other cases we can assume the deal was in this range). In fact, the voice is also copyrighted. Basically you have 3 options. 1: Hire an actor directly to voice the dialogue. 2: Clone the voice and get the rights to use that voice from the actor's official representatives. 3: Clone the voice and hope you don't get dragged to court by the actor's lawyers.