fury88 opened 1 month ago
To resolve the `AttributeError: 'SynthesizerTrn' object has no attribute 'enc_p'`, ensure that the `n_speakers` parameter is not set to 0 when initializing the `SynthesizerTrn` object. The `enc_p` attribute is only defined when `n_speakers` is not 0.

Here's the relevant part of the `SynthesizerTrn` class:
```python
class SynthesizerTrn(nn.Module):
    def __init__(
        self,
        n_vocab,
        spec_channels,
        inter_channels,
        hidden_channels,
        filter_channels,
        n_heads,
        n_layers,
        kernel_size,
        p_dropout,
        resblock,
        resblock_kernel_sizes,
        resblock_dilation_sizes,
        upsample_rates,
        upsample_initial_channel,
        upsample_kernel_sizes,
        n_speakers=256,
        gin_channels=256,
        zero_g=False,
        **kwargs
    ):
        super().__init__()
        self.dec = Generator(
            inter_channels,
            resblock,
            resblock_kernel_sizes,
            resblock_dilation_sizes,
            upsample_rates,
            upsample_initial_channel,
            upsample_kernel_sizes,
            gin_channels=gin_channels,
        )
        self.enc_q = PosteriorEncoder(
            spec_channels,
            inter_channels,
            hidden_channels,
            5,
            1,
            16,
            gin_channels=gin_channels,
        )
        self.flow = ResidualCouplingBlock(inter_channels, hidden_channels, 5, 1, 4, gin_channels=gin_channels)
        self.n_speakers = n_speakers
        if n_speakers == 0:
            self.ref_enc = ReferenceEncoder(spec_channels, gin_channels)
        else:
            self.enc_p = TextEncoder(
                n_vocab,
                inter_channels,
                hidden_channels,
                filter_channels,
                n_heads,
                n_layers,
                kernel_size,
                p_dropout,
            )
            self.sdp = StochasticDurationPredictor(hidden_channels, 192, 3, 0.5, 4, gin_channels=gin_channels)
            self.dp = DurationPredictor(hidden_channels, 256, 3, 0.5, gin_channels=gin_channels)
            self.emb_g = nn.Embedding(n_speakers, gin_channels)
        self.zero_g = zero_g
```
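To see why this fails at attribute-access time rather than at construction, here is a minimal toy class (illustrative only, not the real `SynthesizerTrn`) that reproduces the conditional-attribute pattern:

```python
# Toy class (illustrative) mirroring the pattern above: an attribute assigned
# in only one branch of __init__ simply does not exist on instances built
# through the other branch, so touching it later raises AttributeError.
class Toy:
    def __init__(self, n_speakers):
        if n_speakers == 0:
            self.ref_enc = "reference encoder"
        else:
            self.enc_p = "text encoder"

multi = Toy(n_speakers=256)
single = Toy(n_speakers=0)

print(hasattr(multi, "enc_p"))   # True
print(hasattr(single, "enc_p"))  # False -- single.enc_p raises AttributeError
```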
Ensure that `n_speakers` is not 0 when you initialize the `SynthesizerTrn` object:
```python
synthesizer = SynthesizerTrn(
    n_vocab=100,                 # example value
    spec_channels=80,            # example value
    inter_channels=192,          # example value
    hidden_channels=192,         # example value
    filter_channels=768,         # example value
    n_heads=2,                   # example value
    n_layers=6,                  # example value
    kernel_size=3,               # example value
    p_dropout=0.1,               # example value
    resblock="1",                # example value
    resblock_kernel_sizes=[3, 7, 11],                            # example value
    resblock_dilation_sizes=[[1, 3, 5], [1, 3, 5], [1, 3, 5]],   # example value
    upsample_rates=[8, 8, 2, 2],                                 # example value
    upsample_initial_channel=512,                                # example value
    upsample_kernel_sizes=[16, 16, 4, 4],                        # example value
    n_speakers=256,              # ensure this is not 0
    gin_channels=256,            # example value
    zero_g=False,                # example value
)
```
Additionally, ensure that your configuration file correctly sets the `n_speakers` parameter. For example, in the `OpenVoiceBaseClass` initialization:
```python
class OpenVoiceBaseClass(object):
    def __init__(self,
                 config_path,
                 device='cuda:0'):
        if 'cuda' in device:
            assert torch.cuda.is_available()
        hps = utils.get_hparams_from_file(config_path)
        model = SynthesizerTrn(
            len(getattr(hps, 'symbols', [])),
            hps.data.filter_length // 2 + 1,
            n_speakers=hps.data.n_speakers,
            **hps.model,
        ).to(device)
        model.eval()
        self.model = model
        self.hps = hps
        self.device = device
```
Ensure that `hps.data.n_speakers` is not set to 0 in your configuration file. This should resolve the `AttributeError` [1][2].
To continue talking to Dosu, mention @dosu.
@dosu, OK, I may have the `n_speakers` in the wrong place. I assume `n_speakers` is tied to the `speakers` array? Why not just use the length of the array?
Yes, the `n_speakers` parameter in the `SynthesizerTrn` object is tied to the length of the `speakers` array, and it can be set using the length of this array. The `n_speakers` parameter is used to initialize the `nn.Embedding` layer, which requires the number of speakers to be specified [1].

You can set `n_speakers` using the length of the `speakers` array like this:
```python
n_speakers = len(speakers_array)

synthesizer = SynthesizerTrn(
    n_vocab=100,                 # example value
    spec_channels=80,            # example value
    inter_channels=192,         # example value
    hidden_channels=192,        # example value
    filter_channels=768,        # example value
    n_heads=2,                  # example value
    n_layers=6,                 # example value
    kernel_size=3,              # example value
    p_dropout=0.1,              # example value
    resblock="1",               # example value
    resblock_kernel_sizes=[3, 7, 11],                            # example value
    resblock_dilation_sizes=[[1, 3, 5], [1, 3, 5], [1, 3, 5]],   # example value
    upsample_rates=[8, 8, 2, 2],                                 # example value
    upsample_initial_channel=512,                                # example value
    upsample_kernel_sizes=[16, 16, 4, 4],                        # example value
    n_speakers=n_speakers,      # set using the length of the speakers array
    gin_channels=256,           # example value
    zero_g=False,               # example value
)
```
This ensures that the `n_speakers` parameter correctly reflects the number of speakers in your array.
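For example, with a hypothetical speakers mapping (the speaker names here are made up for illustration), the derived value looks like this:

```python
# Illustrative speakers mapping; real configs map speaker names to integer IDs.
speakers = {"EN-US": 0, "EN-BR": 1, "EN-AU": 2}
n_speakers = len(speakers)
print(n_speakers)  # 3
```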
@dosu, I'm setting all of this in the configuration file, not directly in the Python code. I would expect the configuration file is the proper way; otherwise we'd have to modify the code and put in a pull request if that needs a permanent change.
You can indeed set the `n_speakers` parameter in the configuration file. This is the proper way to handle it without modifying the code directly. Ensure that your configuration file includes the `n_speakers` parameter, set to the length of your speakers array.

Here is an example of how to set it in the configuration file:
```json
{
  "n_vocab": 100,
  "spec_channels": 80,
  "inter_channels": 192,
  "hidden_channels": 192,
  "filter_channels": 768,
  "n_heads": 2,
  "n_layers": 6,
  "kernel_size": 3,
  "p_dropout": 0.1,
  "resblock": "1",
  "resblock_kernel_sizes": [3, 7, 11],
  "resblock_dilation_sizes": [[1, 3, 5], [1, 3, 5], [1, 3, 5]],
  "upsample_rates": [8, 8, 2, 2],
  "upsample_initial_channel": 512,
  "upsample_kernel_sizes": [16, 16, 4, 4],
  "n_speakers": 256,
  "gin_channels": 256,
  "zero_g": false
}
```

Set `"n_speakers"` to the length of your speakers array (note that JSON does not allow inline comments, so the value must stand alone).
When the configuration file is loaded, the `n_speakers` parameter will be used to initialize the `SynthesizerTrn` object. Ensure that the `n_speakers` value matches the number of speakers you intend to support.

The configuration file is parsed and loaded into the `hps` object by the `utils.get_hparams_from_file` function, and then passed to the `SynthesizerTrn` constructor as follows:
```python
class OpenVoiceBaseClass(object):
    def __init__(self,
                 config_path,
                 device='cuda:0'):
        if 'cuda' in device:
            assert torch.cuda.is_available()
        hps = utils.get_hparams_from_file(config_path)
        model = SynthesizerTrn(
            len(getattr(hps, 'symbols', [])),
            hps.data.filter_length // 2 + 1,
            n_speakers=hps.data.n_speakers,
            **hps.model,
        ).to(device)
        model.eval()
        self.model = model
        self.hps = hps
        self.device = device
```
Ensure that your configuration file correctly sets the `n_speakers` parameter to avoid any issues [1][2].
@dosu, yes, then this goes back to having the code read the length of the array instead of having to put in the parameter. ;-)
Yes, the code can be modified to read the length of the speakers array automatically instead of setting the `n_speakers` parameter manually in the configuration file. You can achieve this by changing the initialization of the `SynthesizerTrn` model to use the length of the `speakers` array from the `hps` object.

Here is the modified code:
```python
class OpenVoiceBaseClass(object):
    def __init__(self,
                 config_path,
                 device='cuda:0'):
        if 'cuda' in device:
            assert torch.cuda.is_available()
        hps = utils.get_hparams_from_file(config_path)
        # Automatically set n_speakers based on the length of the speakers array
        n_speakers = len(getattr(hps, 'speakers', []))
        model = SynthesizerTrn(
            len(getattr(hps, 'symbols', [])),
            hps.data.filter_length // 2 + 1,
            inter_channels=hps.model.inter_channels,
            hidden_channels=hps.model.hidden_channels,
            filter_channels=hps.model.filter_channels,
            n_heads=hps.model.n_heads,
            n_layers=hps.model.n_layers,
            kernel_size=hps.model.kernel_size,
            p_dropout=hps.model.p_dropout,
            resblock=hps.model.resblock,
            resblock_kernel_sizes=hps.model.resblock_kernel_sizes,
            resblock_dilation_sizes=hps.model.resblock_dilation_sizes,
            upsample_rates=hps.model.upsample_rates,
            upsample_initial_channel=hps.model.upsample_initial_channel,
            upsample_kernel_sizes=hps.model.upsample_kernel_sizes,
            n_speakers=n_speakers,
            gin_channels=hps.model.gin_channels,
            zero_g=hps.model.zero_g,
        ).to(device)
        model.eval()
        self.model = model
        self.hps = hps
        self.device = device
```
In this modification, `n_speakers` is set to the length of the `speakers` array from the `hps` object, ensuring that it is automatically determined from the configuration file [1][2].
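The `getattr(hps, 'speakers', [])` call is what keeps this from crashing when the config has no `speakers` entry at all. A standalone sketch of that fallback behaviour (the `Hps` stand-in is an assumption, not the real hyperparameter object):

```python
class Hps:
    # Stand-in for the parsed hyperparameter object (illustrative only).
    speakers = {"alice": 0, "bob": 1}

# getattr with a default returns [] instead of raising when the field is absent.
print(len(getattr(Hps, "speakers", [])))       # 2: the field exists
print(len(getattr(object(), "speakers", [])))  # 0: the field is absent
```

Note the trade-off: a missing `speakers` entry silently yields `n_speakers=0`, which in turn skips `enc_p`, so you may still want an explicit check.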
@dosu, more missing parameters. Just FYI, I'm using the example, so something is not right when you stray off the path. `ref_enc` is another missing attribute.
To resolve the missing `ref_enc` attribute error in the `SynthesizerTrn` class, you need to ensure that the `ref_enc` attribute is properly initialized. The current implementation initializes `ref_enc` only if `n_speakers` is 0; if `n_speakers` is not 0, `ref_enc` is never created, which leads to the missing-attribute error.

To fix this, initialize `ref_enc` regardless of the value of `n_speakers`. Here is the corrected `SynthesizerTrn` class:
```python
class SynthesizerTrn(nn.Module):
    """
    Synthesizer for Training
    """

    def __init__(
        self,
        n_vocab,
        spec_channels,
        inter_channels,
        hidden_channels,
        filter_channels,
        n_heads,
        n_layers,
        kernel_size,
        p_dropout,
        resblock,
        resblock_kernel_sizes,
        resblock_dilation_sizes,
        upsample_rates,
        upsample_initial_channel,
        upsample_kernel_sizes,
        n_speakers=256,
        gin_channels=256,
        zero_g=False,
        **kwargs
    ):
        super().__init__()
        self.dec = Generator(
            inter_channels,
            resblock,
            resblock_kernel_sizes,
            resblock_dilation_sizes,
            upsample_rates,
            upsample_initial_channel,
            upsample_kernel_sizes,
            gin_channels=gin_channels,
        )
        self.enc_q = PosteriorEncoder(
            spec_channels,
            inter_channels,
            hidden_channels,
            5,
            1,
            16,
            gin_channels=gin_channels,
        )
        self.flow = ResidualCouplingBlock(inter_channels, hidden_channels, 5, 1, 4, gin_channels=gin_channels)
        self.n_speakers = n_speakers
        self.ref_enc = ReferenceEncoder(spec_channels, gin_channels)  # Initialize ref_enc regardless of n_speakers
        if n_speakers != 0:
            self.enc_p = TextEncoder(
                n_vocab,
                inter_channels,
                hidden_channels,
                filter_channels,
                n_heads,
                n_layers,
                kernel_size,
                p_dropout,
            )
            self.sdp = StochasticDurationPredictor(hidden_channels, 192, 3, 0.5, 4, gin_channels=gin_channels)
            self.dp = DurationPredictor(hidden_channels, 256, 3, 0.5, gin_channels=gin_channels)
            self.emb_g = nn.Embedding(n_speakers, gin_channels)
        self.zero_g = zero_g
```
This change ensures that `ref_enc` is always initialized, preventing the missing-attribute error [1].
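If you would rather not patch the class itself, another option is to guard the call site instead. This is only a sketch; the function name and error message are illustrative, not repo code:

```python
def get_ref_embedding(model, spec):
    # Illustrative guard: only call ref_enc if the model actually has one,
    # instead of letting the access raise a bare AttributeError.
    ref_enc = getattr(model, "ref_enc", None)
    if ref_enc is None:
        raise RuntimeError(
            "model has no ref_enc; it was likely built with n_speakers != 0"
        )
    return ref_enc(spec)
```

The same `getattr`-with-default pattern works for `enc_p` on the other branch of the constructor.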
This one is related to https://github.com/myshell-ai/OpenVoice/issues/290. I see some of the configuration entries from v1 are left out of v2. Adding them does the trick, but at the end now I'm getting Python errors from the repo code.
@dosu, `AttributeError: 'SynthesizerTrn' object has no attribute 'enc_p'. Did you mean: 'enc_q'?`