neonbjb / DL-Art-School

DLAS - A configuration-driven trainer for generative models
Apache License 2.0

The details of ctc code generation? #14

Closed MlWoo closed 11 months ago

MlWoo commented 1 year ago

The work is very impressive, thanks a lot. I'm following your TTS work. Several modules are introduced into the pipelines, but the pipeline configurations are coupled. Could you provide your training configuration and the public datasets used for CTC code generation? Is the relevant code the class named Wav2VecWrapper? Thank you again.

neonbjb commented 1 year ago

Hey, thanks for the kind words. I'm not really sure how to answer the question. What are you looking to train? A wav2vec2 model? While I configured DLAS to be able to do this, I wouldn't really recommend doing so. I'm assuming huggingface has a better way to do this type of training.

MlWoo commented 1 year ago

@neonbjb Sorry for my unclear wording. I want to train a model that generates CTC codes, but it should be compatible with Tortoise. Your DVAE runs at 25 Hz, whereas mainstream models (such as those on Hugging Face) run at 50 Hz or more, so the two conflict. Moreover, Tortoise uses your self-trained BPE tokenizer, while the public wav2vec CTC models have their own BPE tokenizer too.
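To make the rate mismatch described above concrete, here is a minimal sketch of the arithmetic. It assumes the default convolution strides of the Hugging Face Wav2Vec2 feature encoder and 16 kHz input, and it takes the 25 Hz figure from this thread. The `average_pool` helper is purely illustrative (a hypothetical way to bring 50 Hz frames down to 25 Hz), not something confirmed to be what DLAS or Tortoise does.

```python
from math import prod

# Assumption: default conv strides of the Hugging Face Wav2Vec2 feature encoder.
WAV2VEC2_CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
WAV2VEC2_SAMPLE_RATE = 16_000   # Hz, the rate wav2vec2 checkpoints expect
TARGET_CODE_RATE_HZ = 25        # rate quoted in this thread for the DVAE


def frame_rate_hz(sample_rate, strides):
    """Nominal output frame rate: one frame per prod(strides) input samples."""
    return sample_rate / prod(strides)


def pool_factor(source_rate_hz, target_rate_hz):
    """Integer number of source frames per target frame; error if not integral."""
    ratio = source_rate_hz / target_rate_hz
    if ratio != int(ratio):
        raise ValueError(f"rates are not integer multiples: {ratio}")
    return int(ratio)


def average_pool(frames, factor):
    """Average non-overlapping groups of `factor` frame vectors.

    Hypothetical helper; trailing frames that do not fill a group are dropped.
    """
    usable = len(frames) - len(frames) % factor
    pooled = []
    for start in range(0, usable, factor):
        group = frames[start:start + factor]
        pooled.append([sum(column) / factor for column in zip(*group)])
    return pooled


source_rate = frame_rate_hz(WAV2VEC2_SAMPLE_RATE, WAV2VEC2_CONV_STRIDES)
factor = pool_factor(source_rate, TARGET_CODE_RATE_HZ)
print(source_rate, factor)
```

The frame rate here is nominal: edge effects in the convolutions mean a real clip yields slightly fewer frames than duration times 50.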

class Wav2VecWrapper(nn.Module):
    """
    Basic wrapper class that makes Wav2Vec2 usable by DLAS.
    """
    def __init__(self,
                 vocab_size=148,
                 basis_model='facebook/wav2vec2-large',
                 freeze_transformer=False,
                 output_wer=True,
                 checkpointing_enabled=True,
                 provide_attention_mask=False,
                 spec_augment=True,
                 remove_feature_extractor=False,
                 ramp_dropout_mode=False,
                 ramp_dropout_end=20000,
                 ramp_dropout_min=.1,
                 ramp_dropout_max=.5,
                 layer_drop_pct=.1):

The code in your repo starts with the above. Of course, a wav2vec model from Hugging Face or Facebook is a good choice for basis_model. I want to figure out the settings for the other parameters of Wav2VecWrapper. Could you provide them?
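For reference while discussing settings, the defaults from the signature quoted above can be collected into a plain dictionary. This is only a sketch: the values are copied from the signature, while `wav2vec_wrapper_kwargs` is a hypothetical helper introduced here for illustration, and nothing below reflects the settings actually used to train the Tortoise models.

```python
# Defaults copied verbatim from the Wav2VecWrapper signature quoted above.
WAV2VEC_WRAPPER_DEFAULTS = {
    "vocab_size": 148,
    "basis_model": "facebook/wav2vec2-large",
    "freeze_transformer": False,
    "output_wer": True,
    "checkpointing_enabled": True,
    "provide_attention_mask": False,
    "spec_augment": True,
    "remove_feature_extractor": False,
    "ramp_dropout_mode": False,
    "ramp_dropout_end": 20000,
    "ramp_dropout_min": 0.1,
    "ramp_dropout_max": 0.5,
    "layer_drop_pct": 0.1,
}


def wav2vec_wrapper_kwargs(**overrides):
    """Return the defaults with overrides applied (hypothetical helper).

    Rejects names that are not in the quoted signature, so a typo does not
    silently leave the default in place.
    """
    unknown = set(overrides) - set(WAV2VEC_WRAPPER_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown Wav2VecWrapper arguments: {sorted(unknown)}")
    return {**WAV2VEC_WRAPPER_DEFAULTS, **overrides}


# Example: override a single argument and keep the rest at their defaults.
kwargs = wav2vec_wrapper_kwargs(freeze_transformer=True)
print(kwargs["freeze_transformer"], kwargs["vocab_size"])
```

The resulting dictionary could then be passed as `Wav2VecWrapper(**kwargs)`, assuming the class is importable from the repo.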