open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License

[BUG]: 'NS2Trainer' object has no attribute '_count_parameters' #182

Open a897456 opened 3 months ago

a897456 commented 3 months ago

https://github.com/open-mmlab/Amphion/blob/d33551476d792e608c13cec1bfa32283c868a2fb/models/tts/naturalspeech2/ns2_trainer.py#L134-L136

    Traceback (most recent call last):
      File "E:\00\Amphion-main_old\bins\tts\train.py", line 130, in <module>
        main()
      File "E:\00\Amphion-main_old\bins\tts\train.py", line 104, in main
        trainer = build_trainer(args, cfg)
      File "E:\00\Amphion-main_old\bins\tts\train.py", line 26, in build_trainer
        trainer = trainer_class(args, cfg)#NS2Trainer
      File "E:\00\Amphion-main_old\models\tts\naturalspeech2\ns2_trainer.py", line 135, in __init__
        f"Model parameters: {self._count_parameters(self.model)/1e6:.2f}M"
    AttributeError: 'NS2Trainer' object has no attribute '_count_parameters'

a897456 commented 3 months ago

https://github.com/open-mmlab/Amphion/blob/d33551476d792e608c13cec1bfa32283c868a2fb/egs/tts/NaturalSpeech2/exp_config.json#L11

I think this should be false.

netagl commented 3 months ago

I changed __count_parameters(model) in the TTSTrainer class to _count_parameters(model). @a897456
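The rename probably helps because of Python's name mangling: a method whose name starts with two underscores inside a class body is stored under a class-prefixed name, so a subclass that calls the single-underscore name finds nothing. The sketch below reproduces that behavior with two hypothetical classes, BaseTrainer and ChildTrainer, which are stand-ins and not Amphion's actual code.

```python
class BaseTrainer:
    """Hypothetical stand-in for a base trainer class (not Amphion's real code)."""

    def __count_parameters(self, params):
        # Two leading underscores inside a class body trigger name mangling:
        # the attribute is actually stored as _BaseTrainer__count_parameters.
        return sum(params)

    def _count_parameters_fixed(self, params):
        # One leading underscore is only a naming convention, so subclasses see it.
        return sum(params)


class ChildTrainer(BaseTrainer):
    def report_broken(self, params):
        # Looks up "_count_parameters", which was never defined under that name.
        return self._count_parameters(params)

    def report_fixed(self, params):
        return self._count_parameters_fixed(params)


trainer = ChildTrainer()
try:
    trainer.report_broken([1, 2, 3])
except AttributeError as err:
    broken_message = str(err)

fixed_total = trainer.report_fixed([1, 2, 3])
mangled_names = [name for name in dir(trainer) if name.endswith("__count_parameters")]
```

Here broken_message carries the same kind of AttributeError as the traceback in this issue, while mangled_names shows where the double-underscore method actually lives.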

a897456 commented 3 months ago

> I changed __count_parameters(model) in the TTSTrainer class to _count_parameters(model). @a897456

Yes, _dump_cfg has the same problem. Have you also run into this error?

FileNotFoundError: [Errno 2] No such file or directory: 'data\libritts\code\19\train-clean-100#19#198#19_198_000004_000000.npy'

It is raised here: https://github.com/open-mmlab/Amphion/blob/d33551476d792e608c13cec1bfa32283c868a2fb/models/tts/naturalspeech2/ns2_dataset.py#L187-L196
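A small pre-flight check can list every missing feature file at once instead of failing on the first one during training. The sketch below is only an illustration: the function find_missing_features and the layout <processed_dir>/<dataset>/<feature_dir>/<speaker>/<uid>.npy are assumptions read off the path in the error message, not a documented Amphion convention.

```python
import os
import tempfile


def find_missing_features(processed_dir, dataset, feature_dir, utterances):
    """Return the expected .npy paths that do not exist on disk.

    Assumed (hypothetical) layout:
    <processed_dir>/<dataset>/<feature_dir>/<speaker>/<uid>.npy
    """
    missing = []
    for speaker, uid in utterances:
        path = os.path.join(processed_dir, dataset, feature_dir, speaker, uid + ".npy")
        if not os.path.isfile(path):
            missing.append(path)
    return missing


# Tiny self-contained demo in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    present_dir = os.path.join(root, "libritts", "code", "19")
    os.makedirs(present_dir)
    # Create one empty placeholder file; a second utterance is left missing.
    open(os.path.join(present_dir, "utt_a.npy"), "wb").close()

    missing = find_missing_features(
        root, "libritts", "code", [("19", "utt_a"), ("19", "utt_b")]
    )
    missing_names = [os.path.basename(p) for p in missing]
```

Using os.path.join keeps the check independent of the path separator, which matters here because the reported path uses Windows backslashes.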

netagl commented 3 months ago

yes, also _dump_cfg.

Yeah. @a897456 I changed self.cfg.preprocess.read_metadata to False and used acoustic_extractor to create these files:


            # code
            code = np.load(self.utt2code_path[utt])
            # frame_nums
            frame_nums = code.shape[1]
            # pitch
            pitch = np.load(self.utt2pitch_path[utt])
            # duration
            duration = np.load(self.utt2duration_path[utt])
            # phone_id
            phone_id = np.array(
                [
                    *map(
                        self.phone2id.get,
                        self.utt2phone[utt].replace("{", "").replace("}", "").split(),
                    )
                ]
            )

a897456 commented 3 months ago

> Yeah. @a897456 I changed self.cfg.preprocess.read_metadata to False and used acoustic_extractor to create these files:

You used acoustic_extractor to create these files? How?

            # code
            code = np.load(self.utt2code_path[utt])
            # frame_nums
            frame_nums = code.shape[1]
            # pitch
            pitch = np.load(self.utt2pitch_path[utt])
            # duration
            duration = np.load(self.utt2duration_path[utt])
            # phone_id
            phone_id = np.array(
                [
                    *map(
                        self.phone2id.get,
                        self.utt2phone[utt].replace("{", "").replace("}", "").split(),
                    )
                ]
            )

This is the code in the branch where self.cfg.preprocess.read_metadata is False, so can you show the code for how to use acoustic_extractor to create these files?

netagl commented 3 months ago

code:

        if cfg.preprocess.extract_acoustic_token:
            print('extract_acoustic_token')
            if cfg.preprocess.acoustic_token_extractor == "Encodec":
                codes = extract_encodec_token(wav_path)
                save_feature(
                    dataset_output, cfg.preprocess.acoustic_token_dir, uid, codes
                )

pitch:

        if cfg.preprocess.extract_pitch:
            pitch = f0.get_f0(wav, cfg.preprocess)
            save_feature(dataset_output, cfg.preprocess.pitch_dir, uid, pitch)

            if cfg.preprocess.extract_uv:
                assert isinstance(pitch, np.ndarray)
                uv = pitch != 0
                save_feature(dataset_output, cfg.preprocess.uv_dir, uid, uv)
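The uv step in the pitch snippet marks a frame as voiced wherever the extracted f0 is non-zero. A minimal NumPy sketch of that comparison follows; the f0 values are made up for illustration and are not the output of f0.get_f0.

```python
import numpy as np

# Made-up f0 contour in Hz; 0.0 stands for frames where no pitch was detected.
pitch = np.array([0.0, 110.0, 112.5, 0.0, 220.0])

# Element-wise comparison yields a boolean voiced/unvoiced mask of the same shape.
uv = pitch != 0
voiced_frames = int(uv.sum())
```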

phones:

from g2p_en import G2p
preprocess_english(res["Text"], lexicon, g2p)

@a897456

a897456 commented 3 months ago

Thanks, but now I get: AttributeError: 'list' object has no attribute 'replace'

https://github.com/open-mmlab/Amphion/blob/d33551476d792e608c13cec1bfa32283c868a2fb/models/tts/naturalspeech2/ns2_dataset.py#L224-L230

netaglazer commented 3 months ago

Set use_phone to True in the cfg. @a897456

        assert cfg.preprocess.use_phone == True
        if cfg.preprocess.use_phone:
            self.utt2phone = {}
            for utt_info in self.metadata:
                dataset = utt_info["Dataset"]
                uid = utt_info["Uid"]
                utt = "{}_{}".format(dataset, uid)
                self.utt2phone[utt] = utt_info["phones"]

a897456 commented 3 months ago

> Set use_phone to True in the cfg. @a897456

        assert cfg.preprocess.use_phone == True
        if cfg.preprocess.use_phone:
            self.utt2phone = {}
            for utt_info in self.metadata:
                dataset = utt_info["Dataset"]
                uid = utt_info["Uid"]
                utt = "{}_{}".format(dataset, uid)
                self.utt2phone[utt] = utt_info["phones"]

Yes, and I changed the phone_id = ... line:

            phone_id = np.array(
                [
                    *map(
                        self.phone2id.get,
                        self.utt2phone[utt].replace("{", "").replace("}", "").split(),
                    )
                ]
            )
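The 'list' object has no attribute 'replace' error suggests the metadata sometimes stores phones as a list rather than a brace-wrapped string. One possible way to handle both cases is sketched below; the helper phones_to_ids, the toy vocabulary, and the assumed string format "{HH AH0 L OW1}" are all hypothetical and not taken from the Amphion code.

```python
def phones_to_ids(phones, phone2id):
    """Map a phone sequence to integer ids, accepting either a str or a list.

    Assumption: a string looks like "{HH AH0 L OW1}", while a list already
    holds one phone symbol per element.
    """
    if isinstance(phones, str):
        # Strip the braces and split on whitespace, as the original line does.
        phones = phones.replace("{", "").replace("}", "").split()
    # Indexing (instead of dict.get) raises KeyError on an unknown phone
    # rather than silently inserting None into the id sequence.
    return [phone2id[p] for p in phones]


phone2id = {"HH": 0, "AH0": 1, "L": 2, "OW1": 3}  # hypothetical toy vocabulary
ids_from_str = phones_to_ids("{HH AH0 L OW1}", phone2id)
ids_from_list = phones_to_ids(["HH", "AH0", "L", "OW1"], phone2id)
```

The returned list can be wrapped in np.array(...) to match the original assignment. Failing loudly on unknown phones is a design choice of this sketch; the original self.phone2id.get call would return None for them instead.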

a897456 commented 3 months ago

https://github.com/open-mmlab/Amphion/blob/d33551476d792e608c13cec1bfa32283c868a2fb/models/tts/naturalspeech2/ns2_dataset.py#L308-L313

In this code, phone_nums = len(phone_id) = len(tensor of shape (1, X)) = 1, so phone_nums is always 1, because of this line:

    phone_id = torch.from_numpy(phone_id).unsqueeze(0)

That makes clip_phone_nums = 1, but the code then asserts:

    assert clip_phone_nums < phone_nums and clip_phone_nums >= 1

How can I solve this, please?
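The shape problem described above can be reproduced in isolation. In the sketch below, NumPy indexing with [None, :] stands in for torch's unsqueeze(0), and the phone ids are made up. Reading the last axis, or measuring the length before the extra axis is added, is one possible workaround; whether that is the right fix for ns2_dataset.py is an assumption and is not confirmed in this thread.

```python
import numpy as np

phone_id = np.array([5, 9, 2, 7, 4])  # made-up phone ids, shape (5,)
batched = phone_id[None, :]           # shape (1, 5), like tensor.unsqueeze(0)

len_before = len(phone_id)            # length of the only axis
len_after = len(batched)              # len() only reports the first axis
phone_nums = batched.shape[-1]        # size of the last axis instead
```

With these values, len_after is 1 while phone_nums is 5, so an assertion of the form clip_phone_nums < phone_nums can only pass when the count comes from the last axis.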


CreepJoye commented 1 month ago

Hi @a897456, I ran into the same problem and can't fix it. Have you solved it? Any advice would be appreciated!

a897456 commented 1 month ago

> Hi @a897456, I ran into the same problem and can't fix it. Have you solved it? Any advice would be appreciated!

I saw you asked about this in the group chat. The author probably hasn't fixed this bug yet.

CreepJoye commented 1 month ago

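> Hi @a897456, I ran into the same problem and can't fix it. Have you solved it? Any advice would be appreciated!

> I saw you asked about this in the group chat. The author probably hasn't fixed this bug yet.

If it's convenient, could we add each other on WeChat through the group? I'd like to exchange ideas and learn from each other.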

chazo1994 commented 1 month ago

@CreepJoye and @a897456 Have you fixed all these bugs?

CreepJoye commented 1 month ago

> Have you fixed all these bugs?

No, I made some changes, but there are still some issues. I'm working on finding a solution. Do you have any thoughts?

CreepJoye commented 1 month ago

> @CreepJoye and @a897456 Have you fixed all these bugs?

@chazo1994 Have you fixed all these bugs? I have been modifying the code, but new issues keep arising. If it's convenient, could we exchange contact information to discuss NS2 training?

chazo1994 commented 1 month ago

@CreepJoye Not yet. I have fixed a lot of bugs, but there is still an error in the code extraction (Encodec), which may not be implemented. I pushed my code to this fork: https://github.com/chazo1994/Amphion

You can contact me by email (thinhnv1811@gmail.com), on LinkedIn (https://www.linkedin.com/in/thinh-nguyen-a06658133/), or on any platform you use, such as Discord. I would be honored if we could discuss NaturalSpeech 2, NaturalSpeech 3, or any other SOTA speech generation model.