slp-rl / aero

This repo contains the official PyTorch implementation of "Audio Super Resolution in the Spectral Domain" (ICASSP 2023)

"assert len(self.hr_set) == len(self.lr_set)" error #10

Closed psp0001060 closed 11 months ago

psp0001060 commented 1 year ago

Hi Authors, when I run train.py, it raises the error below.

Traceback (most recent call last):
  File "/workspace/aero-main/train.py", line 135, in main
    _main(args)
  File "/workspace/aero-main/train.py", line 127, in _main
    run(args)
  File "/workspace/aero-main/train.py", line 54, in run
    tr_dataset = LrHrSet(args.dset.train, args.experiment.lr_sr, args.experiment.hr_sr,
  File "/home/wsd/workspace/aero-main/src/data/datasets.py", line 136, in __init__
    assert len(self.hr_set) == len(self.lr_set)
AssertionError
ERROR conda.cli.main_run:execute(49): `conda run python /workspace/aero-main/train.py dset=12-48 experiment=aero_12-48_512_256` failed.

The error comes from "assert len(self.hr_set) == len(self.lr_set)", so I printed the length of both:

len(self.lr_set): 64349, len(self.hr_set): 64353

I read the code that computes the segment counts; the logic is to split each audio file into segments based on its sampling rate.

import math


class Audioset:
    def __init__(self, files=None, length=None, stride=None,
                 pad=True, with_path=False, sample_rate=None,
                 channels=None):
        """
        files should be a list [(file, length)]
        """
        self.files = files
        self.num_examples = []
        self.length = length
        self.stride = stride or length
        self.with_path = with_path
        self.sample_rate = sample_rate
        self.channels = channels

        # Count how many fixed-length segments (examples) each file yields.
        for file, file_length in self.files:
            if length is None:
                # No segmentation: the whole file is one example.
                examples = 1
            elif file_length < length:
                # Shorter than one segment: pad it up to one example, or skip it.
                examples = 1 if pad else 0
            elif pad:
                examples = int(math.ceil((file_length - self.length) / self.stride) + 1)
            else:
                examples = (file_length - self.length) // self.stride + 1
            self.num_examples.append(examples)
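
To illustrate, here is a minimal sketch of that per-file counting rule (not code from the repo, and the segment length, stride, and file lengths are made-up numbers): an HR file that is even one sample longer than hr_sr/lr_sr times its LR counterpart can yield one extra example, which would explain a small total mismatch like 64349 vs 64353.

import math

def n_examples(file_length, length, stride, pad=True):
    # Mirrors the per-file logic in Audioset.__init__ above.
    if length is None:
        return 1
    if file_length < length:
        return 1 if pad else 0
    if pad:
        return int(math.ceil((file_length - length) / stride) + 1)
    return (file_length - length) // stride + 1

# Hypothetical numbers: a 12 kHz LR file of exactly 48,000 samples and a
# 48 kHz HR counterpart that is one sample longer than 4x that (192,001),
# with a 2 s segment length and a 1 s stride at each rate.
print(n_examples(48_000, 2 * 12_000, 1 * 12_000))    # -> 3
print(n_examples(192_001, 2 * 48_000, 1 * 48_000))   # -> 4, so the per-file counts disagree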

I am using the same dataset as the authors, which is VCTK (excluding p315 and s5). Can you tell me how this situation should be handled?

m-mandel commented 1 year ago

Hi, I need some more info. Usually this happens when the sampling rates in the conf file don't match the actual data. Did you change the sample rates in your experiment conf file to 12000 (for LR) and 48000 (for HR)?
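
If you want to double-check, something like the following prints the actual rate and length of one LR/HR pair; this is just a sketch using torchaudio, and the paths are placeholders for wherever your downsampled and original wav files live:

import torchaudio

# Placeholder paths; point these at one LR/HR pair from your dataset.
lr_info = torchaudio.info("data/vctk/12kHz/p225_001_mic1.wav")
hr_info = torchaudio.info("data/vctk/48kHz/p225_001_mic1.wav")
print(lr_info.sample_rate, lr_info.num_frames)
print(hr_info.sample_rate, hr_info.num_frames)
# The printed rates should match lr_sr / hr_sr in the conf, and the HR
# num_frames should be exactly (hr_sr / lr_sr) times the LR num_frames.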

By the way, usually the speakers removed are p280 and p315, as there were technical issues with their recordings, as mentioned here: "(However, two speakers, p280 and p315 had technical issues of the audio recordings using MKH 800)."

psp0001060 commented 1 year ago

Thank you for your reply. Since there is no 12-48 conf file provided in the GitHub repository, I created one based on the 4-16 conf file and set "lr_sr: 12000, hr_sr: 48000, nfft: 512, hop_length: 256". I have uploaded the aero_12-48_512_256.yaml file as an attachment (since GitHub does not allow uploading files with the .yaml extension, I added a .txt extension): aero_12-48_512_256.yaml.txt

Also, thank you for pointing out that the speakers usually removed are p280 and p315. Since create_meta_files.py defines

TOTAL_N_SPEAKERS=108 
TRAIN_N_SPEAKERS=100 
TEST_N_SPEAKERS=8

there are 108 speakers in total, so I kept the data for p280. Is it correct to delete p280 and p315 and keep s5?

psp0001060 commented 1 year ago

After testing, I found that the following four files produce a different number of segments in their LR and HR versions, so I deleted these files; the assertion now passes and training runs normally: p244_153_mic1.wav, p250_393_mic1.wav, p254_320_mic1.wav, p263_258_mic1.wav
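
In case it helps others, this is roughly how I checked for such files. It is only a sketch, not part of the repo: the directory layout and the SEGMENT_S / STRIDE_S values are assumptions, so substitute the values from your own conf.

import math
from pathlib import Path

import torchaudio

# Assumed directory layout and conf values; adjust to your setup.
LR_DIR, HR_DIR = Path("data/vctk/12kHz"), Path("data/vctk/48kHz")
LR_SR, HR_SR = 12_000, 48_000
SEGMENT_S, STRIDE_S = 2.0, 1.0  # segment length and stride in seconds, as in the conf

def n_examples(num_frames, sr):
    # Same per-file counting rule as Audioset (pad=True branch).
    length, stride = int(SEGMENT_S * sr), int(STRIDE_S * sr)
    if num_frames < length:
        return 1
    return int(math.ceil((num_frames - length) / stride) + 1)

for lr_path in sorted(LR_DIR.glob("*.wav")):
    hr_path = HR_DIR / lr_path.name
    lr_n = n_examples(torchaudio.info(str(lr_path)).num_frames, LR_SR)
    hr_n = n_examples(torchaudio.info(str(hr_path)).num_frames, HR_SR)
    if lr_n != hr_n:
        print(f"{lr_path.name}: lr segments={lr_n}, hr segments={hr_n}")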

By the way, I deleted p280 and p315 and kept s5.

m-mandel commented 11 months ago

Glad it worked out!