TypeError: cannot unpack non-iterable NoneType object

tiansiyuan commented 1 year ago

Hi

When I run the cell under SoundStream in the notebook audiolm_pytorch_demo.ipynb, I get:

TypeError                                 Traceback (most recent call last)
Cell In [5], line 19
      6 trainer = SoundStreamTrainer(
      7     soundstream,
      8     folder = dataset_folder,
   (...)
     14     num_train_steps = 9
     15 ) #.cuda()
     16 # NOTE: I changed num_train_steps to 9 (aka 8 + 1) from 10000 to make things go faster for demo purposes
     17 # adjusting save_*_every variables for the same reason
---> 19 trainer.train()

File /opt/conda/lib/python3.8/site-packages/audiolm_pytorch/trainer.py:552, in SoundStreamTrainer.train(self, log_fn)
    549 def train(self, log_fn = noop):
    551     while self.steps < self.num_train_steps:
--> 552         logs = self.train_step()
    553         log_fn(logs)
    555     self.print('training complete')

File /opt/conda/lib/python3.8/site-packages/audiolm_pytorch/trainer.py:420, in SoundStreamTrainer.train_step(self)
    417 # update vae (generator)
    419 for _ in range(self.grad_accum_every):
--> 420     wave, = next(self.dl_iter)
    421     wave = wave.to(device)
    423     loss, (recon_loss, multi_spectral_recon_loss, adversarial_loss, feature_loss, all_commitment_loss) = self.soundstream(wave, return_loss_breakdown = True)

TypeError: cannot unpack non-iterable NoneType object

How can I solve this problem?

Thanks,

Tian

tiansiyuan commented 1 year ago

It was fine with 1.2.15 and 1.2.16.

tiansiyuan commented 12 months ago

Reproduced with 1.2.18.

LWprogramming commented 12 months ago

I've also been encountering this. It might be drop_last on dataloader when the batch_size doesn't perfectly divide your dataset size, considering the issue came up in 1.2.17, but I haven't gotten the chance to read how exactly drop_last interacts with things yet
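For reference, a rough sketch of the arithmetic behind that guess. It assumes only the documented PyTorch DataLoader behaviour (drop_last=True discards a trailing partial batch); the helper name num_batches is made up for illustration and is not part of audiolm_pytorch:

```python
import math

def num_batches(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Batches per epoch, mirroring len() of a DataLoader over a sized dataset."""
    if drop_last:
        # The trailing partial batch is discarded.
        return num_samples // batch_size
    # The trailing partial batch is kept as a smaller batch.
    return math.ceil(num_samples / batch_size)

print(num_batches(10, 4, drop_last=False))  # 3
print(num_batches(10, 4, drop_last=True))   # 2
print(num_batches(1, 4, drop_last=True))    # 0, i.e. nothing to iterate over
```

So a split with fewer samples than batch_size produces no batches at all once drop_last is on.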

lucidrains commented 12 months ago

@LWprogramming @tiansiyuan oh that would be strange? are you two seeing this when training on multiple gpus? i've turned it into an option; let me know if turning it off fixes it https://github.com/lucidrains/audiolm-pytorch/commit/d491046de3e4e24e191aa94f98f34bc4c337ac04

LWprogramming commented 12 months ago

yes on multiple gpus. i've been using encodec so i was seeing it in semantic/coarse/fine instead of soundstream since i dont end up training it. Can confirm that they train fine after turning it off now.

lucidrains commented 12 months ago

@LWprogramming ohh got it, thanks for confirming! this may be an issue with accelerate then

lucidrains commented 12 months ago

maybe worth exploring whether turning on split batches fixes this

LWprogramming commented 12 months ago

what are your thoughts on why split batches might make drop_last work? Based on the source docs it just seems like it's yielding data differently but should still never be None. (From a purely selfish angle, for my personal training runs getting data in any order is fine by me :) there's no special structure to the data that needs to be shuffled away)

tiansiyuan commented 12 months ago

I have this issue with a single GPU and also on CPU.

tiansiyuan commented 12 months ago

Reproduced with 1.2.19 and 1.2.20.

tiansiyuan commented 12 months ago

Does batch size only affect the memory used in training and the training speed?

seaniezhao commented 12 months ago

I just changed the batch size to run the demo without errors, but now a single step takes a very long time on an RTX 4090.

seaniezhao commented 12 months ago

I found the cause: the validation dataset contains only 1 sample, so I set valid_frac = 0.0 just to run the demo.
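A minimal sketch of how a tiny dataset can end up with a one-sample validation split. It assumes the split is computed as int((1 - valid_frac) * n) for training with the remainder going to validation, and that valid_frac defaults to 0.05. Both are assumptions about the trainer, not checked against its source here. The function name split_sizes and the dataset size of 10 are hypothetical:

```python
def split_sizes(num_samples: int, valid_frac: float) -> tuple:
    """Return (train_size, valid_size) under the assumed split rule."""
    train_size = int((1 - valid_frac) * num_samples)
    valid_size = num_samples - train_size
    return train_size, valid_size

print(split_sizes(10, 0.05))    # (9, 1): a single validation sample
print(split_sizes(10, 0.0))     # (10, 0): no validation split at all
print(split_sizes(2703, 0.05))  # (2567, 136)
```

Under these assumptions, any batch size above 1 leaves the one-sample validation split without a full batch.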

lucidrains commented 12 months ago

@seaniezhao thanks for clueing us in! @tiansiyuan i've added a few new error messages in an updated version

do you want to see if it triggers before starting training?

tiansiyuan commented 12 months ago

Yes, I verified with version 1.2.21 that it gives:

AssertionError: dataset must have sufficient samples for training

when I use dataset_folder = "placeholder_dataset"

When I switch to dataset_folder = "dev-clean", training works ok:

training with dataset of 2567 samples and validating with randomly splitted 136 samples ...... training complete
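For anyone who wants to catch this before constructing a trainer, here is a hypothetical pre-flight check. It is not the library's actual assertion: the function name check_enough_samples and the rule that each split needs at least one full batch are my own assumptions.

```python
def check_enough_samples(train_size: int, valid_size: int, batch_size: int) -> None:
    """Raise early if either split cannot fill one batch (assumed rule)."""
    if train_size < batch_size:
        raise ValueError(
            f"training split has {train_size} samples, fewer than batch_size={batch_size}"
        )
    # valid_size == 0 is treated as "validation disabled" in this sketch.
    if 0 < valid_size < batch_size:
        raise ValueError(
            f"validation split has {valid_size} samples, fewer than batch_size={batch_size}"
        )

check_enough_samples(2567, 136, batch_size=4)  # passes silently

try:
    check_enough_samples(9, 1, batch_size=4)
except ValueError as err:
    print(err)
```

The second call fails on the validation split, which is the situation described above for the placeholder dataset.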

lucidrains commented 12 months ago

hurray, thank you @seaniezhao

lucidrains / audiolm-pytorch

TypeError: cannot unpack non-iterable NoneType object #212