mosaicml / diffusion


ValueError('cannot mmap an empty file') #47

Open s5248 opened 1 year ago

s5248 commented 1 year ago

FileExistsError: [Errno 17] File exists: '/000000_shard_access_times'

During handling of the above exception, another exception occurred:

InstantiationException: Error in call to target 'diffusion.datasets.laion.laion.build_streaming_laion_dataloader': ValueError('cannot mmap an empty file')
full_key: dataset.train_dataset
ERROR:composer.cli.launcher:Rank 0 crashed with exit code 1.
Waiting up to 30 seconds for all training processes to terminate. Press Ctrl-C to exit immediately.
Global rank 0 (PID 34542) exited with code 1
ERROR:composer.cli.launcher:Global rank 0 (PID 34542) exited with code 1

I'm wondering why it needs to write to the root directory. It seems that self.local is being set to an empty value. Please help.

Landanjs commented 1 year ago

Apologies for the delay! This might be caused by an ungraceful termination of the streaming process. Did you have a previous run that failed?
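For illustration, here is a minimal sketch of how a leftover shared memory segment could produce a FileExistsError like the one above. It assumes the library creates named segments through Python's multiprocessing.shared_memory, in which case the leading '/' in the error would be part of a POSIX shared memory name rather than a path in the root directory. The segment name and the helper function are hypothetical and only mimic the name in the traceback.

```python
import os
from multiprocessing import shared_memory


def create_twice(name: str) -> bool:
    """Create a named segment, then try to create it again.

    Returns True if the second attempt raised FileExistsError, which is what
    a new run would hit if an earlier run exited without unlinking the segment.
    """
    first = shared_memory.SharedMemory(name=name, create=True, size=8)
    try:
        # Stand-in for a second run that finds the stale segment still present.
        shared_memory.SharedMemory(name=name, create=True, size=8)
    except FileExistsError as err:
        # On Linux the reported name typically carries a leading '/'.
        print(f"second create failed: {err}")
        return True
    finally:
        # Clean up so this demo does not leave a stale segment of its own.
        first.close()
        first.unlink()
    return False


# Hypothetical name; the PID suffix keeps repeated demo runs from colliding.
collided = create_twice(f"demo_{os.getpid()}_shard_access_times")
```

If this is what happened, the segment from the failed run has to be removed before a new run can create it again.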

Could you try running streaming.base.util.clean_stale_shared_memory? This should remove any corrupted data.
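A minimal sketch of that call, assuming the mosaicml-streaming package is installed in the same environment as the training job. The import path is the one named above. The wrapper function is hypothetical and only guards against the package being missing. It is presumably safest to run this while no other streaming job is active on the machine.

```python
def clean_streaming_state() -> bool:
    """Remove stale streaming shared memory, if the library is available.

    Returns True when the cleanup ran, False when streaming is not installed.
    """
    try:
        from streaming.base.util import clean_stale_shared_memory
    except ImportError:
        # The streaming package is not installed in this environment.
        return False
    clean_stale_shared_memory()
    return True


if __name__ == "__main__":
    if clean_streaming_state():
        print("stale shared memory cleaned")
    else:
        print("streaming is not installed; nothing was cleaned")
```

After the cleanup, relaunch the training command and check whether the FileExistsError still appears.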