Use pytorch data loader for training

smguo commented 3 years ago

Dynamorph currently loads all training data into memory at once and samples the data for each mini-batch in a single process. Training could potentially be sped up with the PyTorch DataLoader, which supports multi-process loading and on-the-fly data augmentation.
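A rough sketch of what this could look like, assuming the patches are already available as a single tensor of shape (N, C, H, W). `PatchDataset`, `random_flip`, and `make_loader` are hypothetical names for illustration and do not exist in dynamorph:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class PatchDataset(Dataset):
    """Hypothetical map-style dataset wrapping a tensor of image patches."""

    def __init__(self, patches, augment=None):
        self.patches = patches  # assumed shape: (N, C, H, W)
        self.augment = augment  # optional callable applied to each sample

    def __len__(self):
        return len(self.patches)

    def __getitem__(self, idx):
        patch = self.patches[idx]
        if self.augment is not None:
            patch = self.augment(patch)
        return patch


def random_flip(patch):
    """Example augmentation: flip the patch left-right about half of the time."""
    if torch.rand(1).item() < 0.5:
        patch = torch.flip(patch, dims=[-1])
    return patch


def make_loader(patches, batch_size=32, num_workers=0):
    """Build a shuffling DataLoader; num_workers > 0 loads batches in subprocesses."""
    dataset = PatchDataset(patches, augment=random_flip)
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
    )
```

With `num_workers` set above 0, `__getitem__` (including the augmentation) runs in worker processes while the main process trains on the previous batch.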

One issue with adopting the data loader is that the current matching loss implementation requires each batch to be sampled in a certain order. This could possibly be achieved using iterable-style datasets.
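As one possible alternative to an iterable-style dataset, the ordering could be controlled with a custom batch sampler on top of a map-style dataset. The sketch below assumes that "a certain order" means temporally adjacent frames of the same trajectory must sit next to each other in the batch; that is my guess at the requirement, not a description of the current implementation. `AdjacentPairBatchSampler` and its `trajectories` argument are hypothetical:

```python
import random


class AdjacentPairBatchSampler:
    """Yield index batches in which temporally adjacent frames sit side by side.

    `trajectories` is a hypothetical list of index lists, one per trajectory,
    each ordered in time. Every batch has the layout [t0, t0+1, t1, t1+1, ...].
    """

    def __init__(self, trajectories, pairs_per_batch, seed=None):
        # Collect every (frame, next frame) pair within each trajectory.
        self.pairs = [
            (traj[i], traj[i + 1])
            for traj in trajectories
            for i in range(len(traj) - 1)
        ]
        self.pairs_per_batch = pairs_per_batch
        self.rng = random.Random(seed)

    def __len__(self):
        # Number of batches per epoch (the last batch may be smaller).
        return (len(self.pairs) + self.pairs_per_batch - 1) // self.pairs_per_batch

    def __iter__(self):
        # Shuffle the pairs, not the individual frames, so pairs stay intact.
        pairs = list(self.pairs)
        self.rng.shuffle(pairs)
        for start in range(0, len(pairs), self.pairs_per_batch):
            chunk = pairs[start:start + self.pairs_per_batch]
            yield [idx for pair in chunk for idx in pair]
```

An instance can be passed to the loader as `DataLoader(dataset, batch_sampler=sampler)`, in which case `batch_size` and `shuffle` are left unset and the sampler alone decides the content and order of each batch.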

bryantChhun commented 3 years ago

@smguo Without an established CLI for VQ-VAE training (of dynamorph data), it's a little hard to insert a data loader. There are a couple of candidate locations I could try. Let me know what you think:

Here in Michael's VQ-VAE training
or here in your CM training

In the meantime, I will try a generalized loader and test it against your CM data.

smguo commented 3 years ago

@bryantChhun Yes, I agree that the data loader should be built on top of the training CLI. I believe the version on the master branch is outdated. We should merge @miaecle's current version with mine before working on the data loader.

The data loader would load the dataset (image files, not pickle files) from disk on the fly during training, so the whole dataset-loading block and its data structure would need to be rewritten: https://github.com/czbiohub/dynamorph/blob/6269c55b95834603070fc139d71d615e2656fb51/run_training.py#L1281-L1309

The train function would need to be rewritten as well. One tricky part is making the matching loss work with the data loader, as I mentioned in my last post.
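To illustrate the on-the-fly loading, here is a minimal sketch of a dataset that only stores file paths and reads each image when it is requested. It assumes one NumPy `.npy` array per image file, which may not match the actual dynamorph file format; `LazyImageDataset` is a hypothetical name:

```python
from pathlib import Path

import numpy as np
import torch
from torch.utils.data import Dataset


class LazyImageDataset(Dataset):
    """Hypothetical dataset that reads one image file from disk per sample."""

    def __init__(self, root, pattern="*.npy", transform=None):
        # Only the file paths are kept in memory, not the image data.
        self.paths = sorted(Path(root).glob(pattern))
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # The file is read here, i.e. when the loader asks for this sample.
        image = torch.from_numpy(np.load(self.paths[idx])).float()
        if self.transform is not None:
            image = self.transform(image)
        return image
```

Because the paths are sorted, the sample index is a stable handle for each file, which could help when a sampler needs to request frames in a specific order.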
