kundajelab / bpnet

Toolkit to train base-resolution deep neural networks on functional genomics data and to interpret them
http://bit.ly/bpnet-colab
MIT License

Datasets not loading correctly unless `--in-memory` specified. #4

Closed by mlweilert 5 years ago

mlweilert commented 5 years ago

When training BPNet, if you (1) do not specify `--in-memory` and (2) pass a `--config` file that differs from the premade bpnet9 config, the data does not load properly and the process freezes right before model training starts. Everything loads correctly up to that point. CPU and GPU usage also plummet when the freeze happens.

Avsecz commented 5 years ago

Hm. That's strange. It could be that one of the worker processes crashed. Does it still happen if you set `num-workers` to 1? Could you also try installing PyTorch?

mlweilert commented 5 years ago

Good call! Setting `num-workers` to 1 fixed it! Any idea why?

Avsecz commented 5 years ago

Somehow the parallel workers got stuck. Do you have PyTorch installed?
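For anyone debugging a similar freeze: a stuck worker usually shows up as a batch fetch that never returns. The sketch below is not BPNet code. It is a generic, standard-library illustration of wrapping a fetch call in a timeout so that a hang becomes an error instead of a silent freeze. The names `fetch_with_timeout` and `fetch` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def fetch_with_timeout(fetch, timeout_s):
    """Run fetch() in a helper thread and fail loudly if it takes too long.

    `fetch` is a hypothetical zero-argument callable standing in for
    "get the next batch"; it is not part of BPNet.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise RuntimeError(
            f"batch fetch exceeded {timeout_s}s; a worker may be stuck"
        ) from None
    finally:
        # Do not block here waiting for a call that may never finish.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    # A fast fetch returns normally.
    print(fetch_with_timeout(lambda: "batch-0", timeout_s=1.0))
    # A slow fetch is reported as an error instead of hanging the caller.
    try:
        fetch_with_timeout(lambda: time.sleep(0.3), timeout_s=0.05)
    except RuntimeError as err:
        print(err)
```

This only surfaces the hang. It does not fix the underlying cause, which in this thread turned out to be the data loader.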

mlweilert commented 5 years ago

Yes, version 1.2.0, build py3.6_cuda10.0.130_cudnn7.6.2_0. Should I try updating it?

Avsecz commented 5 years ago

You could try removing PyTorch from the environment; a NumPy version of the data loader will then be used instead. The freeze is due to the following data-loader issue in PyTorch: https://github.com/pytorch/pytorch/issues/1355
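To illustrate the fallback idea described above (use one backend when a package is installed, another when it is not), here is a minimal sketch. It is an assumption about the general pattern, not BPNet's actual selection logic, and `pick_loader_backend` is a hypothetical helper name.

```python
import importlib.util


def pick_loader_backend(preferred: str = "torch") -> str:
    """Return "torch" if the preferred module can be imported, else "numpy".

    Hypothetical helper: it only checks whether the module is installed,
    without importing it.
    """
    if importlib.util.find_spec(preferred) is not None:
        return "torch"
    return "numpy"


if __name__ == "__main__":
    # Prints "torch" in an environment with PyTorch installed,
    # and "numpy" after PyTorch has been removed.
    print(pick_loader_backend())
```

Running this before and after uninstalling PyTorch is a quick way to confirm which branch an environment would take under this kind of check.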

mlweilert commented 5 years ago

I uninstalled PyTorch from the environment and it worked nicely. Thanks for helping me trace the issue! I will make sure to set up the environment in the AWS AMI based on this discussion. After the AMI is created, I'll push a change to README.md that adds the public link to the AMI.