kundajelab / bpnet

Toolkit to train base-resolution deep neural networks on functional genomics data and to interpret them
http://bit.ly/bpnet-colab
MIT License

BPNet fails with wrong cuDNN version, but TensorFlow doesn't #8

[Open] mmtrebuchet opened this issue 4 years ago

mmtrebuchet commented 4 years ago

Hoo boy, this is a rough one. I'm trying to run BPNet on chemical mapping data, and it gets to epoch one before it crashes. It doesn't even crash cleanly: there's a segfault and control returns to the terminal, but several bpnet processes keep running even though they don't seem to be doing anything. I have to run killall bpnet to stop them.

Logs are attached (problemRun.zip), and in them TensorFlow complains about the driver versions. But this is worse than a simple driver mismatch, because the basic TensorFlow tutorial runs fine: the archive also contains testTensorflow.py, which executes correctly. That makes me think the problem is not in the TensorFlow configuration but is a subtle bug in BPNet itself.

Good luck! Let me know if you need me to test anything.
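One way to separate a BPNet bug from a library mismatch is to check which cuDNN the dynamic loader actually resolves, independently of both TensorFlow and BPNet. The sketch below is not part of BPNet; it assumes a Linux machine and uses Python's ctypes to call cuDNN's own cudnnGetVersion(). Note that ctypes.util.find_library only searches the system loader path (and LD_LIBRARY_PATH), so a cuDNN shipped inside a conda environment may not be found unless that path is exported.

```python
import ctypes
import ctypes.util


def decode_cudnn_version(v):
    """Split the integer returned by cudnnGetVersion() into (major, minor, patch).

    cuDNN 7 and 8 encode the version as major*1000 + minor*100 + patch;
    cuDNN 9 switched to major*10000 + minor*100 + patch.
    """
    if v >= 90000:
        return v // 10000, (v % 10000) // 100, v % 100
    return v // 1000, (v % 1000) // 100, v % 100


def loaded_cudnn_version():
    """Return (library name, (major, minor, patch)) for the cuDNN the
    loader finds, or None if no libcudnn is visible."""
    name = ctypes.util.find_library("cudnn")
    if name is None:
        return None
    lib = ctypes.CDLL(name)
    # cudnnGetVersion() takes no arguments and returns a size_t.
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    return name, decode_cudnn_version(lib.cudnnGetVersion())


if __name__ == "__main__":
    found = loaded_cudnn_version()
    if found is None:
        print("No libcudnn found on the loader path")
    else:
        name, (major, minor, patch) = found
        print("{}: cuDNN {}.{}.{}".format(name, major, minor, patch))
```

Run it in the same environment BPNet uses and compare the result with the cuDNN version that the installed TensorFlow release was built against.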

snystrom commented 3 years ago

I had a similar issue and fixed it by installing tensorflow-gpu==1.8 (assuming you're using a GPU). Conda seems to pull in the wrong cuDNN with 1.7.
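After applying the pin, it can help to confirm that the environment really ended up with TensorFlow 1.8 rather than whatever version the solver settled on. The helper below is a hypothetical sketch, not part of BPNet; it reads tf.__version__, which exists in both 1.x and 2.x, and treats any 1.8.x patch release as matching the pin.

```python
def installed_tf_version():
    """Return TensorFlow's version string, or None if it cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.__version__


def matches_pin(version, pin):
    """True if version equals pin or is a patch release of it (e.g. 1.8.0 for 1.8)."""
    return version == pin or version.startswith(pin + ".")


if __name__ == "__main__":
    pin = "1.8"
    version = installed_tf_version()
    if version is None:
        print("TensorFlow is not importable in this environment")
    elif matches_pin(version, pin):
        print("TensorFlow {} matches the {} pin".format(version, pin))
    else:
        print("TensorFlow {} does not match the {} pin".format(version, pin))
```

Checking the imported module rather than the package name avoids confusion between the tensorflow and tensorflow-gpu distributions, since both import as tensorflow.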