kr-colab / locator

deep learning prediction of geographic location from individual genome sequences

Error running sample dataset #19

Closed by peterdfields 4 years ago

peterdfields commented 4 years ago

I'm seeing the following error when trying to run the sample dataset:

python ./scripts/locator.py --vcf data/test_genotypes.vcf.gz --sample_data data/test_sample_data.txt --out out/test/test
2020-10-22 03:15:08.805782: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-10-22 03:15:08.805833: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
reading VCF
[read_vcf] 11527 rows in 0.61s; chunk in 0.61s (18976 rows/s)
[read_vcf] all done (18974 rows/s)
loaded (11527, 500, 2) genotypes

filtering SNPs
running on 5830 genotypes after filtering

WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
2020-10-22 03:15:15.603126: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-10-22 03:15:15.603179: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-10-22 03:15:15.603222: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (evo-martes.zoo.unibas.ch): /proc/driver/nvidia/version does not exist
2020-10-22 03:15:15.603956: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-10-22 03:15:15.614517: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2399940000 Hz
2020-10-22 03:15:15.616203: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5590dadbb360 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-22 03:15:15.616233: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Epoch 1/5000
 9/13 [===================>..........] - ETA: 0s - loss: 1.4870
Epoch 00001: val_loss improved from inf to 0.68994, saving model to out/test/test_weights.hdf5
Traceback (most recent call last):
  File "./scripts/locator.py", line 384, in <module>
    history,model=train_network(model,traingen,testgen,trainlocs,testlocs)
  File "./scripts/locator.py", line 276, in train_network
    callbacks=[checkpointer,earlystop,reducelr])
  File "/home/peter/miniconda2/envs/locator/lib/python3.7/site-packages/tensorflow-2.3.1-py3.7-linux-x86_64.egg/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/home/peter/miniconda2/envs/locator/lib/python3.7/site-packages/tensorflow-2.3.1-py3.7-linux-x86_64.egg/tensorflow/python/keras/engine/training.py", line 1137, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/peter/miniconda2/envs/locator/lib/python3.7/site-packages/tensorflow-2.3.1-py3.7-linux-x86_64.egg/tensorflow/python/keras/callbacks.py", line 412, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/peter/miniconda2/envs/locator/lib/python3.7/site-packages/tensorflow-2.3.1-py3.7-linux-x86_64.egg/tensorflow/python/keras/callbacks.py", line 1249, in on_epoch_end
    self._save_model(epoch=epoch, logs=logs)
  File "/home/peter/miniconda2/envs/locator/lib/python3.7/site-packages/tensorflow-2.3.1-py3.7-linux-x86_64.egg/tensorflow/python/keras/callbacks.py", line 1299, in _save_model
    filepath, overwrite=True, options=self._options)
  File "/home/peter/miniconda2/envs/locator/lib/python3.7/site-packages/tensorflow-2.3.1-py3.7-linux-x86_64.egg/tensorflow/python/keras/engine/training.py", line 2085, in save_weights
    hdf5_format.save_weights_to_hdf5_group(f, self.layers)
  File "/home/peter/miniconda2/envs/locator/lib/python3.7/site-packages/tensorflow-2.3.1-py3.7-linux-x86_64.egg/tensorflow/python/keras/saving/hdf5_format.py", line 640, in save_weights_to_hdf5_group
    param_dset = g.create_dataset(name, val.shape, dtype=val.dtype)
  File "/home/peter/miniconda2/envs/locator/lib/python3.7/site-packages/h5py-3.0.0rc1-py3.7-linux-x86_64.egg/h5py/_hl/group.py", line 143, in create_dataset
    if '/' in name:
TypeError: a bytes-like object is required, not 'str'

I get this error when I try to set up locator on both a Mac running Catalina and a Linux machine running openSUSE. During the setup step I notice a few oddities, mostly concerning version compatibility with TensorFlow, e.g.:

tensorflow 2.3.1 requires h5py<2.11.0,>=2.10.0, but you'll have h5py 3.0.0rc1 which is incompatible.

Even when I install all the dependencies individually using pip or conda, I still get this error. I'm using a conda environment (python=3.7). Please let me know if additional information would be useful for figuring out what might be going wrong on my side.
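
The last frame of the traceback hints at the mechanism: h5py's create_dataset evaluates `'/' in name`, and in Python 3 that test raises exactly this TypeError when `name` is a bytes object rather than a str. Below is a minimal sketch in plain Python that needs neither TensorFlow nor h5py. The weight name `dense/kernel:0` and both helper names are hypothetical, chosen only for illustration.

```python
def contains_slash(name):
    """Mimic the membership test shown in the traceback: '/' in name."""
    return '/' in name


def try_membership(name):
    """Return (result, None) on success or (None, error message) on TypeError."""
    try:
        return contains_slash(name), None
    except TypeError as exc:
        return None, str(exc)


# Hypothetical weight name, once as str and once as bytes.
str_result = try_membership("dense/kernel:0")
bytes_result = try_membership(b"dense/kernel:0")

print("str name:  ", str_result)
print("bytes name:", bytes_result)
```

The str case succeeds, while the bytes case fails with the same message as the last line of the traceback. That is consistent with a dataset name reaching h5py as bytes.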

cjbattey commented 4 years ago

Thanks for reporting this! A recent update to h5py broke TensorFlow compatibility, and I hadn't set an explicit version requirement in the setup script, which I've now fixed. If you clone the repo and follow the GitHub install instructions (i.e. with setup.py instead of pip), it should work. Alternatively, you can try downgrading h5py in your locator environment with pip install h5py==2.10.0. Let me know if you have any other issues; otherwise closing for now.
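
To check whether an environment is affected before starting a long run, one option is to compare the installed h5py version against the range pip reported for tensorflow 2.3.1 (h5py<2.11.0,>=2.10.0). This is a rough sketch, not part of locator. The function names are hypothetical, and the parser is deliberately simplified: it ignores pre-release suffixes such as rc1.

```python
import re


def parse_release(version):
    """Return the leading numeric components of a version string as a 3-tuple.

    Simplification: suffixes such as 'rc1' are ignored, so '3.0.0rc1'
    is treated as (3, 0, 0).
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def h5py_matches_pin(version, lower=(2, 10, 0), upper=(2, 11, 0)):
    """True if lower <= version < upper, the range quoted in the pip warning."""
    return lower <= parse_release(version) < upper


# Versions mentioned in this thread.
for candidate in ("3.0.0rc1", "2.10.0"):
    print(candidate, "compatible:", h5py_matches_pin(candidate))
```

In a real environment the same check could be run on the installed package with `import h5py; h5py_matches_pin(h5py.__version__)`.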

peterdfields commented 4 years ago

@cjbattey That did the trick. Thanks!