kausmees / GenoCAE

Convolutional autoencoder for genotype data
BSD 3-Clause "New" or "Revised" License

Suggest: allow to work with GenoCAE without changing the working directory #10

Closed richelbilderbeek closed 3 years ago

richelbilderbeek commented 3 years ago

Dear GenoCAE maintainer,

Here I suggest allowing a user to run GCAE from any folder, instead of forcing him/her to work from the GenoCAE folder.

When running the 'training' example code from the GenoCAE folder, the training works awesome:

Here I run the command:

richel@N141CU:~/.local/share/gcaer/gcae_v1_0$ /home/richel/.local/share/r-miniconda/envs/r-reticulate/bin/python \
  ~/.local/share/gcaer/gcae_v1_0/run_gcae.py train --datadir ~/.local/share/gcaer/gcae_v1_0/example_tiny/ \
  --data HumanOrigins249_tiny --model_id M1 --epochs 20 --save_interval 2 --train_opts_id ex3 --data_opts_id b_0_4

Here is part of the result:

2021-06-28 14:50:13.776150: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-06-28 14:50:13.776180: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
tensorflow version 2.3.3

______________________________ arguments ______________________________
train : True
datadir : /home/richel/.local/share/gcaer/gcae_v1_0/example_tiny/
data : HumanOrigins249_tiny
model_id : M1
...

However, when I work from another folder, say, one folder up ...

richel@N141CU:~/.local/share/gcaer$ /home/richel/.local/share/r-miniconda/envs/r-reticulate/bin/python \
  ~/.local/share/gcaer/gcae_v1_0/run_gcae.py train --datadir ~/.local/share/gcaer/gcae_v1_0/example_tiny/  \
  --data HumanOrigins249_tiny --model_id M1 --epochs 20 --save_interval 2 --train_opts_id ex3 --data_opts_id b_0_4

I get an error message that "data_opts/" + data_opts_id+".json" cannot be found, raised at this point in the code:

2021-06-28 14:50:53.728916: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-06-28 14:50:53.728947: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
tensorflow version 2.3.3
Traceback (most recent call last):
  File "/home/richel/.local/share/gcaer/gcae_v1_0/run_gcae.py", line 396, in <module>
    with open("data_opts/" + data_opts_id+".json") as data_opts_def_file:
FileNotFoundError: [Errno 2] No such file or directory: 'data_opts/b_0_4.json'

The problem here is the hardcoded "data_opts/" part, which forces me to work in the same folder as GenoCAE. It feels clumsy to work with, as I have to change the working directory when calling GenoCAE. Note that, looking at the code, the same applies to train_opts and models.
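To illustrate why the hardcoded relative path fails, here is a minimal sketch. It is not GenoCAE code: the temporary folder layout and the helper name `can_open_relative` are made up for the illustration. It shows that a relative path passed to `open` is resolved against the current working directory, not against the location of the script:

```python
import os
import tempfile
from pathlib import Path


def can_open_relative(relative_path: str) -> bool:
    """Return True if relative_path can be opened from the current working directory."""
    try:
        with open(relative_path):
            return True
    except FileNotFoundError:
        return False


# Hypothetical stand-in for the GenoCAE folder layout.
with tempfile.TemporaryDirectory() as tmp:
    project_dir = Path(tmp) / "gcae_v1_0"
    (project_dir / "data_opts").mkdir(parents=True)
    (project_dir / "data_opts" / "b_0_4.json").write_text("{}")

    original_cwd = os.getcwd()
    try:
        os.chdir(project_dir)         # working from the project folder
        found_from_project = can_open_relative("data_opts/b_0_4.json")
        os.chdir(project_dir.parent)  # working from one folder up
        found_from_parent = can_open_relative("data_opts/b_0_4.json")
    finally:
        os.chdir(original_cwd)        # restore before the temp folder is removed

print(found_from_project, found_from_parent)  # True False
```

The same file is found from the project folder but not from one folder up, which matches the FileNotFoundError in the traceback.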

I would enjoy a way to either (my favorites are first :-) ):

Would one of these options be doable?

kausmees commented 3 years ago

Hi

Thanks for the suggestion, I agree that it would be useful to be able to avoid having to invoke GCAE from the project directory.

Do you think it is important to be able to keep the train_opts, data_opts and models in custom locations on the system? Or would it work equally well to keep requiring them to be in the GCAE directory, and rewrite the paths to be relative to the project directory, using e.g. (in run_gcae.py)

from pathlib import Path

# Path(__file__) is the script itself; its parent is the project directory
GCAE_DIR = Path(__file__).resolve().parent

and assuming data_opts/, train_opts/ and models/ are in GCAE_DIR?
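A minimal sketch of how this could look, assuming the option files stay inside the project directory. The helper names `project_dir`, `opts_path` and `load_opts` are hypothetical and not part of GenoCAE. Since `__file__` names the script file itself, the sketch takes its parent to get the directory:

```python
import json
from pathlib import Path


def project_dir(script_file: str) -> Path:
    """Return the absolute directory that contains script_file."""
    # resolve() makes the path absolute, so the result does not depend on
    # the current working directory; parent drops the file name.
    return Path(script_file).resolve().parent


def opts_path(base_dir: Path, kind: str, opts_id: str) -> Path:
    """Build <base_dir>/<kind>/<opts_id>.json, e.g. kind='data_opts'."""
    return base_dir / kind / f"{opts_id}.json"


def load_opts(base_dir: Path, kind: str, opts_id: str) -> dict:
    """Read one options file as a dictionary."""
    with open(opts_path(base_dir, kind, opts_id)) as opts_file:
        return json.load(opts_file)
```

In run_gcae.py this might be used as `GCAE_DIR = project_dir(__file__)` followed by `load_opts(GCAE_DIR, "data_opts", data_opts_id)`, so that the lookup no longer depends on the folder the command is run from.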

richelbilderbeek commented 3 years ago

Yay, thanks for appreciating my suggestion!

I think your suggestion is great and I look forward to using it :-)