kipoi / models

Model zoo for genomics
http://kipoi.org
MIT License

fix missing reverse-complement in reference sequence fetching for APARENT #336

Closed Hoeze closed 2 years ago

Hoeze commented 2 years ago

Ensure that the ref-sequence is reverse-complemented
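For readers unfamiliar with the operation, here is a minimal sketch of reverse-complementing a fetched reference sequence. It is not the code from this PR: the helper names `reverse_complement` and `ref_sequence_for_strand` are hypothetical, and the assumption that only minus-strand sites need the transformation is mine.

```python
# Translation table mapping each DNA base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def ref_sequence_for_strand(forward_seq: str, strand: str) -> str:
    """Hypothetical helper: return the sequence as read on `strand`.

    Assumes `forward_seq` was fetched from the forward strand of the FASTA
    file and that minus-strand sites must be reverse-complemented.
    """
    if strand == "-":
        return reverse_complement(forward_seq)
    return forward_seq


print(ref_sequence_for_strand("AAGCT", "+"))
print(ref_sequence_for_strand("AAGCT", "-"))
```

Applying `reverse_complement` twice returns the original sequence, which makes a convenient sanity check.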

Hoeze commented 2 years ago

@haimasree should we just merge this? I guess it will change the testing predictions as well.

haimasree commented 2 years ago

By all means! I did not intervene since Alex was assigned. The tests are passing, so this is completely fine by me. I just merged the master branch. Let's see if the tests are still passing.

haimasree commented 2 years ago

Tests are passing it seems. Do you think this is wrong?

Hoeze commented 2 years ago

Hm, this is super strange: depending on whether I run the tests in PyCharm or in the terminal, I get different results.

Running the test in PyCharm fails:

![image](https://user-images.githubusercontent.com/1200058/175542061-783ee127-d2ca-4e6e-8432-0b20ea7892ea.png)

```
/opt/anaconda/envs/kipoi-env/bin/python -m kipoi test . --batch_size=10 --source=dir
INFO [kipoi.data] Using user specified dataloader from LocalSource(local_path='/home/hoelzlwimmerf/Projects/kipoi/kipoi-models/APARENT/veff')
INFO [kipoi.data] successfully loaded the dataloader ././ from /home/hoelzlwimmerf/Projects/kipoi/kipoi-models/APARENT/veff/dataloader.py::Kipoi_APARENT_DL
INFO [kipoi.model] Downloading model arguments weights from https://github.com/johli/aparent/raw/8a884f0bc4073ed0edd588f71b61a5be4a37e831/saved_models/aparent_large_lessdropout_all_libs_no_sampleweights.h5
Using downloaded and verified file: /home/hoelzlwimmerf/Projects/kipoi/kipoi-models/APARENT/veff/downloaded/model_files/weights/31902fb40125679e655b8b6d2747ada7
2022-06-24 14:58:13.115952: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO [kipoi.pipeline] dataloader.output_schema is compatible with model.schema
INFO [kipoi.pipeline] Initialized data generator. Running batches...
INFO [kipoi.specs] Example file for argument fasta_file already exists
INFO [kipoi.specs] Example file for argument gtf_file already exists
INFO [kipoi.specs] Example file for argument vcf_file already exists
INFO [kipoi.specs] Example file for argument vcf_file_tbi already exists
0it [00:00, ?it/s]INFO [kipoi.pipeline] Returned data schema correct
42it [00:01, 35.43it/s]
0%| | 0/41 [00:00
```

Running it in the terminal works:

```
(kipoi-env) █▓▒░hoelzlwimmerf@desktop01░▒▓██▓▒░ Fr Jun 24 02:57:50
/home/hoelzlwimmerf/Projects/kipoi/kipoi-models/APARENT/veff> /opt/anaconda/envs/kipoi-env/bin/python -m kipoi test . --batch_size=10 --source=dir
INFO [kipoi.data] Using user specified dataloader from LocalSource(local_path='/home/hoelzlwimmerf/Projects/kipoi/kipoi-models/APARENT/veff')
INFO [kipoi.data] successfully loaded the dataloader ././ from /home/hoelzlwimmerf/Projects/kipoi/kipoi-models/APARENT/veff/dataloader.py::Kipoi_APARENT_DL
INFO [kipoi.model] Downloading model arguments weights from https://github.com/johli/aparent/raw/8a884f0bc4073ed0edd588f71b61a5be4a37e831/saved_models/aparent_large_lessdropout_all_libs_no_sampleweights.h5
Using downloaded and verified file: /home/hoelzlwimmerf/Projects/kipoi/kipoi-models/APARENT/veff/downloaded/model_files/weights/31902fb40125679e655b8b6d2747ada7
2022-06-24 14:58:24.009335: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO [kipoi.pipeline] dataloader.output_schema is compatible with model.schema
INFO [kipoi.pipeline] Initialized data generator. Running batches...
INFO [kipoi.specs] Example file for argument fasta_file already exists
INFO [kipoi.specs] Example file for argument gtf_file already exists
INFO [kipoi.specs] Example file for argument vcf_file already exists
INFO [kipoi.specs] Example file for argument vcf_file_tbi already exists
INFO [kipoi.pipeline] Returned data schema correct
42it [00:01, 35.64it/s]
INFO [kipoi.pipeline] predict_example done!
Using downloaded and verified file: /home/hoelzlwimmerf/Projects/kipoi/kipoi-models/APARENT/veff/downloaded/model_files/test.expect.h5
INFO [kipoi.cli.main] Testing if the predictions match the expected ones in the file: /home/hoelzlwimmerf/Projects/kipoi/kipoi-models/APARENT/veff/downloaded/model_files/test.expect.h5
INFO [kipoi.cli.main] Desired precision (number of matching decimal places): 4
0%| | 0/41 [00:00
```

How is that possible?
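One way to narrow this down is to print the interpreter settings from both launchers and diff the output. The thread does not identify the cause, so this is only a diagnostic sketch; the function name `describe_runtime` is hypothetical, and the fields chosen are simply settings that commonly differ between an IDE run configuration and a shell.

```python
import json
import os
import sys


def describe_runtime() -> dict:
    """Collect settings that often differ between an IDE run and a shell run."""
    return {
        "executable": sys.executable,  # which Python binary is actually used
        "version": sys.version,
        "cwd": os.getcwd(),  # relative paths such as "." depend on this
        "sys_path": list(sys.path),  # decides which installed package copy is imported
        "pythonpath": os.environ.get("PYTHONPATH", ""),
    }


if __name__ == "__main__":
    print(json.dumps(describe_runtime(), indent=2))
```

Running the script once from PyCharm and once from the terminal, then comparing the two JSON dumps, would show whether the two runs import the same package installation from the same working directory.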


@haimasree can we somehow obtain the testing predictions? Adding -o /tmp/APARENT.veff.predictions.hdf5 skips the prediction comparison:

```
/opt/anaconda/envs/kipoi-env/bin/python -m kipoi test . --batch_size=10 --source=dir -o /tmp/APARENT.veff.predictions.hdf5
INFO [kipoi.data] Using user specified dataloader from LocalSource(local_path='/home/hoelzlwimmerf/Projects/kipoi/kipoi-models/APARENT/veff')
INFO [kipoi.data] successfully loaded the dataloader ././ from /home/hoelzlwimmerf/Projects/kipoi/kipoi-models/APARENT/veff/dataloader.py::Kipoi_APARENT_DL
INFO [kipoi.model] Downloading model arguments weights from https://github.com/johli/aparent/raw/8a884f0bc4073ed0edd588f71b61a5be4a37e831/saved_models/aparent_large_lessdropout_all_libs_no_sampleweights.h5
Using downloaded and verified file: /home/hoelzlwimmerf/Projects/kipoi/kipoi-models/APARENT/veff/downloaded/model_files/weights/31902fb40125679e655b8b6d2747ada7
2022-06-24 15:08:11.312341: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO [kipoi.pipeline] dataloader.output_schema is compatible with model.schema
INFO [kipoi.pipeline] Initialized data generator. Running batches...
INFO [kipoi.specs] Example file for argument fasta_file already exists
INFO [kipoi.specs] Example file for argument gtf_file already exists
INFO [kipoi.specs] Example file for argument vcf_file already exists
INFO [kipoi.specs] Example file for argument vcf_file_tbi already exists
INFO [kipoi.pipeline] Returned data schema correct
42it [00:01, 34.07it/s]
INFO [kipoi.pipeline] predict_example done!
INFO [kipoi.cli.main] Successfully ran test_predict
```

haimasree commented 2 years ago

Okay, this is indeed strange. To answer your question "can we somehow obtain the testing predictions?": do you mean how you can test against /tmp/APARENT.veff.predictions.hdf5 with the kipoi test CLI? If so, simply modify the test snippet to this:

test:
  expect: /tmp/APARENT.veff.predictions.hdf5
  precision_decimal: 4  # or whatever

and then run kipoi test <model-name> --source=dir

haimasree commented 2 years ago

@Hoeze in your PyCharm run, precision_decimal is 7, which is the default value. So somehow precision_decimal: 4 is not being honored there, while in your terminal run it is.

So:

Terminal: INFO [kipoi.cli.main] Desired precision (number of matching decimal places): 4
PyCharm: Arrays are not almost equal to 7 decimals

Now I don't know why it's like that. Any thoughts?
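The effect of the two precision settings can be illustrated with a small sketch. It assumes the comparison follows the criterion documented for numpy.testing.assert_almost_equal, namely abs(desired - actual) < 1.5 * 10**(-decimal). The helper `almost_equal` is a hypothetical pure-Python stand-in, and the two values are made up, not actual APARENT predictions.

```python
def almost_equal(actual: float, desired: float, decimal: int) -> bool:
    """Return True if the values agree to `decimal` places.

    Mirrors the criterion documented for numpy.testing.assert_almost_equal:
    abs(desired - actual) < 1.5 * 10**(-decimal).
    """
    return abs(desired - actual) < 1.5 * 10 ** (-decimal)


# Made-up values for illustration only.
expected = 0.1234567
predicted = 0.1234600  # differs from `expected` in the sixth decimal place

print(almost_equal(predicted, expected, decimal=4))  # True: passes at 4 decimals
print(almost_equal(predicted, expected, decimal=7))  # False: fails at 7 decimals
```

Under this assumption, a prediction that drifts in the sixth decimal place passes a test run with precision_decimal: 4 and fails one that falls back to 7.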

haimasree commented 2 years ago

@Hoeze Any update on this? Shall I just merge?

Hoeze commented 2 years ago

I'll merge now, but this definitely needs more debugging. I don't get why the predictions seem to be the same.

Still, I'm limited on time and I don't know when I can come back to this issue...

haimasree commented 2 years ago

Same up to 4 decimal places but not 7 ;)