vanheeringen-lab / ANANSE

Prediction of key transcription factors in cell fate determination using enhancer networks. See full ANANSE documentation for detailed installation instructions and usage examples.
http://anansepy.readthedocs.io
MIT License

Ananse binding errors - TF #99

Closed cdsoria closed 3 years ago

cdsoria commented 3 years ago

Hello, I tried running ananse binding with your example data, but it fails with an error right at the end, as shown below. The output does not say exactly why it fails. A colleague mentioned that the same thing happened to him, but with a different TF. Thank you so much. NB: I am using an r5d.24xlarge EC2 instance with 120 GiB of storage.

command

ananse binding -H ANANSE_example_data/H3K27ac/heart_H3K27ac_rep1bam -A ANANSE_example_data/ATAC/heart_ATAC_repbam -R ANANSE.REMAP.model.v1.0/ -g genomes/hg38/ -o heart.binding

2021-06-23 13:46:03 | DEBUG | Using default motif file
2021-06-23 13:46:07 | DEBUG | using motifs for 1884 factors
2021-06-23 13:46:14 | INFO | loading motifs for reference
2021-06-23 13:46:42 | INFO | loading average peak coverage for reference
2021-06-23 13:46:43 | INFO | loading distance for reference
2021-06-23 13:46:45 | INFO | loading ATAC data

2021-06-23 14:22:32 | DEBUG | quantile normalization for ATAC
2021-06-23 14:22:36 | INFO | loading H3K27ac data

2021-06-23 14:28:00 | DEBUG | quantile normalization for H3K27ac
2021-06-23 14:28:03 | INFO | Loading models
2021-06-23 14:28:04 | INFO | 238 models found
2021-06-23 14:28:04 | INFO | Predicting TF activity
2021-06-23 14:28:39,186 - INFO - motif scanning (scores)
2021-06-23 14:28:39,187 - INFO - reading table
2021-06-23 14:28:45,827 - INFO - creating score table (z-score, GC%)
2021-06-23 14:31:37,252 - INFO - done
2021-06-23 14:31:39,140 - INFO - creating dataframe
2021-06-23 14:31:48,930 - INFO - Fitting BayesianRidge
100%|██████████| 1/1 [00:05<00:00, 5.62s/it]
2021-06-23 14:31:57,716 - INFO - Done
                                   ATAC
GM.5.0.Sox.0001                0.005135
GM.5.0.Homeodomain.0001       -0.005660
GM.5.0.Mixed.0001              0.005816
GM.5.0.Nuclear_receptor.0001  -0.002594
GM.5.0.Mixed.0002              0.005685
...                                 ...
GM.5.0.C2H2_ZF.0316            0.001762
GM.5.0.C2H2_ZF.0317           -0.005541
GM.5.0.Ets.0049               -0.006159
GM.5.0.Unknown.0208           -0.013564
GM.5.0.Homeodomain_POU.0026    0.001069

[1796 rows x 1 columns]
2021-06-23 14:32:01,691 - INFO - motif scanning (scores)
2021-06-23 14:32:01,692 - INFO - reading table
2021-06-23 14:32:08,400 - INFO - creating score table (z-score, GC%)
2021-06-23 14:35:00,230 - INFO - done
2021-06-23 14:35:02,031 - INFO - creating dataframe
2021-06-23 14:35:11,945 - INFO - Fitting BayesianRidge
100%|██████████| 1/1 [00:05<00:00, 5.44s/it]
2021-06-23 14:35:20,684 - INFO - Done
                                   ATAC  ATAC.relative
GM.5.0.Sox.0001                0.005135       0.001231
GM.5.0.Homeodomain.0001       -0.005660      -0.001224
GM.5.0.Mixed.0001              0.005816       0.003369
GM.5.0.Nuclear_receptor.0001  -0.002594      -0.000156
GM.5.0.Mixed.0002              0.005685       0.001466
...                                 ...            ...
GM.5.0.C2H2_ZF.0316            0.001762      -0.005445
GM.5.0.C2H2_ZF.0317           -0.005541       0.003592
GM.5.0.Ets.0049               -0.006159       0.001622
GM.5.0.Unknown.0208           -0.013564      -0.001784
GM.5.0.Homeodomain_POU.0026    0.001069      -0.003657

[1796 rows x 2 columns]
2021-06-23 14:35:24,634 - INFO - motif scanning (scores)
2021-06-23 14:35:24,635 - INFO - reading table
2021-06-23 14:35:31,145 - INFO - creating score table (z-score, GC%)
2021-06-23 14:38:16,347 - INFO - done
2021-06-23 14:38:18,124 - INFO - creating dataframe
2021-06-23 14:38:28,074 - INFO - Fitting BayesianRidge
100%|██████████| 1/1 [00:05<00:00, 5.49s/it]
2021-06-23 14:38:36,889 - INFO - Done
                                   ATAC  ATAC.relative    H3K27ac
GM.5.0.Sox.0001                0.005135       0.001231   0.001394
GM.5.0.Homeodomain.0001       -0.005660      -0.001224  -0.006519
GM.5.0.Mixed.0001              0.005816       0.003369   0.005057
GM.5.0.Nuclear_receptor.0001  -0.002594      -0.000156  -0.004400
GM.5.0.Mixed.0002              0.005685       0.001466   0.002080
...                                 ...            ...        ...
GM.5.0.C2H2_ZF.0316            0.001762      -0.005445  -0.000176
GM.5.0.C2H2_ZF.0317           -0.005541       0.003592  -0.002873
GM.5.0.Ets.0049               -0.006159       0.001622  -0.004716
GM.5.0.Unknown.0208           -0.013564      -0.001784  -0.008203
GM.5.0.Homeodomain_POU.0026    0.001069      -0.003657  -0.007543

[1796 rows x 3 columns]
2021-06-23 14:38:40,711 - INFO - motif scanning (scores)
2021-06-23 14:38:40,711 - INFO - reading table
2021-06-23 14:38:47,020 - INFO - creating score table (z-score, GC%)
2021-06-23 14:41:38,770 - INFO - done
2021-06-23 14:41:40,511 - INFO - creating dataframe
2021-06-23 14:41:50,412 - INFO - Fitting BayesianRidge
100%|██████████| 1/1 [00:05<00:00, 5.26s/it]
2021-06-23 14:41:58,950 - INFO - Done
                                   ATAC  ATAC.relative    H3K27ac  H3K27ac.relative
GM.5.0.Sox.0001                0.005135       0.001231   0.001394          0.001257
GM.5.0.Homeodomain.0001       -0.005660      -0.001224  -0.006519         -0.003463
GM.5.0.Mixed.0001              0.005816       0.003369   0.005057          0.003218
GM.5.0.Nuclear_receptor.0001  -0.002594      -0.000156  -0.004400         -0.003943
GM.5.0.Mixed.0002              0.005685       0.001466   0.002080         -0.000557
...                                 ...            ...        ...               ...
GM.5.0.C2H2_ZF.0316            0.001762      -0.005445  -0.000176          0.004563
GM.5.0.C2H2_ZF.0317           -0.005541       0.003592  -0.002873         -0.003964
GM.5.0.Ets.0049               -0.006159       0.001622  -0.004716         -0.005366
GM.5.0.Unknown.0208           -0.013564      -0.001784  -0.008203         -0.007877
GM.5.0.Homeodomain_POU.0026    0.001069      -0.003657  -0.007543         -0.003038

[1796 rows x 4 columns]
2021-06-23 14:41:59 | INFO | Using SRY motif with SOX2 model weights
2021-06-23 14:42:04 | INFO | Using SOX9 motif with SOX13 model weights
2021-06-23 14:42:08 | INFO | Using SOX13 model
2021-06-23 14:42:12 | INFO | Using SOX15 motif with SOX2 model weights
2021-06-23 14:42:17 | INFO | Using Sox9 motif with SOX13 model weights
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/ananse/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Sox9'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/ananse/bin/ananse", line 318, in <module>
    args.func(args)
  File "/home/ubuntu/miniconda3/envs/ananse/lib/python3.9/site-packages/ananse/commands/binding.py", line 11, in binding
    predict_peaks(
  File "/home/ubuntu/miniconda3/envs/ananse/lib/python3.9/site-packages/ananse/peakpredictor.py", line 709, in predict_peaks
    proba = p.predict_proba(factor)
  File "/home/ubuntu/miniconda3/envs/ananse/lib/python3.9/site-packages/ananse/peakpredictor.py", line 410, in predict_proba
    X = self._load_data(factor)
  File "/home/ubuntu/miniconda3/envs/ananse/lib/python3.9/site-packages/ananse/peakpredictor.py", line 420, in _load_data
    {factor: self._motifs[factor]}, index=self.regions
  File "/home/ubuntu/miniconda3/envs/ananse/lib/python3.9/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/ubuntu/miniconda3/envs/ananse/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'Sox9'
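
For readers hitting the same traceback: it ends in a pandas column lookup (`self._motifs[factor]`) for the key 'Sox9', while the log a few lines earlier mentions a motif spelled SOX9 in upper case. One possible reading, which is an assumption and is not confirmed anywhere in this thread, is that the factor name differs from the column names of the motif score table only in capitalization. The sketch below reproduces that failure mode on a made-up table; `find_column` is a hypothetical helper and none of this is ANANSE's own code.

```python
import pandas as pd

# Made-up motif score table; column names and values are illustrative only.
motifs = pd.DataFrame(
    {"SOX9": [0.1, 0.4], "SOX13": [0.3, 0.2]},
    index=["region_1", "region_2"],
)


def find_column(table, factor):
    """Return the column matching `factor` ignoring case, or None (hypothetical helper)."""
    lookup = {str(col).upper(): col for col in table.columns}
    return lookup.get(factor.upper())


# An exact-match lookup, as in the traceback, fails when only the case differs.
try:
    motifs["Sox9"]
    raised = False
except KeyError:
    raised = True

# A case-insensitive search shows whether a near-identical column exists.
match = find_column(motifs, "Sox9")
print(raised, match)
```

If a check like this finds a column that differs only in case, the problem is a naming mismatch and not missing data, but whether that applies to this particular error is not established here.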

simonvh commented 3 years ago

Thanks for reporting @cdsoria. I think I have localized the problem. Will test and get back to you.

simonvh commented 3 years ago

Hi @cdsoria, the fix seems to work. Until the new version is released, you can try it out by installing the fix manually. Run the following command in your ananse conda environment and see if that indeed fixes your issue:

pip install git+https://github.com/vanheeringen-lab/ANANSE.git@9de0982
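
After running the pip command, it can help to confirm which build the active environment actually picks up. The sketch below uses only the Python standard library. The distribution name "ananse" is an assumption, and the reported version string may not distinguish a build installed from a git commit from the released one.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "ananse" as the distribution name is an assumption, not taken from this thread.
print(installed_version("ananse"))
```

Run it with the Python interpreter of the same conda environment in which `ananse binding` is executed; a result of `None` would suggest the package was installed into a different environment.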

cdsoria commented 3 years ago

Thank you for looking at this so quickly. I am testing it now and will let you know. Thanks

cdsoria commented 3 years ago

@simonvh That seems to have done the trick. No errors were thrown. The last TF in the list was:

2021-06-23 20:03:16 | INFO | Using ENO1 motif with MYC model weights

ENO1 is the last factor in "factor_activity.tsv", so I am assuming it ran correctly. Again, thank you for looking into this.
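
The same check can be done programmatically instead of by eye. The sketch below assumes the log lines have the two shapes seen in this thread ("Using X motif with Y model weights" and "Using X model") and that the first name is the factor being processed. The function name `factors_in_log` and the sample list are illustrative, not part of ANANSE.

```python
import re

# Matches the two message shapes seen in this thread:
#   "... | INFO | Using SRY motif with SOX2 model weights"
#   "... | INFO | Using SOX13 model"
PATTERN = re.compile(
    r"\| INFO \| Using (\S+) (?:motif with \S+ model weights|model)\s*$"
)


def factors_in_log(lines):
    """Return the factor names mentioned in 'Using ...' log lines, in order."""
    factors = []
    for line in lines:
        found = PATTERN.search(line)
        if found:
            factors.append(found.group(1))
    return factors


# Sample lines copied from the log output quoted in this thread.
log = [
    "2021-06-23 14:41:59 | INFO | Using SRY motif with SOX2 model weights",
    "2021-06-23 14:42:08 | INFO | Using SOX13 model",
    "2021-06-23 14:28:04 | INFO | Predicting TF activity",
    "2021-06-23 20:03:16 | INFO | Using ENO1 motif with MYC model weights",
]
factors = factors_in_log(log)
print(factors[-1])
```

Comparing the last entry with the last factor expected for the run gives a quick indication that the loop reached the end, although it does not verify the content of the output itself.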