vanheeringen-lab / gimmemotifs

Suite of motif tools, including a motif prediction pipeline for ChIP-seq experiments. See full GimmeMotifs documentation for detailed installation instructions and usage examples.
https://gimmemotifs.readthedocs.io/en/master
MIT License

Exception on differential enrichment #312

Open Mitmischer opened 4 months ago

Mitmischer commented 4 months ago

Describe the bug

To try out differential enrichment, I ran gimme maelstrom on a single gene, but gimme crashes.

To Reproduce

Steps to reproduce the behavior:

gimme maelstrom -N120 IFIH1.fasta Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa /tmp/maelstrom_out

IFIH1.fasta is attached: IFIH1.fasta.txt

Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa can be downloaded from here.

Expected behavior

I expected the program to run in full or to provide a concise error message.

Error logs

2024-03-17 11:07:42,884 - INFO - Starting maelstrom
2024-03-17 11:07:42,890 - INFO - motif scanning (counts)
2024-03-17 11:07:42,890 - INFO - reading table                                                                                                                                 
2024-03-17 11:07:45,717 - INFO - using 14000 sequences                                                                                            
2024-03-17 11:08:34,427 - INFO - setting threshold                                                                                                
Determining FPR-based threshold: 100%|██████████████████████████████████████████████████████████████| 10633/10633 [12:40<00:00, 13.98 sequences/s]
2024-03-17 11:21:23,647 - INFO - creating count table                                                                                             
Scanning: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:09<00:00,  9.37s/ sequences]
2024-03-17 11:21:33,022 - INFO - done                                                                                                             
2024-03-17 11:21:33,022 - INFO - creating dataframe                                                                                               
2024-03-17 11:21:33,435 - INFO - motif scanning (scores)                                                                                          
2024-03-17 11:21:33,435 - INFO - reading table                                                                                                    
2024-03-17 11:21:39,620 - INFO - using 14000 sequences                                                                                            
2024-03-17 11:22:13,126 - INFO - creating score table (z-score, GC%)                                                                              
Determining mean and stddev for motifs: 100%|██████████████████████████████████████████████████████████| 19756/19756 [11:18<00:00, 29.13 motifs/s]             
Scanning: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.55s/ sequences]
2024-03-17 11:33:43,722 - INFO - done                                                                                                                                          
2024-03-17 11:33:43,722 - INFO - creating dataframe                                                                                               
2024-03-17 11:33:44,235 - INFO - Selecting non-redundant motifs                                                                                   
Traceback (most recent call last):                                                                                                                
  File "/home/mabe/.conda/envs/mabe/bin/gimme", line 12, in <module>                                                                                                           
    cli(sys.argv[1:])                                                                                                                                          
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/gimmemotifs/cli.py", line 755, in cli                                                                         
    args.func(args)                                                                                                                                                            
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/gimmemotifs/commands/maelstrom.py", line 42, in maelstrom                                                                                                 
    run_maelstrom(                                                                                                                                                                                 
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/gimmemotifs/maelstrom/__init__.py", line 239, in run_maelstrom                                                
    fa.fit(scores)                                                                                                                                                             
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/sklearn/base.py", line 1474, in wrapper                                                                                                                   
    return fit_method(estimator, *args, **kwargs)                                                                                                                              
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/sklearn/cluster/_agglomerative.py", line 1329, in fit                                                                             
    super()._fit(X.T)                                                                                                                                                          
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/sklearn/cluster/_agglomerative.py", line 1066, in _fit                                                                            
    out = memory.cache(tree_builder)(                                                            
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/joblib/memory.py", line 353, in __call__                                                                                          
    return self.func(*args, **kwargs)                                                                                                                                                                                      
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/sklearn/cluster/_agglomerative.py", line 706, in _complete_linkage                                                                                        
    return linkage_tree(*args, **kwargs)                                                                                                                                                                                   
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/sklearn/cluster/_agglomerative.py", line 585, in linkage_tree                                                                                             
    out = hierarchy.linkage(X, method=linkage, metric=affinity)                                                                                                                                                            
  File "/home/mabe/.conda/envs/mabe/lib/python3.10/site-packages/scipy/cluster/hierarchy.py", line 1030, in linkage
    raise ValueError("The condensed distance matrix must contain only "
ValueError: The condensed distance matrix must contain only finite values.
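
The log above shows a single scanned sequence ("Scanning ... 1/1"), and the traceback ends in a check that rejects non-finite distances. One plausible explanation, sketched here under the assumption that motifs are compared with a correlation-based distance (the metric is not stated in this thread), is that a correlation between two score vectors holding one observation each is undefined. The helper `correlation_distance` and the motif names are hypothetical and only illustrate the arithmetic; this is not the gimmemotifs or SciPy implementation.

```python
import math

def correlation_distance(u, v):
    """Return 1 - Pearson correlation of two equal-length vectors.

    Returns NaN when either vector has zero variance, which is always
    the case when each vector holds a single observation.
    """
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    du = [x - mean_u for x in u]
    dv = [x - mean_v for x in v]
    dot = sum(a * b for a, b in zip(du, dv))
    norm = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    if norm == 0.0:
        return math.nan  # 0/0: the correlation is undefined
    return 1.0 - dot / norm

# Hypothetical scores: one vector per motif, one entry per input sequence.
single_sequence = {"motif_a": [1.3], "motif_b": [-0.4]}
two_sequences = {"motif_a": [1.3, 0.2], "motif_b": [-0.4, 0.9]}

d_single = correlation_distance(single_sequence["motif_a"], single_sequence["motif_b"])
d_two = correlation_distance(two_sequences["motif_a"], two_sequences["motif_b"])

# With one sequence the distance is NaN; with two it is finite.
print(math.isfinite(d_single), math.isfinite(d_two))
```

If this assumption holds, any input with a single sequence would produce a distance matrix full of NaN values, which would match the `ValueError` reported above.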


Additional context

As I am new to the software, this might well be an error on my side (or maybe the statistics just don't work out on a single gene). Still, I think that the error handling and the error message should be better!
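
As a rough illustration of the kind of error handling asked for here, the sketch below validates a score table before any clustering step and fails with a short message. Everything in it is hypothetical: `ScoreTableError`, `check_score_table`, the plain list-of-lists table layout, and the minimum of two sequences are assumptions made for the example, not part of gimmemotifs.

```python
import math

class ScoreTableError(ValueError):
    """Hypothetical error type for score tables that cannot be clustered."""

def check_score_table(scores):
    """Validate a motif score table before clustering.

    `scores` is a list of rows, one per input sequence, each row being a
    list of per-motif scores. Raises ScoreTableError with a short message
    if the table is too small or contains non-finite values.
    """
    if len(scores) < 2:
        raise ScoreTableError(
            f"need at least 2 input sequences to cluster motifs, got {len(scores)}"
        )
    for i, row in enumerate(scores):
        if not all(math.isfinite(x) for x in row):
            raise ScoreTableError(f"non-finite score in row {i}")

# Usage: a table with a single sequence is rejected up front.
try:
    check_score_table([[1.3, -0.4, 0.7]])
except ScoreTableError as err:
    message = str(err)
    print(f"error: {message}")
```

A check of this sort would run in well under a second, so the user would get the message before the long scanning and threshold steps rather than after them.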

maxfieldk commented 2 months ago

I am having the same issue. You haven't managed to find a way around this, @Mitmischer, have you? Thanks!