tleonardi / nanocompore

RNA modifications detection from Nanopore dRNA-Seq data
https://nanocompore.rna.rocks
GNU General Public License v3.0

nanocompore.common.NanocomporeError: The result database is empty #121

Closed christiansstevens closed 4 years ago

christiansstevens commented 4 years ago

Describe the Bug

When trying to run nanocompore sampcomp I get the following error: nanocompore.common.NanocomporeError: The result database is empty

Initialising SampComp and checking options
Initialising Whitelist and checking options
Reading eventalign index files
    References found in index: 1
Filtering out references with low coverage
    References remaining after reference coverage filtering: 0
Starting data processing
0 Processed References [00:00, ? Processed References/s]
Loading SampCompDB
Traceback (most recent call last):
  File "/anaconda3/envs/nanocompore/bin/nanocompore", line 8, in <module>
    sys.exit(main())
  File "/anaconda3/envs/nanocompore/lib/python3.6/site-packages/nanocompore/__main__.py", line 139, in main
    args.func(args)
  File "/anaconda3/envs/nanocompore/lib/python3.6/site-packages/nanocompore/__main__.py", line 174, in sampcomp_main
    db = s()
  File "/anaconda3/envs/nanocompore/lib/python3.6/site-packages/nanocompore/SampComp.py", line 268, in __call__
    log_level=self.__log_level)
  File "/anaconda3/envs/nanocompore/lib/python3.6/site-packages/nanocompore/SampCompDB.py", line 82, in __init__
    raise NanocomporeError("The result database is empty")
nanocompore.common.NanocomporeError: The result database is empty

To Reproduce

nanocompore sampcomp \
    --file_list1 control_collapsed_rep1.tsv,control_collapsed_rep2.tsv \
    --file_list2 treated_collapsed_rep1.tsv,treated_collapsed_rep2.tsv \
    --label1 control \
    --label2 treated \
    --fasta ../fastas/ref.fasta \
    --outpath ./sampcomp/

To note, the data being used here are somewhat atypical. Instead of using Nanopore direct RNA sequencing on messages, we are directly sequencing the genome of an RNA virus. We use genomic primers for this process, so our reads are specific to the viral genome and not to messages. What I think may be contributing is the length of the viral genome, which is 11,901 bp. We have coverage across the entire genome, but the minimum coverage at any individual position in each of the four files is 2, 15, 12, and 14 reads. Average depth, however, is not bad at 422, 2222, 2067, and 3608 reads per base. So effectively we are trying to run Nanocompore on a single "transcript" that is the viral genome.
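The gap between minimum and average depth described above can be made concrete with a small sketch. It assumes per-position depths are already available as a plain list of integers (for example, parsed from a depth table); the function name `summarize_depth`, the toy values, and the threshold of 30 are all illustrative assumptions, and this is not how Nanocompore itself filters references.

```python
from statistics import mean


def summarize_depth(depths, min_coverage):
    """Summarize per-position depth for one sample.

    Returns the minimum depth, the mean depth, and the fraction of
    positions whose depth is at or above min_coverage.
    """
    if not depths:
        raise ValueError("no positions supplied")
    covered = sum(1 for d in depths if d >= min_coverage)
    return {
        "min": min(depths),
        "mean": mean(depths),
        "fraction_covered": covered / len(depths),
    }


# Toy per-position depths for one hypothetical sample (not real data).
toy_depths = [2, 40, 400, 800, 900, 390]

# The threshold is an arbitrary illustrative value.
summary = summarize_depth(toy_depths, min_coverage=30)
print(summary)
```

A summary like this shows how a sample can have a high mean depth while a few positions still fall below a coverage threshold, which is the situation described for these four files.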

We have tried adding the --min_coverage flag, but that results in the same error until we drop min_coverage to 1, at which point we get a new error:

Initialising SampComp and checking options
Initialising Whitelist and checking options
Reading eventalign index files
    References found in index: 1
Filtering out references with low coverage
    References remaining after reference coverage filtering: 1
Starting data processing
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.74 Processed References/s]
Loading SampCompDB
Traceback (most recent call last):
  File "/anaconda3/envs/nanocompore/bin/nanocompore", line 8, in <module>
    sys.exit(main())
  File "/anaconda3/envs/nanocompore/lib/python3.6/site-packages/nanocompore/__main__.py", line 139, in main
    args.func(args)
  File "/anaconda3/envs/nanocompore/lib/python3.6/site-packages/nanocompore/__main__.py", line 176, in sampcomp_main
    db.save_all(pvalue_thr=args.pvalue_thr)
  File "/anaconda3/envs/nanocompore/lib/python3.6/site-packages/nanocompore/SampCompDB.py", line 258, in save_all
    self.save_report(output_fn = outpath_prefix+"nanocompore_results.tsv")
  File "/anaconda3/envs/nanocompore/lib/python3.6/site-packages/nanocompore/SampCompDB.py", line 381, in save_report
    for record in self.results[self.results.ref_id == cur_id ].itertuples():
AttributeError: 'SampCompDB' object has no attribute 'results'

I did see the previous issue posted here: https://github.com/tleonardi/nanocompore/issues/118 but it isn't clear to me that it applies, as that was an issue with downsampling (although our problem may be analogous).

We would love to know whether there is a potential fix here. We are hoping the problem is a small one on our end rather than a general issue with using Nanocompore in this sort of situation. I can send the eventalign_collapse files if helpful, but they are relatively large.

Thanks!

tleonardi commented 4 years ago

Hi @christiansstevens, thanks for the detailed report! I understand your approach; we used a similar targeting strategy for the 7SK non-coding RNA in the Nanocompore paper. The strategy per se shouldn't pose any particular issue. I think the problem here might be the low coverage (which explains why you need to lower --min_coverage): it leads to 0 significant sites and hence to the error 'SampCompDB' object has no attribute 'results'. This error message could be more informative; I'll open a separate issue to fix it.
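As a rough illustration of what a more informative error could look like, here is a minimal sketch. `ToyResultDB` and `EmptyResultsError` are hypothetical stand-ins written for this example; they are not Nanocompore's SampCompDB or NanocomporeError, and the check shown is only one possible approach, not the fix that was applied.

```python
class EmptyResultsError(Exception):
    """Hypothetical error type for a database that holds no results."""


class ToyResultDB:
    """Stand-in for a results container; not Nanocompore's SampCompDB."""

    def __init__(self, records=None):
        # Mimic the failure mode: the attribute is only set when records exist.
        if records:
            self.results = list(records)

    def save_report(self):
        # Check up front so the caller sees the cause, not an AttributeError.
        if not hasattr(self, "results"):
            raise EmptyResultsError(
                "No results to report: no positions produced any results. "
                "Consider checking per-sample coverage."
            )
        return [f"{ref_id}\t{pos}\t{pvalue}" for ref_id, pos, pvalue in self.results]


# Usage: an empty database now fails with an explicit message.
try:
    ToyResultDB().save_report()
except EmptyResultsError as err:
    print(err)
```

The design choice here is simply to fail early with a message that names the likely cause, instead of letting a missing attribute surface later in the report writer.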

In any case, I'd be happy to have a look at the eventalign files to try to get something out of the dataset. Any means of transfer works for me (Dropbox, FTP, etc.), just drop me an email at tommaso.leonardi@iit.it.

Cheers,
Tom