tleonardi / nanocompore

RNA modifications detection from Nanopore dRNA-Seq data
https://nanocompore.rna.rocks
GNU General Public License v3.0
80 stars 12 forks source link

Not enough p-values #68

Closed tleonardi closed 5 years ago

tleonardi commented 5 years ago

Describe the bug This error is raised also when using --min_ref_length 60. Related to #63.

nanocompore.common.NanocomporeError: Not enough p-values for a context
of order 2

Traceback

$ nanocompore -v
v1.0.0rc1

...

Starting data processing
  16%|#########2                                               | 371/2283
[24:12<2:40:01,  5.02s/ Processed References]Traceback (most recent call
last):
   File "/usr/local/bin/nanocompore", line 9, in <module>
     load_entry_point('nanocompore==1.0.0rc1', 'console_scripts',
'nanocompore')()
   File
"/usr/local/lib/python3.5/dist-packages/nanocompore/nanocompore_main.py",
line 142, in main
     args.func(args)
   File
"/usr/local/lib/python3.5/dist-packages/nanocompore/nanocompore_main.py",
line 177, in sampcomp_main
     db = s()
   File "/usr/local/lib/python3.5/dist-packages/nanocompore/SampComp.py",
line 253, in __call__
     raise E
   File "/usr/local/lib/python3.5/dist-packages/nanocompore/SampComp.py",
line 245, in __call__
     raise NanocomporeError(tb)
nanocompore.common.NanocomporeError: Traceback (most recent call last):
   File "/usr/local/lib/python3.5/dist-packages/nanocompore/SampComp.py",
line 368, in __process_references
     random_state=random_state)
   File "/usr/local/lib/python3.5/dist-packages/nanocompore/TxComp.py",
line 112, in txCompare
     corr_matrix_dict[test] = cross_corr_matrix(pval_list_dict[test],
sequence_context)
   File "/usr/local/lib/python3.5/dist-packages/nanocompore/TxComp.py",
line 319, in cross_corr_matrix
     raise NanocomporeError("Not enough p-values for a context of order
%s"%context)
nanocompore.common.NanocomporeError: Not enough p-values for a context
of order 2
tleonardi commented 5 years ago

Ok, after looking into the issue it looks like the problem is caused by the lowCoverage check of TxComp. Essentially, it's possible that for a transcript just above --min_coverage, only a few positions pass this threshold. If the number of passing positions is less than context*3+3 we get the Not enough p-values for a context of order 2 error.

Additionally, I've noticed that there's another error:

# Collect pvalue lists per tests
pval_list_dict = defaultdict(list)
for pos_dict in ref_pos_list:
    if 'txComp' in pos_dict:
        for test in tests:
           if test in pos_dict['txComp']:
               pval_list_dict[test].append(pos_dict['txComp'][test])

This block completely ignores the lowCov positions, so if we have something like:

0.01
lowCov
0.01
lowCov

pval_list_dict becomes: [0.01, 0.01], leading to a wrong cross correlation.

The solution is to set the p-values of the lowCov positions to 1 when filling the pval_list_dict list.

a-slide commented 5 years ago

Sounds like a reasonable approach for the lowCov.