tleonardi / nanocompore

RNA modifications detection from Nanopore dRNA-Seq data
https://nanocompore.rna.rocks
GNU General Public License v3.0

[ERROR - MainProcess | High fraction of invalid kmers] #175

Closed wososa closed 3 years ago

wososa commented 3 years ago

Describe the bug
After the data preparation step (https://nanocompore.rna.rocks/data_preparation/), I ran sampcomp, and the log shows a lot of ERROR messages.

To Reproduce
nanocompore sampcomp \
    --file_list1 testrun1_eventalign_collapsed_reads.tsv/out_eventalign_collapse.tsv \
    --file_list2 testrun2_eventalign_collapsed_reads.tsv/out_eventalign_collapse.tsv \
    --label1 testrun1 \
    --label2 testrun2 \
    --fasta mouse_transcriptome.fa \
    --outpath testrun1_v_testrun2

Expected behavior
The run shouldn't produce errors.

Screenshots
WARNING - MainProcess | Running SampComp
INFO - MainProcess | Checking and initialising SampComp
DEBUG - MainProcess | package_name: nanocompore
DEBUG - MainProcess | package_version: 1.0.2
DEBUG - MainProcess | timestamp: 2020-12-30 10:45:54.156630
DEBUG - MainProcess | progress: False
DEBUG - MainProcess | nthreads: 22
DEBUG - MainProcess | exclude_ref_id: []
DEBUG - MainProcess | select_ref_id: []
DEBUG - MainProcess | max_invalid_kmers_freq: 0.1
DEBUG - MainProcess | downsample_high_coverage: 5000
DEBUG - MainProcess | min_ref_length: 100
DEBUG - MainProcess | min_coverage: 30
DEBUG - MainProcess | sequence_context_weights: uniform
DEBUG - MainProcess | sequence_context: 0
DEBUG - MainProcess | allow_warnings: False
DEBUG - MainProcess | anova: False
DEBUG - MainProcess | logit: True
DEBUG - MainProcess | comparison_methods: GMM,KS
DEBUG - MainProcess | overwrite: False
DEBUG - MainProcess | outprefix: out
DEBUG - MainProcess | outpath: testrun1_v_testrun2
DEBUG - MainProcess | fasta_fn: mouse_transcriptome.fa
INFO - MainProcess | Only 1 replicate found for condition testrun1
INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
INFO - MainProcess | Only 1 replicate found for condition testrun2
INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
DEBUG - MainProcess | OrderedDict([('testrun1', {'testrun1_1': 'testrun1_eventalign_collapsed_reads.tsv/out_eventalign_colla
INFO - MainProcess | Reading eventalign index files
ERROR - MainProcess | High fraction of invalid kmers (122.42%) for read aee62692-c826-40f7-81e3-264b9bcc8e74
ERROR - MainProcess | High fraction of invalid kmers (104.15%) for read 1a1aecb7-d796-4d19-ae69-439aa17c723e
ERROR - MainProcess | High fraction of invalid kmers (70.94%) for read 270cc13e-9696-4c8f-aa87-89cdf9945eaa
ERROR - MainProcess | High fraction of invalid kmers (21.26%) for read f7957b1b-dae1-4a28-a7e9-30be2bd657f9
ERROR - MainProcess | High fraction of invalid kmers (112.8%) for read 9d2744a5-fd9a-4e35-bfd9-0725cf56989a
ERROR - MainProcess | High fraction of invalid kmers (11.13%) for read d7a3d57c-3097-4cc1-81af-7d48a65c250e
ERROR - MainProcess | High fraction of invalid kmers (10.16%) for read 4381a43f-c44a-4f70-a5e7-baa121d461da
ERROR - MainProcess | High fraction of invalid kmers (10.33%) for read e1c346f6-f52d-4163-96ec-e7964f135c17
ERROR - MainProcess | High fraction of invalid kmers (12.69%) for read cee22501-abec-4279-a25a-0007488353e0

tleonardi commented 3 years ago

Hi, these errors aren't a real problem. They just mean that certain reads are discarded because of a high number of invalid kmers. Does Nanocompore complete?
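
To see how many reads are affected, the messages can be tallied from a saved copy of the log. The following is a minimal sketch and not part of Nanocompore: it assumes only the message format visible in the log excerpt in this thread, and the function name `summarize_invalid_kmer_errors` is made up for illustration.

```python
import re
from typing import Dict, List, Optional, Union

# Matches the message text as it appears in the log excerpt in this thread.
# The pattern is an assumption based on that excerpt, not a documented format.
INVALID_KMER_RE = re.compile(
    r"High fraction of invalid kmers \((?P<pct>[0-9.]+)%\) "
    r"for read (?P<read_id>[0-9a-f-]+)"
)


def summarize_invalid_kmer_errors(
    log_text: str,
) -> Dict[str, Union[int, Optional[float]]]:
    """Count 'High fraction of invalid kmers' messages and report their range."""
    percentages: List[float] = [
        float(match.group("pct")) for match in INVALID_KMER_RE.finditer(log_text)
    ]
    if not percentages:
        return {"discarded_reads": 0, "min_pct": None, "max_pct": None}
    return {
        "discarded_reads": len(percentages),
        "min_pct": min(percentages),
        "max_pct": max(percentages),
    }


if __name__ == "__main__":
    # Three messages copied from the log excerpt in this thread.
    sample = (
        "ERROR - MainProcess | High fraction of invalid kmers (122.42%) "
        "for read aee62692-c826-40f7-81e3-264b9bcc8e74\n"
        "ERROR - MainProcess | High fraction of invalid kmers (70.94%) "
        "for read 270cc13e-9696-4c8f-aa87-89cdf9945eaa\n"
        "ERROR - MainProcess | High fraction of invalid kmers (10.16%) "
        "for read 4381a43f-c44a-4f70-a5e7-baa121d461da\n"
    )
    print(summarize_invalid_kmer_errors(sample))
```

Because the pattern is searched across the whole text, it also works when the log has been pasted as a single long line. Comparing the count against the total number of reads gives a rough idea of how much data the `max_invalid_kmers_freq: 0.1` setting shown in the DEBUG output is filtering out.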

tleonardi commented 3 years ago

Closed because inactive

wososa commented 3 years ago

Dear Nanocompore developer,

Nanocompore has been running for a very long time. The last step has been running for 5 days. Should I wait for it to finish?

2021-01-25T23:58:51.227901-0500 ERROR - MainProcess | High fraction of invalid kmers (38.95%) for read 8cb14954-dc85-44fe-aa0f-e7cd65055099
2021-01-25T23:58:51.228206-0500 ERROR - MainProcess | High fraction of invalid kmers (23.8%) for read 1e236488-381a-4c29-ba3e-513231becc18
2021-01-25T23:58:54.423011-0500 INFO - MainProcess | References found in index: 61621
2021-01-25T23:58:55.055914-0500 INFO - MainProcess | Filtering out references with low coverage
2021-01-25T23:58:58.913828-0500 INFO - MainProcess | References remaining after reference coverage filtering: 23286
2021-01-25T23:59:04.278296-0500 INFO - MainProcess | Starting data processing
2021-02-06T04:33:46.735469-0500 INFO - Process-3 | All Done. Transcripts processed: 23286
2021-02-06T04:33:48.124521-0500 INFO - MainProcess | Loading SampCompDB
2021-02-06T04:33:49.913571-0500 INFO - MainProcess | Calculate results

Thanks, Woody
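
To judge whether a step is unusually slow, it can help to measure the gaps between the timestamps in the log above. The following is a minimal sketch and not part of Nanocompore: it assumes the timestamped line layout shown in that log, and the names `parse_entries` and `stage_durations` are hypothetical helpers made up for illustration.

```python
import re
from datetime import datetime, timedelta
from typing import List, Tuple

# One entry = timestamp, level, process name, then the message up to the
# next timestamp (or the end of the text). The layout is assumed from the
# log excerpt in this thread.
ENTRY_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{4})\s+"
    r"(?P<level>[A-Z]+) - (?P<proc>\S+) \| "
    r"(?P<msg>.*?)(?=\s+\d{4}-\d{2}-\d{2}T|\Z)",
    re.DOTALL,
)


def parse_entries(log_text: str) -> List[Tuple[datetime, str]]:
    """Return (timestamp, message) pairs in the order they appear."""
    entries = []
    for match in ENTRY_RE.finditer(log_text):
        timestamp = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f%z")
        entries.append((timestamp, match.group("msg").strip()))
    return entries


def stage_durations(
    entries: List[Tuple[datetime, str]],
) -> List[Tuple[str, timedelta]]:
    """Pair each message with the time elapsed until the next log entry."""
    return [
        (message, later_ts - earlier_ts)
        for (earlier_ts, message), (later_ts, _) in zip(entries, entries[1:])
    ]


if __name__ == "__main__":
    # Two lines copied from the log in this thread.
    sample = (
        "2021-01-25T23:59:04.278296-0500 INFO - MainProcess | Starting data processing\n"
        "2021-02-06T04:33:46.735469-0500 INFO - Process-3 | All Done. "
        "Transcripts processed: 23286\n"
    )
    for message, elapsed in stage_durations(parse_entries(sample)):
        print(f"{elapsed} after: {message}")
```

Each duration is attributed to the earlier of two consecutive messages, so the last entry in a log (here `Calculate results`) has no duration until another line is written after it.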