tleonardi / nanocompore

RNA modifications detection from Nanopore dRNA-Seq data
https://nanocompore.rna.rocks
GNU General Public License v3.0

SampComp error #199

Open rania-o opened 2 years ago

rania-o commented 2 years ago

Hello,

I'm using Nanocompore to compare a modified sample with an IVT sample. I've already run the Eventalign_collapse step on the nanopolish output, and I got this in the log file (for the IVT sample; the modified one gives similar results):

2022-03-25T10:41:46.337153+0100 WARNING - MainProcess | Running Eventalign_collapse
2022-03-25T10:41:46.337736+0100 INFO - MainProcess | Checking and initialising Eventalign_collapse
2022-03-25T10:41:46.339649+0100 INFO - MainProcess | Starting data processing
2022-03-25T10:54:48.308272+0100 INFO - Process-6 | Output reads written:21561

Written Reads:21561 Kmers:6887018

When I grep the valid kmers in the collapsed output file, I get 6078205 valid kmers out of 6887018 kmers.

After this, I tried to run SampComp (even though I don't have any replicates):

nanocompore sampcomp --file_list1 psi0_transcrit_oligo_collapsed_reads.eventalign/out_eventalign_collapse.tsv  --file_list2 psi2_transcrit_oligo_collapsed_reads.eventalign/out_eventalign_collapse.tsv --fasta ../transcript_oligo.fasta  --outpath ./samp_comp_results

2022-03-25T14:32:16.222012+0100 WARNING - MainProcess | Running SampComp
2022-03-25T14:32:16.222857+0100 INFO - MainProcess | Checking and initialising SampComp
2022-03-25T14:32:16.226479+0100 INFO - MainProcess | Only 1 replicate found for condition Condition1
2022-03-25T14:32:16.226733+0100 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2022-03-25T14:32:16.227296+0100 INFO - MainProcess | Only 1 replicate found for condition Condition2
2022-03-25T14:32:16.227704+0100 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2022-03-25T14:32:16.230122+0100 INFO - MainProcess | Reading eventalign index files
2022-03-25T14:32:18.253073+0100 INFO - MainProcess |    References found in index: 1
2022-03-25T14:32:18.253414+0100 INFO - MainProcess | Filtering out references with low coverage
2022-03-25T14:32:18.254686+0100 INFO - MainProcess |    References remaining after reference coverage filtering: 0
2022-03-25T14:32:18.255010+0100 INFO - MainProcess | Starting data processing
2022-03-25T14:32:18.301037+0100 INFO - Process-3 | All Done. Transcripts processed: 0
2022-03-25T14:32:18.309365+0100 INFO - MainProcess | Loading SampCompDB
2022-03-25T14:32:18.317105+0100 INFO - MainProcess | The result database is empty
2022-03-25T14:32:18.318381+0100 INFO - MainProcess | Saving results

So I ran it again with min_coverage set to 0:

nanocompore sampcomp --file_list1 psi0_transcrit_oligo_collapsed_reads.eventalign/out_eventalign_collapse.tsv  --file_list2 psi2_transcrit_oligo_collapsed_reads.eventalign/out_eventalign_collapse.tsv --fasta ../transcript_oligo.fasta  --outpath ./samp_comp_results_2 --min_coverage 0

Condition:Condition1 Sample:Condition1_1    High fraction of invalid kmers: 21,555  valid reads: 6
Condition:Condition2 Sample:Condition2_1    High fraction of invalid kmers: 20,243  valid reads: 2

But there are almost 6 million valid kmers. Isn't that enough? Or does it mean that my data is not suitable for Nanocompore? (I used other tools to detect modifications, and they worked well.)

This is the error message I got:

2022-03-25T14:59:18.933552+0100 WARNING - MainProcess | Running SampComp
2022-03-25T14:59:18.934119+0100 INFO - MainProcess | Checking and initialising SampComp
2022-03-25T14:59:18.937440+0100 INFO - MainProcess | Only 1 replicate found for condition Condition1
2022-03-25T14:59:18.937670+0100 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2022-03-25T14:59:18.938098+0100 INFO - MainProcess | Only 1 replicate found for condition Condition2
2022-03-25T14:59:18.938339+0100 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2022-03-25T14:59:18.940320+0100 INFO - MainProcess | Reading eventalign index files
2022-03-25T14:59:20.513673+0100 INFO - MainProcess |    References found in index: 1
2022-03-25T14:59:20.514114+0100 INFO - MainProcess | Filtering out references with low coverage
2022-03-25T14:59:20.515235+0100 INFO - MainProcess |    References remaining after reference coverage filtering: 1
2022-03-25T14:59:20.515533+0100 INFO - MainProcess | Starting data processing
2022-03-25T14:59:20.637782+0100 ERROR - Process-2 | Error doing GMM test on reference dystro-oligo
2022-03-25T14:59:20.638123+0100 ERROR - Process-2 | Error in Worker
nanocompore.common.NanocomporeError: Error doing GMM test on reference dystro-oligo
ValueError: Expected 2D array, got 1D array instead:
array=[].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I hope this is clear. Thank you in advance for your help.
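The figures quoted in this post can be cross-checked with a few lines of arithmetic. The sketch below only reuses numbers copied from the logs above. It assumes the Condition1 line of the second run refers to the IVT sample (the read counts add up, which supports this) and that the "valid reads" count comes from a filter applied to each read separately rather than to the pooled kmers.

```python
# Figures copied from the logs quoted in this post (IVT sample, assumed to be Condition1).
reads_written = 21_561   # "Output reads written:21561"
total_kmers = 6_887_018  # "Kmers:6887018"
valid_kmers = 6_078_205  # result of the grep on the collapsed file
rejected_reads = 21_555  # "High fraction of invalid kmers: 21,555"
valid_reads = 6          # "valid reads: 6"

# Every written read is accounted for by the two counts SampComp reports.
assert rejected_reads + valid_reads == reads_written

# Fraction of invalid kmers when all reads are pooled together.
pooled_invalid_fraction = (total_kmers - valid_kmers) / total_kmers

# Fraction of reads that survive the invalid-kmer filter.
kept_read_fraction = valid_reads / reads_written

print(f"pooled invalid kmer fraction: {pooled_invalid_fraction:.3f}")
print(f"reads kept after filtering:   {kept_read_fraction:.4%}")
```

Pooled over all reads, roughly 11.7% of the kmers are invalid, yet only 6 of 21,561 reads are kept. So a large total number of valid kmers does not by itself mean many usable reads, if the filter is indeed evaluated read by read.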

JannesSP commented 1 year ago

I get the same error. Did you find a solution? I guess you need at least two replicates per condition?

rania-o commented 1 year ago

No, I didn't. I just used other tools.

lmulroney commented 1 year ago

Hi rania-o and JannesSP, I apologise for the lack of activity here last year. How long is your reference sequence? If it is close to 100 nt long, you may need to lower the minimum reference length threshold. You may also want to look at the --max_invalid_kmers_freq option and set it higher than 0.1 (the default).

I know you've likely moved on from using nanocompore, but if you try these settings and it works for you, let me know.

Thanks, Logan
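To illustrate why raising --max_invalid_kmers_freq could help, here is a toy sketch of a per-read filter. It is not Nanocompore's code: the function `split_reads`, the read names, and the kmer counts are all hypothetical, and the exact comparison Nanocompore uses (strict or not) is an assumption.

```python
from typing import Dict, List, Tuple


def split_reads(
    reads: Dict[str, Tuple[int, int]], max_invalid_freq: float
) -> Tuple[List[str], List[str]]:
    """Split reads into (kept, rejected) by their fraction of invalid kmers.

    `reads` maps a read id to (invalid_kmers, total_kmers).
    Hypothetical helper for illustration only.
    """
    kept: List[str] = []
    rejected: List[str] = []
    for read_id, (invalid, total) in reads.items():
        # Treat a read with no kmers as fully invalid.
        freq = invalid / total if total else 1.0
        # Assumption: a read is rejected when its invalid fraction exceeds the threshold.
        if freq > max_invalid_freq:
            rejected.append(read_id)
        else:
            kept.append(read_id)
    return kept, rejected


# Made-up reads: (invalid kmers, total kmers).
toy_reads = {
    "read_a": (5, 100),   # 5% invalid
    "read_b": (12, 100),  # 12% invalid
    "read_c": (15, 100),  # 15% invalid
    "read_d": (40, 100),  # 40% invalid
}

for threshold in (0.1, 0.2):
    kept, rejected = split_reads(toy_reads, threshold)
    print(f"threshold={threshold}: kept={kept} rejected={rejected}")
```

With these made-up reads, a threshold of 0.1 keeps only read_a, while 0.2 also keeps read_b and read_c. Reads that sit just above the default threshold are the ones a modest increase would recover.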

keenhl commented 4 months ago

@rania-o What other tools have you tried?

rania-o commented 3 months ago

@keenhl DRUMMER, EpiNano, ELIGOS, xPore ...