tleonardi / nanocompore

RNA modifications detection from Nanopore dRNA-Seq data
https://nanocompore.rna.rocks
GNU General Public License v3.0
77 stars 12 forks source link

Issues on reproducing the synthetic oligo results from the nanocompore paper #211

Closed loganylchen closed 1 year ago

loganylchen commented 1 year ago

Hi there,

I have been trying to reproduce the Fig2A in the NanoCompore paper, but I got some very weird results. I also tried xpore to analyze the same datasets. I've attached the results I obtained.

Could you help me figure out why the results were not reproduced?

Nanocompore Results

nanocompore_oligo_data_nanocompore

Xpore Results

nanocompore_oligo_data_xpore

Looking forward to your reply, Best,

Logan

lmulroney commented 1 year ago

Hi Logan,

Sorry for my delayed response. I have a few questions about your workflow to try and understand why the results are not the same as in figure 2A.

Which version of Nanopolish did you use for the signal to sequence alignment? What parameters did you use for Nanocompore eventalign collapse and sampcomp? Did you use the entire oligo dataset or a subset?

The reason these questions are important is that the oligos are 100 nt long, and that is shorter than a lot of the default min length limits in the software, and that might be removing a lot of reads from the analysis, effectively reducing the magnitude of the p-values. One way to check if a lot of data was skipped is to read the log file from the analysis and check on how many reads were retained and how many reads were skipped due to the filtering conditions.

Cheers, Logan

loganylchen commented 1 year ago

Thank Logan for your response.

I followed the documents on the github, using f5c did the signal alignment. Here are my commands.

# f5c version 1.1
f5c eventalign -r data/OligoControl/fastq/pass.fq.gz --print-read-names --signal-index --scale-events --min-mapq 0 --min-recalib-events 50 --samples --rna -b results/alignments/OligoControl_filtered.bam -g oligo.fa -t 32 --slow5 results/blow5/OligoControl.blow5

# nanocompore collapse
nanocompore eventalign_collapse -i results/eventalign/Oligo1_nanocompore.tsv.gz --outpath results/nanocompore_eventalign_collapse/Oligo1 --outprefix Oligo1 --overwrite --nthreads 12 2>logs/nanocompore_collapse/Oligo1.log

#nanocompore samplecomp
nanocompore sampcomp --file_list1 results/nanocompore_eventalign_collapse/OligoControl/OligoControl_eventalign_collapse.tsv --file_list2 results/nanocompore_eventalign_collapse/Oligo1/Oligo1_eventalign_collapse.tsv --label1 Control --label2 Native --logit --min_ref_length 50 --overwrite --min_coverage 5  --fasta oligo.fa --outpath results/nanocompore/Oligo1_OligoControl --outprefix Oligo1_OligoControl --overwrite  2>logs/nanocompore/Oligo1_OligoControl.log

I used the entire oligo dataset.

May I ask if you could provide the scripts that to reproduce the results in your paper? Then I can check the discrepancy.

Best, Logan

lmulroney commented 1 year ago

Hi Logan,

Ya, the issue is that both nanopolish and f5c require at least 200 events (essentially bases) in the signal file, and will filter anything shorter than that. And so you're losing a lot of the data at that filtering step. I don't know where to patch f5c to account for smaller reads, but you can make the following patch to nanopolish.

Change line 241 of nanopolish_methyltrain.cpp.

const int minNumEventsToRescale = 200;

to

const int minNumEventsToRescale = 10;

@@ -238,7 +238,7 @@ } }

I have a singularity image of the patched nanopolish, but it's too large to effectively share here. I think it would be faster for you to make this edit to nanopolish and try working with the oligo data again from the nanopolish step.

Let me know if this makes sense and works in your hands.

All the best, Logan