tleonardi / nanocompore

RNA modifications detection from Nanopore dRNA-Seq data
https://nanocompore.rna.rocks
GNU General Public License v3.0
77 stars 12 forks source link

Problems with save_to_bed. And questions with bed file annotation #220

Closed XqKang closed 1 year ago

XqKang commented 1 year ago

Hi I am trying to save my result. I don't have a problem using save_report. But when i try to save it in bed, i have this error

2023-05-29 08:46:52.391 | INFO | nanocompore.SampCompDB:init:55 - Loading SampCompDB 2023-05-29 08:46:52.471 | DEBUG | nanocompore.SampCompDB:init:62 - Reading Metadata 2023-05-29 08:46:52.502 | DEBUG | nanocompore.SampCompDB:init:68 - Loading list of reference ids 2023-05-29 08:46:52.504 | DEBUG | nanocompore.SampCompDB:init:82 - Checking files and arg values 2023-05-29 08:46:52.580 | INFO | nanocompore.SampCompDB:init:100 - Calculate results 2023-05-29 08:46:54.544 | ERROR | nanocompore.common:init:22 - The field 'None' is not in the results [SampCompDB] package_name: 1.0.4 package_version: nanocompore timestamp: 2023-05-28 16:09:49.137394 comparison_methods: ['GMM', 'KS'] pvalue_tests: ['GMM_logit_pvalue', 'KS_dwell_pvalue', 'KS_intensity_pvalue'] sequence_context: 0 min_coverage: 30 n_samples: 2 Number of references: 1

Traceback (most recent call last): File "SampCompDB.py", line 10, in db.save_to_bed (output_fn="/work/rouhanifardlab/Xinqi/Data_guppy_6.3.2/Nanocompore/resultsc/test_result.bed") File "/home/kang.xin/.conda/envs/nanocompore/lib/python3.7/site-packages/nanocompore/SampCompDB.py", line 295, in save_to_bed raise NanocomporeError(("The field '%s' is not in the results" % pvalue_field)) nanocompore.common.NanocomporeError: The field 'None' is not in the results

The code that i am running is from nanocompore.SampCompDB import SampCompDB, jhelp import matplotlib.pyplot as pl import numpy as np db = SampCompDB ( db_fn = "/work/rouhanifardlab/Xinqi/Data_guppy_6.3.2/Nanocompore/resultsc/outSampComp.db", fasta_fn = "/work/rouhanifardlab/Xinqi/Data_guppy_6.3.2/Reference/gencode.v43.transcripts.fa", bed_fn="/work/rouhanifardlab/Xinqi/Data_guppy_6.3.2/Nanocompore/test.bed") print (db)

db.save_to_bed (output_fn="/work/rouhanifardlab/Xinqi/Data_guppy_6.3.2/Nanocompore/resultsc/test_result.bed")

fig,ax=db.plot_pvalue("ENST00000373237.4|ENSG00000126067.12|OTTHUMG00000004169.3|OTTHUMT00000012016.2|PSMB2-201|PSMB2|4426|protein_coding|") pl.xlim([0,900]) pl.xticks(np.arange(0, 900, 25)) pl.savefig('fig1.png')

I have attached the bed file I outSampComp.db.zip used containing genomic information as well as the output of SampComp.

Also, after I run SampComp, it output there few files image

And when I check it in IGV it looks like this image

Are these sites significant positions (I draw a red arrow)?? Also, what are these kmers stand for (I draw a green circle around it)

lmulroney commented 1 year ago

Dear @XqKang,

Yes, those 5mers (red arrows) in the bed file should be the significant sites as defined by the test result threshold you inputted into sampcomp or the post analysis python api. A kmer is a notation to represent a certain number of nucleotide positions. The sensitive region of the R9.4.1 nanopore has more than 1 nucleotide in the sensitive region at any given time point. For direct RNA nanopore this is modelled as 5 nucleotides that contribute the most signal at any given time point. Due to this, we report the entire kmer (5mer for this version of the nanopore) as the sequence of nucleotides that could be modified (green circles). Usually the central nucleotide in the kmer is the modified position, but this is not necessarily true. I hope this helps with understanding the results from nanocompore. I suggest reading our recent protocol paper for a more thorough explanation on how to interpret the results (https://doi.org/10.1002/cpz1.683).

Cheers, Logan