nanoporetech / remora

Methylation/modified base calling separated from basecalling.
https://nanoporetech.com

input data for raw signal analysis #103

Closed: ninavie closed this issue 1 year ago

ninavie commented 1 year ago

Hi,

We are interested in Remora's raw signal analysis functionality, specifically the command remora analyze plot ref_region. Because we are new to handling this type of data, we first ran the command on the provided test data (tests/data), and everything worked fine. When we switched to our own data, we ran into multiple problems and errors.

We are not sure whether our preprocessed data meets the requirements for this kind of analysis. Could you explain how to correctly generate the following files: can_mappings.bam, mod_mappings.bam, ref_regions.bed and mod_gt.bed?

Thanks for your help!

marcus1487 commented 1 year ago

The mappings.bam files were created with Dorado. To perform signal analyses, the --emit-moves option must be provided. For reference-anchored plots the reads must also be mapped, so the --reference argument should be passed to Dorado. If you map reads after basecalling, special instructions are needed (see the data preparation section of the README).

The ref_regions.bed file represents the regions of the reference at which you would like to plot signal. Note that these regions should be quite short to produce useful plots; I'd suggest starting with 10-30 bases.

The mod_gt.bed file contains the ground truth locations of modified bases in the mod_mappings.bam file. If you do not have a ground truth, this file can be omitted.
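As a rough illustration of the advice about short plotting regions, here is a minimal sketch that writes windows centered on sites of interest. It assumes a standard 6-column BED layout (chrom, 0-based start, exclusive end, name, score, strand); the input tuple format, the 20-base default and the function names make_windows and write_bed are hypothetical, so compare the output with the example ref_regions.bed in tests/data before relying on it.

```python
from typing import Iterable, List, Tuple

# Hypothetical input format: (chrom, 0-based position of interest, strand)
Site = Tuple[str, int, str]


def make_windows(sites: Iterable[Site], width: int = 20) -> List[str]:
    """Return one BED6 line per site, covering `width` bases around the site."""
    lines = []
    for chrom, pos, strand in sites:
        # Center the window on the site, clamping at the contig start.
        # (Clamping at the contig end would require the contig lengths.)
        start = max(0, pos - width // 2)
        end = start + width  # BED end coordinates are exclusive
        name = f"{chrom}:{pos}{strand}"
        lines.append(f"{chrom}\t{start}\t{end}\t{name}\t0\t{strand}")
    return lines


def write_bed(path: str, lines: List[str]) -> None:
    """Write BED lines to `path`, one record per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    for line in make_windows([("chr1", 1000, "+"), ("chr1", 5, "-")]):
        print(line)
```

Keeping the width fixed makes every plotted region the same length, which keeps the per-region plots comparable; widen or narrow it within the suggested 10-30 bases as needed.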

If you could provide the specific errors, that would be very helpful in resolving the issues.

ninavie commented 1 year ago

Hi @marcus1487, thanks for your help!

This worked for us, and the analysis plots now work within the notebook metrics_api.ipynb. In case somebody else faces the same issue, here is our pipeline for processing the input data (in addition to the .pod5 files):

  1. Basecalling with Dorado: dorado basecaller <model> <.pod5 directory> --emit-moves --reference <reference.fasta> > <basecalled_mapped.bam>
  2. Sorting the .bam file with samtools: samtools sort <basecalled_mapped.bam> -o <sorted.bam>
  3. Creating the .bai index with samtools: samtools index <sorted.bam>
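The --emit-moves flag in step 1 stores a move table that links each basecalled base to a span of raw signal, which is what signal-level plots rely on. The sketch below shows how such a table could be turned into per-base signal intervals. It assumes the tag layout as we understand Dorado's SAM output (mv:B:c holding the stride followed by 0/1 moves, and ts:i giving the number of samples trimmed from the start of the signal); please verify this against the Dorado documentation. The function name base_signal_intervals is hypothetical and not part of Remora's API.

```python
from typing import List, Tuple


def base_signal_intervals(mv: List[int], ts: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) signal sample indices for each basecalled base.

    Assumes mv = [stride, move_0, move_1, ...], where each move covers
    `stride` samples and a 1 marks the start of a new base, and that `ts`
    samples were trimmed before the first move.
    """
    stride, moves = mv[0], mv[1:]
    # Indices of the move blocks at which a new base begins.
    starts = [i for i, move in enumerate(moves) if move == 1]
    # Each base runs until the next base starts; the last one runs to the end.
    ends = starts[1:] + [len(moves)]
    return [(ts + stride * s, ts + stride * e) for s, e in zip(starts, ends)]


if __name__ == "__main__":
    # Toy move table: stride 5, three bases spanning 2, 1 and 3 blocks.
    print(base_signal_intervals([5, 1, 0, 1, 1, 0, 0], ts=10))
```

With pysam, the inputs could be read from an aligned record via list(read.get_tag("mv")) and read.get_tag("ts"). The intervals follow basecall order; for reads aligned to the reverse strand the BAM SEQ is stored reverse-complemented, so base i corresponds to SEQ position len(SEQ) - 1 - i.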