rajewsky-lab / mirdeep2

Discovering known and novel miRNAs from small RNA sequencing data
GNU General Public License v3.0

Mapper block question #52

Closed JoseCorCab closed 4 years ago

JoseCorCab commented 4 years ago

Hello, I am trying to incorporate miRDeep2 into our workflow, in parallel with an RNA-seq analysis. I ran it with the default mapper.pl and miRDeep2.pl scripts and it works. However, our workflow generates a report file with information about the status and mapping metrics of the RNA-seq samples, and we extract some of these metrics from a BAM file.

I would like to extract the same information from the mapper.pl output or its temporary files, but I found that mapper.pl uses a non-redundant FASTA file (collapsed_reads.fa) as bowtie input, so the extracted metrics are biased. I would therefore need to map all raw (redundant) reads onto the genome to produce a SAM or BWT file, convert it to BAM to extract the metrics, and finally generate a "collapsed" ARF file for the miRDeep2.pl script. I have also seen that a BWA SAM file can be used as input, but the documentation does not make clear whether the BWA mapping should be performed with all raw reads or with collapsed reads:

The user has already removed 3' adapters in color space and has mapped the reads against the genome using the BWA tool. The BWA output file is named reads_vs_genome.sam. Notice that the BWA output contains extra fields that are not required for SAM format. Our converter requires these fields and thus may not work with all types of SAM files.

After that, you specify that reads_collapsed.fa and reads_vs_genome.arf must be generated in order to run miRDeep2.pl:

The user wishes to generate reads_collapsed.fa and reads_vs_genome.arf to input to miRDeep2

So it is not clear to me whether bwa_sam_converter.pl expects BWA alignments of the non-collapsed reads and generates a collapsed ARF file itself, or whether the FASTA file must be collapsed before the BWA mapping.

Could you explain this behavior in more detail?
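To make the bias concrete, the sketch below collapses identical reads into one record each, so counting FASTA records (or the alignments made from them) undercounts reads, while summing the encoded counts recovers the raw total. It assumes the `_x<count>` suffix convention for collapsed read IDs; the `seq` prefix, the ordering by decreasing count, and the function names are illustrative assumptions, not mapper.pl's actual implementation.

```python
from collections import Counter


def collapse_reads(sequences, prefix="seq"):
    """Collapse identical reads into (ID, sequence) pairs with IDs like seq_0_x3.

    Assumption: ordering by decreasing count, ties broken alphabetically.
    """
    counts = Counter(seq.strip().upper() for seq in sequences)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(f"{prefix}_{i}_x{n}", seq) for i, (seq, n) in enumerate(ranked)]


def raw_read_total(collapsed):
    """Recover the raw read count by summing the _x<count> multiplicities."""
    return sum(int(read_id.rsplit("_x", 1)[1]) for read_id, _ in collapsed)


reads = ["TGAGGTAG", "TGAGGTAG", "ACTGGCCT", "TGAGGTAG", "ACTGGCCT"]
collapsed = collapse_reads(reads)
print(len(collapsed), raw_read_total(collapsed))  # prints: 2 5
```

Five raw reads become two collapsed records, so any metric computed per record rather than per read is skewed toward low-abundance sequences.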

A related question concerns this statement:

Notice that the BWA output contains extra fields that are not required for SAM format. Our converter requires these fields and thus may not work with all types of SAM files.

Which extra fields does bwa_sam_converter.pl need from the BWA SAM output?

Thanks.

mschilli87 commented 4 years ago

@JoseCorCab: This is an issue tracker for bugs in miRDeep2. I don't see a bug reported here, so I'm going to close this issue. That said, I suggest converting the ARF file you get from mapper.pl to BAM using your method of choice, copying each SAM alignment as many times as indicated by the multiplicity in the collapsed read ID, and extracting your metrics from the resulting BAM file. There should be no need to map identical reads hundreds or thousands of times to the same reference again and again just to collect some population statistics.
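The suggested ARF-to-BAM expansion could look roughly like the sketch below. It is not part of miRDeep2; the function names are hypothetical, and it assumes the 13-column ARF layout listed in the docstring, read IDs ending in `_x<count>`, ungapped full-length hits (as produced by bowtie 1 via mapper.pl), and that the ARF read sequence is in sequencing orientation. Each copy gets its own QNAME so downstream tools count it as a separate read, and repeated hits of the same read are flagged as secondary so per-read totals are not inflated by multi-mappers.

```python
import re

# Assumption: collapsed read IDs end in "_x<count>", e.g. "seq_0_x12".
MULTIPLICITY = re.compile(r"_x(\d+)$")
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def multiplicity(read_id):
    """Return the raw-read count encoded in a collapsed read ID."""
    match = MULTIPLICITY.search(read_id)
    if match is None:
        raise ValueError(f"no _x<count> suffix in read ID {read_id!r}")
    return int(match.group(1))


def arf_to_sam(arf_lines):
    """Yield one SAM record per raw read behind each ARF alignment.

    Assumed ARF columns (tab-separated, 1-based): read ID, read length,
    read start, read end, read sequence, reference name, aligned reference
    length, reference start, reference end, reference sequence, strand,
    mismatch count, edit string.
    """
    seen = set()
    for line in arf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 13:
            continue  # skip blank or truncated lines
        read_id, read_len, q_start, q_end, read_seq = fields[:5]
        ref_name, strand, mismatches = fields[5], fields[10], fields[11]
        # Leftmost forward-strand coordinate, whichever way the ARF lists them.
        pos = min(int(fields[7]), int(fields[8]))
        # Assumption: ungapped, full-length hits (bowtie 1 via mapper.pl).
        if (int(q_start), int(q_end)) != (1, int(read_len)):
            raise ValueError(f"partial alignment not handled: {read_id}")
        flag = 0
        seq = read_seq.upper()
        if strand == "-":
            flag |= 16  # SAM stores reverse-strand hits reverse-complemented
            seq = seq.translate(COMPLEMENT)[::-1]
        if read_id in seen:
            flag |= 256  # further hits of the same read become secondary
        seen.add(read_id)
        for copy in range(multiplicity(read_id)):
            # A unique QNAME per copy keeps each raw read a separate template.
            yield "\t".join([
                f"{read_id}_{copy}", str(flag), ref_name, str(pos), "255",
                f"{read_len}M", "*", "0", "0", seq, "*", f"NM:i:{mismatches}",
            ])
```

The records carry no header, and the ARF reference length column is the aligned length rather than the chromosome length, so the @SQ lines would have to come from the genome itself (for example via `samtools dict`) before converting to BAM with `samtools view -b`.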