tomazc / iCount

iCount, protein-RNA interaction analytics
http://icount.biolab.si

confusion between --mapq_th and --multimax in iCount xlsites #179

Closed YannAudic closed 6 years ago

YannAudic commented 6 years ago

Hi, this is just a comment/question regarding the two parameters --mapq_th and --multimax.

Because the reads are mapped with STAR, the MAPQ field describes the number of hits for a given read. Therefore --mapq_th does not control quality but rather the multimapped-read threshold (the different thresholds are clearly indicated in the Python script). This could be made clearer in the iCount xlsites --help. Current text: "--mapq_th Ignore hits with MAPQ < mapq_th (default: 0)". Suggested text: "--mapq_th Ignore hits with MAPQ < mapq_th (default: 0), this controls multimapped reads".

Although the multimapped reads can be controlled with --mapq_th, the two output BED files are identical, whereas the --help states: "check overlap between unique and multimap BED files, should be small,"

diff -s XCLIP_23847_NNNTTGTNN_iCount_noseg_mpqth255_xlinkedsites_*
Files XCLIP_23847_NNNTTGTNN_iCount_noseg_mpqth255_xlinkedsites_multi.bed and XCLIP_23847_NNNTTGTNN_iCount_noseg_mpqth255_xlinkedsites_unique.bed are identical

On the other hand, the --multimax parameter does not seem to filter on the number of multimapping hits allowed. Which field of the BAM/SAM file is it filtering on?

Am I missing something here?

thanks, Yann

JureZmrzlikar commented 6 years ago

The --mapq_th parameter filters BAM entries by mapping quality. Mapping quality is a general property of any alignment file, not just those produced by STAR. However, the direct connection between the number of hits for a read and its mapping quality applies only to STAR. The aim of this parameter is to filter out low-quality alignments, irrespective of the aligner that produced them.
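To illustrate the STAR-specific connection mentioned above, here is a minimal sketch, not iCount's actual code. It assumes STAR's default MAPQ convention as described in the STAR manual (255 for unique reads, 3 for 2 loci, 1 for 3-4 loci, 0 for 5 or more) and the rule from the --help text ("Ignore hits with MAPQ < mapq_th"). The names STAR_MAPQ_TO_HITS and passes_mapq are hypothetical.

```python
# Assumption: STAR run with default settings; other aligners assign
# MAPQ differently, so this table does not apply to them.
STAR_MAPQ_TO_HITS = {
    255: "1 locus (unique)",
    3: "2 loci",
    1: "3-4 loci",
    0: "5 or more loci",
}


def passes_mapq(mapq, mapq_th=0):
    """Keep a hit unless MAPQ < mapq_th, following the --help wording."""
    return mapq >= mapq_th


# Show which STAR MAPQ classes survive a given threshold.
for mapq, hits in sorted(STAR_MAPQ_TO_HITS.items(), reverse=True):
    print(mapq, hits, "kept" if passes_mapq(mapq, mapq_th=255) else "ignored")
```

Under these assumptions, a threshold of 255 keeps only the uniquely mapped class, while the default of 0 keeps everything.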

The function of the --multimax parameter should be self-evident: it filters out alignments of reads mapped to more than multimax places. However, the documentation is outdated. The "multi" file contains uniquely AND multimapped reads.

In your particular case, I would assume two options:

JureZmrzlikar commented 6 years ago

Aha, I forgot to explain: the number of places that a read was mapped to is determined by the NH tag.
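As a rough sketch of how the two filters read different parts of an alignment record (this is not iCount's implementation): MAPQ is column 5 of a SAM line, while NH is an optional tag after column 11. The function names, the example records, and the choice to keep records that lack an NH tag are all assumptions made for illustration.

```python
def parse_hit(sam_line):
    """Return (mapq, nh) for one SAM alignment line; nh is None without an NH tag."""
    fields = sam_line.rstrip("\n").split("\t")
    mapq = int(fields[4])        # column 5 of SAM is MAPQ
    nh = None
    for tag in fields[11:]:      # optional fields start at column 12
        if tag.startswith("NH:i:"):
            nh = int(tag[len("NH:i:"):])
    return mapq, nh


def keep_hit(sam_line, mapq_th, multimax):
    """Apply both filters: drop if MAPQ < mapq_th or NH > multimax."""
    mapq, nh = parse_hit(sam_line)
    if mapq < mapq_th:
        return False
    # Assumption: a record without an NH tag is not filtered by multimax.
    if nh is not None and nh > multimax:
        return False
    return True


# Made-up example records (11 mandatory SAM columns plus an NH tag).
seq, qual = "A" * 30, "I" * 30
unique = "\t".join(["read1", "0", "chr1", "100", "255", "30M", "*", "0", "0", seq, qual, "NH:i:1"])
multi = "\t".join(["read2", "0", "chr1", "200", "3", "30M", "*", "0", "0", seq, qual, "NH:i:2"])

print(parse_hit(unique), parse_hit(multi))
print(keep_hit(multi, mapq_th=0, multimax=1))    # dropped by the NH filter
print(keep_hit(multi, mapq_th=255, multimax=2))  # dropped by the MAPQ filter
```

With STAR-style MAPQ values, a record that fails one filter will often fail the other as well, but the two checks look at separate fields and can disagree for other aligners.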