miRTop / mirtop

Command-line tool to annotate miRNAs with standard miRNA/isomiR naming
https://mirtop.readthedocs.org
MIT License

Repeated isomiRs in the `mirtop.tsv` output #80

Open bounlu opened 4 months ago

bounlu commented 4 months ago

The `mirtop.tsv` output contains repeated rows for some isomiRs, with the first 12 columns identical:

| UID | Read | miRNA | Variant | iso_5p | iso_3p | iso_add3p | iso_snp | iso_5p_nt | iso_3p_nt | iso_add3p_nt | iso_snp_nt |
|---|---|---|---|---|---|---|---|---|---|---|---|
| iso-16-03170FE | ACAGTAGTCTACACAT | hsa-miR-199b-3p | iso_3p:-6,iso_snv_central | 0 | -6 | 0 | 1 | 0 | tggtta | 0 | 11AG |
| iso-16-03170FE | ACAGTAGTCTACACAT | hsa-miR-199a-3p | iso_3p:-6,iso_snv_central | 0 | -6 | 0 | 1 | 0 | tggtta | 0 | 11AG |
| iso-16-03170FE | ACAGTAGTCTACACAT | hsa-miR-199a-3p | iso_3p:-6,iso_snv_central | 0 | -6 | 0 | 1 | 0 | tggtta | 0 | 11AG |

Their counts in the samples differ slightly, though. I would expect these isomiRs to be merged and counted together in the output. I understand from `mirtop.gff` that they come from different precursors with identical mature sequences, but then why are their counts different within the same sample?

Is this a bug? If not, what is the reason behind repeated isomiRs with identical sequences but different counts in the same sample?

lpantano commented 1 week ago

Thanks for this. I think it would be good to see more. I see three sequences here: two are annotated to 199a and one to 199b. 199a probably has two precursors. Could you also share the file where you say the counts are different? Are you referring to the isomiR counts or to the miRNA counts?
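
To make the repeated rows and their per-sample counts easier to share, something like the following sketch might help. It is not part of mirtop. It assumes `mirtop.tsv` is tab-separated, that the 12 annotation columns are named as in the header quoted above, and that every remaining column holds integer counts for one sample. The helper name `find_repeated_rows`, the sample names `sampleA`/`sampleB`, and all count values are made up for illustration only.

```python
import csv
import io
from collections import defaultdict

# Annotation columns, taken from the header quoted in this issue.
ANNOTATION_COLS = [
    "UID", "Read", "miRNA", "Variant", "iso_5p", "iso_3p",
    "iso_add3p", "iso_snp", "iso_5p_nt", "iso_3p_nt",
    "iso_add3p_nt", "iso_snp_nt",
]


def find_repeated_rows(handle):
    """Return annotation keys that occur on more than one row,
    mapped to the list of per-sample counts found on each row."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Assumption: every column that is not an annotation column is a sample.
    sample_cols = [c for c in reader.fieldnames if c not in ANNOTATION_COLS]
    groups = defaultdict(list)
    for row in reader:
        key = tuple(row[c] for c in ANNOTATION_COLS)
        groups[key].append({s: int(row[s]) for s in sample_cols})
    return {key: counts for key, counts in groups.items() if len(counts) > 1}


# Hypothetical demo data: annotation values copied from the issue,
# sample names and counts invented.
shared = ["iso-16-03170FE", "ACAGTAGTCTACACAT"]
tail = ["iso_3p:-6,iso_snv_central", "0", "-6", "0", "1", "0", "tggtta", "0", "11AG"]
demo_rows = [
    ANNOTATION_COLS + ["sampleA", "sampleB"],
    shared + ["hsa-miR-199b-3p"] + tail + ["10", "4"],
    shared + ["hsa-miR-199a-3p"] + tail + ["12", "4"],
    shared + ["hsa-miR-199a-3p"] + tail + ["11", "5"],
]
demo_tsv = "\n".join("\t".join(r) for r in demo_rows)

repeated = find_repeated_rows(io.StringIO(demo_tsv))
for key, counts in repeated.items():
    # key[0] is the UID, key[2] is the miRNA name.
    print(key[0], key[2], counts)
```

On a real file the same function could be called with `open("mirtop.tsv", newline="")` in place of the `io.StringIO` demo input. Only rows whose 12 annotation columns match exactly are grouped, so the 199b row is reported separately from the two 199a rows.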