mhalushka / miRge3.0

Comprehensive analysis of small RNA sequencing data
MIT License

Trimming repeated nucleotides #96

Open adamcatto opened 4 months ago

adamcatto commented 4 months ago

Is there an option in miRge to trim off runs of a nucleotide repeated k or more times? E.g., ACGT[A* >= k]TGCA gets trimmed to ACGTTGCA. I know this can be done prior to running the miRge pipeline, but it would be nice to include it as an argument in the miRge run script.
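
For reference, the trimming described above could be done as a pre-processing step with something like the following. This is only a minimal sketch: the function name `trim_homopolymers` and its `k` parameter are hypothetical, it is not a miRge option, and it assumes uppercase reads over the alphabet A, C, G, T, N.

```python
import re


def trim_homopolymers(seq: str, k: int) -> str:
    """Remove every run of k or more identical nucleotides from seq.

    Hypothetical helper for illustration; not part of miRge.
    """
    if k < 2:
        # k = 1 would match every single base and delete the whole read.
        raise ValueError("k must be at least 2")
    # ([ACGTN]) captures one base; \1{k-1,} requires at least k-1 more copies of it,
    # so the whole match is a run of k or more identical bases.
    pattern = re.compile(r"([ACGTN])\1{%d,}" % (k - 1))
    return pattern.sub("", seq)


print(trim_homopolymers("ACGTAAAAATGCA", k=4))  # ACGTTGCA
```

Note that this makes a single pass, so bases that become adjacent only after a run is removed are not re-checked. For FASTQ input, the matching quality characters would also need to be dropped, which this sketch does not do.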

arunhpatil commented 4 months ago

Hi @adamcatto,

Thank you for your suggestion. I don't see how this would improve the current pipeline or what its overall benefits would be. We don't currently have an option to remove internal repeated nucleotides. (Please expect delays due to travel; I will be back on May 04 EST.)

Thank you, Arun.

adamcatto commented 4 months ago

I think some reads contain runs of identical nucleotides that are technical artifacts and should be removed. In any case, I have forked the repository and added an option to remove runs of repeated nucleotides at or above a given length. You can view the changes here if that sounds interesting: https://github.com/adamcatto/miRge3.0/commit/8709dfe44c22d743e358484ad2e889966b8786cf