lh3 / seqtk

Toolkit for processing sequences in FASTA/Q formats
MIT License

seqtk hrun bin size #185

Open kimin0402 opened 2 years ago

kimin0402 commented 2 years ago

Hi, thank you for making such a great tool.

I'm trying to mask a reference file based on homopolymers. After some searching, I found this page (https://gist.github.com/lh3/9d6dcfc3436a735ef197) and learned that seqtk can output homopolymers.

However, the seqtk hrun command only outputs homopolymers longer than 7 bp. Is there an option for adjusting this size? I'm trying to create a mask file with homopolymers both shorter and longer than 7 bp. (In fact, I'm thinking about dividing the reference file and applying a different homopolymer masking cutoff to each part.) For the longer ones, I guess I could pipe the output through awk and select homopolymers longer than some length, but I'm afraid this might be slow.
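For anyone with the same need, here is a minimal Python sketch of a homopolymer scan with an adjustable minimum length. It is not seqtk's implementation and says nothing about what options seqtk hrun accepts; the function names (`homopolymer_runs`, `read_fasta`, `bed_lines`) are hypothetical, and skipping runs of N and ignoring case are assumptions made for this example. Coordinates are 0-based and half-open, as in BED.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=7):
    """Yield (start, end, base) for every run of one base that is at least
    min_len long. Coordinates are 0-based, half-open (BED style).
    Assumption: case is ignored and runs of N are not reported."""
    pos = 0
    for base, group in groupby(seq.upper()):
        run_len = sum(1 for _ in group)
        if run_len >= min_len and base != "N":
            yield pos, pos + run_len, base
        pos += run_len


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def bed_lines(lines, min_len=7):
    """Yield tab-separated BED-like lines: name, start, end, base."""
    for name, seq in read_fasta(lines):
        for start, end, base in homopolymer_runs(seq, min_len):
            yield f"{name}\t{start}\t{end}\t{base}"


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python hrun_sketch.py ref.fa 5 > hp.bed
    cutoff = int(sys.argv[2]) if len(sys.argv) > 2 else 7
    with open(sys.argv[1]) as handle:
        for bed in bed_lines(handle, cutoff):
            print(bed)
```

Running the scan once with a low cutoff and filtering the resulting intervals by length afterwards would cover the different-cutoff-per-region case without rescanning the reference. The intervals could then be passed to a masking tool that accepts BED input.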

Is seqtk hrun meant to be a hidden command for internal use only? If so, could you recommend another tool that can identify homopolymer sites?

Thank you.

CharlotteAnne commented 11 months ago

Hi, it would be great to have an answer on this, if it hasn't already been answered elsewhere, as I'm wondering about this too.