Hi, thank you for making such a great tool.
I'm trying to mask a reference file based on homopolymers. After searching for a while, I came across this page (https://gist.github.com/lh3/9d6dcfc3436a735ef197) and found out that seqtk can output homopolymers.
However, the seqtk hrun command only outputs homopolymers longer than 7 bp. Is there an option to adjust this size? I'm trying to create a mask file with homopolymers both shorter and longer than 7 bp. (In fact, I'm thinking about dividing the reference file and applying a different cutoff for homopolymer masking to each part.) For longer ones, I guess I could pipe the output to awk and select homopolymers longer than some number, but I'm afraid this might be slow.
Is seqtk hrun intended as a hidden command only? If so, could you recommend another tool that can find homopolymer sites?
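To make clear what kind of output I'm after, here is a minimal Python sketch of homopolymer detection with an adjustable cutoff. It is not how seqtk works internally; the names `homopolymer_runs`, `bed_lines`, and `min_len` are my own for illustration. It assumes 0-based, half-open (BED-style) coordinates, treats soft-masked lowercase bases the same as uppercase, and ignores runs of N.

```python
import re
from typing import Iterator, List, Tuple


def homopolymer_runs(seq: str, min_len: int = 7) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, base) for every run of at least min_len identical bases.

    Coordinates are 0-based and half-open (an assumption, chosen to match BED).
    Only A, C, G and T are considered, so runs of N are skipped.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One base followed by at least (min_len - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    for match in pattern.finditer(seq.upper()):
        yield match.start(), match.end(), match.group(1)


def bed_lines(name: str, seq: str, min_len: int = 7) -> List[str]:
    """Format the runs of one sequence as tab-separated BED-like lines."""
    return [
        f"{name}\t{start}\t{end}\t{base}"
        for start, end, base in homopolymer_runs(seq, min_len)
    ]


if __name__ == "__main__":
    # Toy sequence; a real reference would be read from a FASTA file.
    for line in bed_lines("chrToy", "ACGTTTTTTTTACAAAC", min_len=3):
        print(line)
```

Calling the function with a different `min_len` for each part of the reference would give the per-region cutoffs I described, but I don't expect this to be fast on a whole genome.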
Thank you.