helloworldABCD1234 opened 2 weeks ago
This is really a question about scoring matrices and gap parameters rather than about rmblastn itself. RepeatMasker uses scoring matrices in which substituting N for any other base is only slightly penalized (-1). As a result, an alignment will readily run through a short stretch of Ns when the Ns correctly span the distance between two non-N segments, but it will terminate if the run of Ns is too long (perhaps producing a separate alignment for the non-N sequence that follows). The gap open/extension penalties also play into this: they are much higher than the N substitution penalty, so the aligner will rarely span a run of Ns with a gap.
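To make the gap-versus-substitution trade-off concrete, here is a minimal Python sketch that scores two candidate alignments of the example sequences under an affine gap penalty. All parameter values (`MATCH`, `MISMATCH`, `N_SCORE`, `GAP_OPEN`, `GAP_EXTEND`) are hypothetical, chosen only so that an N substitution is cheap and a gap is expensive; they are not RepeatMasker's or rmblastn's real values, so the scores will not match the 72 in the output below.

```python
# Hypothetical scoring parameters, NOT RepeatMasker's or rmblastn's real values.
# They only mirror the shape of the argument: N substitutions are cheap,
# gaps are expensive.
MATCH = 10
MISMATCH = -9
N_SCORE = -1      # N aligned against any base
GAP_OPEN = -25    # charged once per run of gap columns
GAP_EXTEND = -5   # charged for every gap column, including the first


def column_score(a, b):
    """Score one ungapped column."""
    if a == "N" or b == "N":
        return N_SCORE
    return MATCH if a == b else MISMATCH


def score_alignment(top, bottom):
    """Score two equal-length aligned rows ('-' = gap) with an affine gap
    penalty: each gap run of length L costs GAP_OPEN + GAP_EXTEND * L."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    gap_row = None  # which row held a gap in the previous column, if any
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be gapped in both rows")
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            if row != gap_row:
                total += GAP_OPEN  # a new gap run starts here
            total += GAP_EXTEND
            gap_row = row
        else:
            total += column_score(a, b)
            gap_row = None
    return total


# Top row is t1, bottom row is t3.
# As in the example output: the NN of t1 is aligned against TT of t3.
through_ns = score_alignment("ATCGGGCTNNT", "ATCGGGCTTTT")
# The alternative: a 2-column gap opposite the NN so every t3 base matches.
gapped = score_alignment("ATCGGGCTNNTTT", "ATCGGGCT--TTT")
print(through_ns, gapped)  # prints: 88 75
```

With these made-up numbers, the gap costs 35 while the two extra matches it buys are worth only 20 (plus the 2 saved on N substitutions), so running straight through the Ns wins. With a real matrix the exact scores differ, but the ordering comes out the same whenever the gap penalty outweighs the extra matches the gap would gain.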
For example, using your example sequences and an absurdly low cutoff score, I get the following with similar matrix/gap parameters:
72 0.00 0.00 0.00 t3 1 11 (0) t1 1 11 (2)
t3 1 ATCGGGCTTTT 11
             ??
t1 1 ATCGGGCTNNT 11
Does that answer your question?
Will Ns affect blastn if RepeatMasker uses hard masking and replaces repeated sequences with Ns? For example, is ATCGGGCTNNTTT treated as the same sequence as ATCGGGCTTTT? Or do ATCGGGCTNNNNTTT and ATCGGGCTTTTT have the same effect as blastn input?
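RepeatMasker's hard masking replaces each masked base with an N, so the masked sequence keeps its original length. The "span short runs, stop on long ones" behavior described in the answer can then be sketched as an ungapped local alignment of the masked sequence against its unmasked original, found with Kadane's maximum-subarray scan over the per-column scores. This is a toy model under stated assumptions: the scores are hypothetical (not RepeatMasker's matrix), gaps are ignored, and the helper names `hard_mask` and `best_ungapped_segment` are made up for illustration.

```python
# Hypothetical scores, NOT RepeatMasker's real matrix values.
MATCH = 10
MISMATCH = -9
N_SCORE = -1


def column_score(a, b):
    if a == "N" or b == "N":
        return N_SCORE
    return MATCH if a == b else MISMATCH


def hard_mask(seq, start, end):
    """Replace seq[start:end] with the same number of Ns (length-preserving)."""
    return seq[:start] + "N" * (end - start) + seq[end:]


def best_ungapped_segment(masked, original):
    """Best-scoring ungapped local alignment of two equal-length sequences
    compared column by column, found with Kadane's maximum-subarray scan.
    Returns (score, start, end) with `end` exclusive."""
    if len(masked) != len(original):
        raise ValueError("sequences must have equal length")
    best = (0, 0, 0)
    run_score, run_start = 0, 0
    for i, (a, b) in enumerate(zip(masked, original)):
        if run_score <= 0:
            # A non-positive running score can only hurt; restart here.
            run_score, run_start = 0, i
        run_score += column_score(a, b)
        if run_score > best[0]:
            best = (run_score, run_start, i + 1)
    return best


left, right = "ATCGGGCT", "TTT"
for k in (2, 4, 40):
    # "A" * k is a placeholder for the repeat bases that get masked.
    original = left + "A" * k + right
    masked = hard_mask(original, len(left), len(left) + k)
    print(k, best_ungapped_segment(masked, original))
# prints:
# 2 (108, 0, 13)
# 4 (106, 0, 15)
# 40 (80, 0, 8)
```

Under these numbers a run of k Ns is bridged only while 80 + 30 - k beats the 80 scored by the left flank alone, i.e. for k < 30. Past that point the best hit stops at the left flank, and the right flank would have to show up as a separate hit. Kadane's scan reports only the single best segment, and real tools add gapped extension and heuristic cutoffs on top, so treat this as an illustration of the scoring argument only.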