In the test sequence I used for #100, I noticed the following bug: the RBS spacer of one of the predicted genes changes when the contig is reverse-complemented.
Indeed, the gene with the `GGA/GAG/AGG` RBS motif has a spacer detected as `3-4bp` on the forward strand, but `5-10bp` on the reverse strand. The contig in question starts with the following sequence:
`GGATAGGCCCCATG...`
so it has both a match in the `3-4bp` range (`AGG`) and one in the `5-10bp` range (`GGA`). Since the `5-10bp` spacer has the higher score, it is the one that should be selected. The choice feeds into the gene score, so it can change some predictions.
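To make the two candidates concrete, here is a minimal sketch that lists every `GGA`/`GAG`/`AGG` upstream of the start codon in the sequence above and bins its spacer. It assumes the spacer is the number of bases between the motif's last base and the start codon, and it uses made-up weights (`HYPOTHETICAL_WEIGHTS`); in Prodigal the RBS weights are trained per genome, so only the relative ordering of the two bins matters here.

```python
MOTIFS = ("GGA", "GAG", "AGG")
# Hypothetical weights for illustration only: real RBS weights are trained per genome.
HYPOTHETICAL_WEIGHTS = {"3-4bp": 1.0, "5-10bp": 2.0}


def spacer_bin(spacer):
    """Map a spacer length to one of the two ranges discussed here (others are ignored)."""
    if 3 <= spacer <= 4:
        return "3-4bp"
    if 5 <= spacer <= 10:
        return "5-10bp"
    return None


def candidates(seq, start):
    """(motif, spacer, bin) for every 3-mer motif upstream of the start codon at index `start`."""
    found = []
    for k in range(0, start - 2):
        motif = seq[k:k + 3]
        if motif in MOTIFS:
            spacer = start - (k + 3)  # bases between the motif's last base and the start codon
            b = spacer_bin(spacer)
            if b is not None:
                found.append((motif, spacer, b))
    return found


def best_candidate(seq, start):
    """Pick the candidate whose spacer bin has the highest (hypothetical) weight."""
    return max(candidates(seq, start), key=lambda c: HYPOTHETICAL_WEIGHTS[c[2]], default=None)


seq = "GGATAGGCCCCATG"
start = seq.index("ATG")  # 11
print(candidates(seq, start))      # [('GGA', 8, '5-10bp'), ('AGG', 4, '3-4bp')]
print(best_candidate(seq, start))  # ('GGA', 8, '5-10bp')
```

The `GGA` at index 0 sits right on the contig edge, yet it is the candidate that should win.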
The problem came from the loops in `rbs_score`, which skip window positions before index `0`. However, when there may be a partial match (as is the case here, with a `GGA` motif right at the contig edge), these positions should not be skipped: the decision to ignore out-of-range bases should instead be made directly by the `shine_dalgarno_exact` and `shine_dalgarno_mm` functions.
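A simplified model of why skipping those windows loses the `GGA` match: if each 6-bp window is compared base-by-base against `AGGAGG`, then a `GGA` at index 0 only lines up with `AGGAGG[1:4]` when the window starts at index `-1`. The sketch below is an illustration under stated assumptions, not the Prodigal code: the `AGGAGG` comparison, the `start - 20 .. start - 6` window range, the minimum spacer of 3 and the helper names (`window_matches`, `scan`) are all assumed, and only exact 3-mer matches are modelled (no mismatches).

```python
SD = "AGGAGG"  # consensus each window is compared against (assumed)


def window_matches(seq, pos, start):
    """Exact 3-mers in the window at `pos` that match SD at the same offset.

    Bases before index 0 are ignored here, in the matcher itself, which is
    the behaviour the patch moves into the scoring functions."""
    hits = set()
    for i in range(len(SD) - 2):  # offsets 0..3 of a 3-mer inside the 6-bp window
        k = pos + i
        if k < 0:
            continue  # off the contig edge (also avoids Python's negative indexing)
        spacer = start - (k + 3)
        if spacer < 3:
            continue  # too close to, or overlapping, the start codon
        if seq[k:k + 3] == SD[i:i + 3]:
            hits.add((seq[k:k + 3], spacer))
    return hits


def scan(seq, start, skip_negative):
    """Scan upstream windows; `skip_negative=True` mimics the old loop."""
    hits = set()
    for pos in range(start - 20, start - 5):  # assumed window range
        if skip_negative and pos < 0:
            continue  # old behaviour: drop windows that begin before index 0
        hits |= window_matches(seq, pos, start)
    return hits


seq = "GGATAGGCCCCATG"
print(scan(seq, 11, skip_negative=True))   # only the AGG with a 4 bp spacer
print(scan(seq, 11, skip_negative=False))  # also the edge GGA with an 8 bp spacer
```

Keeping the bounds check inside the matcher lets a window hang off the contig edge while still scoring the bases that are actually there.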
After applying the patch, the predictions are consistent regardless of the contig's orientation: the RBS spacers, and hence the gene scores, match.
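The orientation check can be sketched as a small regression helper: predict on the contig and on its reverse complement, map the second set of genes back to forward coordinates, and compare. This is a hedged sketch. `predict` is a hypothetical callable returning `(begin, end, strand, rbs_spacer)` tuples with 1-based inclusive coordinates, and the two toy predictors exist only to exercise the helper.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def strand_invariant(contig, predict):
    """True if `predict` (hypothetical) reports the same genes on both orientations."""
    n = len(contig)
    forward = set(predict(contig))
    # A 1-based position p on the reverse complement is position n - p + 1 on the contig.
    mirrored = {
        (n - end + 1, n - begin + 1, -strand, spacer)
        for begin, end, strand, spacer in predict(reverse_complement(contig))
    }
    return forward == mirrored


def toy_predict(contig):
    """Toy stand-in: every ATG on either strand (CAT on the forward strand is a reverse ATG)."""
    genes = []
    for i in range(len(contig) - 2):
        if contig[i:i + 3] == "ATG":
            genes.append((i + 1, i + 3, +1, None))
        elif contig[i:i + 3] == "CAT":
            genes.append((i + 1, i + 3, -1, None))
    return genes


def forward_only_predict(contig):
    """Deliberately strand-biased toy: only looks at the forward strand."""
    return [(i + 1, i + 3, +1, None) for i in range(len(contig) - 2) if contig[i:i + 3] == "ATG"]


contig = "GGATAGGCCCCATG"
print(strand_invariant(contig, toy_predict))           # True
print(strand_invariant(contig, forward_only_predict))  # False
```

With a real gene finder, the tuples would carry the RBS spacer, so a spacer that flips between `3-4bp` and `5-10bp` makes the comparison fail.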