layaasiv / Demultiplex

Index-based separation of multiplexed sequencing data.

Feedback for part 2 #3

Open rlancaster96 opened 1 year ago

rlancaster96 commented 1 year ago

Hi Layaa, I think you have a great start here, and I appreciate how clearly you've written your pseudocode. It makes it easy to follow your logic. One thing I noticed is that you might be skipping part of an important quality filtering step. You've assigned index 1 ("index1") and index 2 ("rc_index2") to variables. However, it looks like your "elif" for quality filtering only looks for "N"s in rc_index2. You may want to have it look for "N"s in index1 as well. Either index could contain an "N", and index1 and rc_index2 come from physically separate reads, so you should check both in your quality pass.

*edit to clarify: For example, following your code through with index1 = NACT and index2 = AGTC:

index1 = NACT
rc_index2 = GACT

if rc_index2 contains N, write to unknown [skipped, rc_index2 contains no N]
elif rc_index2 == index1 and quality score and index 1 in possible indexes, write to matched [skipped, index1 not in possible indexes, index1 != rc_index2]
else write to unmatched

Written to your swapped indexes file, but it should belong in the unknown/low quality file instead.
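The trace above could be sketched in Python roughly as follows, with the "N" check applied to both indexes. This is only an illustration of the suggestion, not the code in the repository: the function names `reverse_complement` and `classify`, the `known_indexes` set, and the `quality_ok` flag standing in for the quality score check are all assumptions made up for this example.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N stays N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def classify(index1: str, index2: str, known_indexes: set, quality_ok: bool = True) -> str:
    """Decide which output bucket a record belongs in (hypothetical helper)."""
    rc_index2 = reverse_complement(index2)
    # Check BOTH indexes for N; they come from physically separate reads.
    if "N" in index1 or "N" in rc_index2:
        return "unknown"
    # Same matched condition as in the pseudocode trace.
    elif rc_index2 == index1 and quality_ok and index1 in known_indexes:
        return "matched"
    else:
        return "unmatched"
```

With the values from the example, `classify("NACT", "AGTC", {"GACT"})` returns `"unknown"` rather than falling through to the unmatched (swapped) branch.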

layaasiv commented 1 year ago

Ruben, that's a good point! Thank you for explaining it so clearly.