lenarayneallen / Deduper-lenarayneallen

Deduper Pseudocode Review - Abraham #2

Open asolomon4146 opened 3 weeks ago

asolomon4146 commented 3 weeks ago

Hi Lena,

Great job with this pseudocode. The format is very clean and consistent and, as a result, easy to comprehend!

Your function definitions all make sense and I have almost no comments on them.

In your pseudocode, current_chr doesn't appear to be initialized anywhere, which made me wonder what it is supposed to hold.

In your two main for loops, you loop through the entire SAM file once for every UMI. Because SAM files can be large, this can be inefficient and might take too long. You might consider processing each line as you read it and storing its identifying information in a set; then, for each new line, you can check whether it has already been seen and skip it if so (deduplicate). This is just another potential strategy, though, and probably not needed. I think yours will work just fine!
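To make the set idea concrete, here is a rough sketch of what a single pass could look like. It is not meant as the way to do it, just an illustration. The function name `dedupe_sam_lines` is made up, and it assumes the UMI is the last colon-separated piece of the read name. It also uses the raw POS column as-is, so it does not correct for soft clipping, which a real deduper would probably need to do.

```python
def dedupe_sam_lines(lines):
    """Yield header lines plus the first read seen for each
    (UMI, chromosome, strand, position) combination."""
    seen = set()
    for line in lines:
        # Header lines start with "@" and are passed through untouched.
        if line.startswith("@"):
            yield line
            continue

        fields = line.rstrip("\n").split("\t")
        # Assumption: the UMI is the last colon-separated field of QNAME.
        umi = fields[0].split(":")[-1]
        # Bit 16 of the SAM FLAG marks a reverse-strand alignment.
        strand = "-" if int(fields[1]) & 16 else "+"
        chrom = fields[2]
        # Assumption: raw POS, with no soft-clipping adjustment.
        pos = int(fields[3])

        key = (umi, chrom, strand, pos)
        if key in seen:
            continue  # duplicate of a read already written out
        seen.add(key)
        yield line
```

You could call it with an open file handle, for example `for line in dedupe_sam_lines(open_sam): out.write(line)`, so the file is only read once. If the input is sorted by chromosome, the set could also be cleared each time the chromosome changes to keep memory use down.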

Again, great job with this; it looks really beautiful.

lenarayneallen commented 2 weeks ago

Thank you so much for the feedback Abraham! I agree that looping through the entire SAM file for every UMI is super inefficient; I will definitely keep this in mind when writing my actual code!!