molgenis / VaSeBuilder

Validation Set Builder
GNU Lesser General Public License v3.0

Reverse acceptor filtering logic to increase speed. #61

Closed TDMedina closed 5 years ago

TDMedina commented 5 years ago

When writing the acceptor FastQ reads to the new combined FastQ, the variant context reads are filtered out. But I just noticed that the logic is:

if this_read not in reads_to_filter:
    write_the_thing()
else:
    continue

This means that we have to check EVERY read in the filter list every single time to make sure that NONE of them are a match. If we reverse the logic like this:

if this_read in reads_to_filter:
    continue
else:
    write_the_thing()

Then I would guess that the search through the filter list could exit as soon as it finds a match. It would probably slightly improve speed by like 2 nanoseconds.

MatthieuBeukers commented 5 years ago

Is this in build_fastqs_from_donors() ?

TDMedina commented 5 years ago

@MatthieuBeukers pointed out that not in may still search through the entire list for a match, then simply invert the true/false result, meaning there would be no speed improvement. Need to check this.
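
One way to check this is to count how many equality comparisons each operator performs on a list. The sketch below is only an illustration: CountedName and count_comparisons are hypothetical helpers, not part of VaSeBuilder. Python's language reference defines x not in s as the negation of x in s, so both forms should stop at the first match and do the same amount of work.

```python
class CountedName:
    """Hypothetical wrapper that counts how often == is evaluated."""
    calls = 0

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        CountedName.calls += 1
        return self.name == getattr(other, "name", other)


def count_comparisons(needle_name, names, negate):
    """Return (result, number of == calls) for `in` or `not in` on a list."""
    CountedName.calls = 0
    haystack = [CountedName(n) for n in names]
    needle = CountedName(needle_name)
    if negate:
        result = needle not in haystack
    else:
        result = needle in haystack
    return result, CountedName.calls


names = [f"read_{i}" for i in range(6)]

# A match at index 3 stops the scan after 4 comparisons for both operators.
print(count_comparisons("read_3", names, negate=False))
print(count_comparisons("read_3", names, negate=True))

# Without a match, both operators scan all 6 entries.
print(count_comparisons("read_9", names, negate=False))
print(count_comparisons("read_9", names, negate=True))
```

If the counts match for both operators, as expected, reversing the if/else changes nothing about how far the list is searched.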

Also, I think reads_to_filter is a set, not a list, so the search isn't linear and speed won't improve either way.
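
For reference, a minimal sketch of set-based filtering, assuming four-line FastQ records and read IDs taken from the header line up to the first whitespace. The function name filter_fastq_records is hypothetical and this is not the actual VaSeBuilder code. With a set, each membership test is a hash lookup, so the order of the if/else branches does not affect the cost.

```python
from itertools import islice


def filter_fastq_records(fastq_lines, reads_to_filter):
    """Yield the lines of every FastQ record whose read ID is not filtered.

    fastq_lines: iterable of lines (e.g. an open file handle).
    reads_to_filter: set of read IDs to leave out (assumed to be a set).
    """
    lines = iter(fastq_lines)
    while True:
        record = list(islice(lines, 4))  # header, sequence, '+', qualities
        if len(record) < 4:
            break  # end of input (or a truncated trailing record)
        # Drop the leading '@' and anything after the first whitespace.
        read_id = record[0][1:].split()[0]
        if read_id in reads_to_filter:
            continue  # variant context read: skip it
        yield from record


example = [
    "@read_1\n", "ACGT\n", "+\n", "IIII\n",
    "@read_2\n", "TTTT\n", "+\n", "IIII\n",
]
kept = list(filter_fastq_records(example, {"read_1"}))
print("".join(kept), end="")
```

Read IDs that carry a mate suffix such as /1 or /2 would need to be normalised before the lookup; that step is left out here.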

Also, yes, this is in build_fastqs_from_donors().