smithlabcode / falco

A C++ drop-in replacement of FastQC to assess the quality of sequence read data
https://falco.readthedocs.io
GNU General Public License v3.0

Adapters with different lengths or length more than 32 https://github.… #28

Closed Shelestova-Anastasia closed 2 years ago

Shelestova-Anastasia commented 2 years ago

If the adapters have different lengths, or any adapter is longer than 32, fall back to the slow, inefficient search that FastQC uses. I understand this is an inefficient solution, but it behaves the same as FastQC, and that is better than not processing the samples at all.
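The fallback rule described above can be sketched roughly as follows. This is only an illustration, not the code from the PR: the function names `can_use_optimized_search` and `find_adapters_slow` are hypothetical, and the slow path is assumed to be a plain per-adapter substring search.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not falco's actual API): the optimized sliding-window
// path is assumed usable only when every adapter has the same length and
// that length is between 1 and 32.
bool can_use_optimized_search(const std::vector<std::string>& adapters) {
  if (adapters.empty()) return false;
  const std::size_t len = adapters.front().size();
  if (len == 0 || len > 32) return false;
  for (const auto& adapter : adapters)
    if (adapter.size() != len) return false;
  return true;
}

// Hypothetical slow fallback: a naive substring search per adapter.
// Returns the position of the first occurrence of each adapter in the read,
// or std::string::npos when the adapter does not occur.
std::vector<std::size_t> find_adapters_slow(
    const std::string& read, const std::vector<std::string>& adapters) {
  std::vector<std::size_t> positions;
  positions.reserve(adapters.size());
  for (const auto& adapter : adapters)
    positions.push_back(read.find(adapter));
  return positions;
}
```

A caller would evaluate `can_use_optimized_search` once when the adapters are loaded and use the slow path only when it returns false, so reads with the usual short, equal-length adapters keep the fast path.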

The Aho-Corasick algorithm might be a good fit for this feature, but I have no time for it right now.

I also fixed the current algorithm for adapters of length 32: the adapter mask was being calculated as 0. It is now the maximum size_t value for such adapters.
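A minimal sketch of how such a mask can end up wrong at length 32, assuming adapters are packed at 2 bits per base into a 64-bit size_t (an assumption about the encoding, not something stated in this thread). With that packing a 32-base adapter needs all 64 bits, and an expression like `(1 << 64) - 1` shifts by the full width of the type, which is undefined behavior in C++. The helper name `adapter_mask` is hypothetical.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper (not falco's actual code): mask covering the low
// 2 * adapter_len bits, assuming 2 bits per base.
// When the adapter fills the whole word, return the maximum size_t value
// instead of shifting by the full width of the type.
constexpr std::size_t adapter_mask(std::size_t adapter_len) {
  return 2 * adapter_len >=
                 static_cast<std::size_t>(
                     std::numeric_limits<std::size_t>::digits)
             ? std::numeric_limits<std::size_t>::max()
             : (std::size_t{1} << (2 * adapter_len)) - 1;
}

// The case fixed in this PR: a 32-base adapter gets an all-ones mask.
static_assert(adapter_mask(32) == std::numeric_limits<std::size_t>::max(),
              "a 32-base adapter should use every bit of the word");
```

Shorter adapters are unaffected: a 1-base adapter still gets the mask 3 (binary 11).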

guilhermesena1 commented 2 years ago

Thanks for the PR!

Just to confirm, by any chance did you compare the speed with this adapter checking procedure vs the old code? Just wondering if the extra check for !adapter_search_slow has any visible effect.

Once again really appreciate the help!

Shelestova-Anastasia commented 2 years ago

@guilhermesena1 I changed the flag to do_adapter_optimized for the sliding window, which got rid of the extra check.