Hello,
I want to use the csa_wt index to count DNA substrings of variable length in a file containing DNA sequencing reads. The reads are separated by newline characters, so the file looks like this:
ACCGTATTTAGCACTGATCGATCGATC
AAGGTCGATCGATCGATCACT
AAACTACGATCGATCGTACATGCA
Is there a way to tell csa_wt to ignore suffixes that span a newline character, in order to speed up lookups and further reduce the size of the FM index?
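To make the intended counting semantics concrete, here is a minimal brute-force sketch in plain C++ (it does not use sdsl at all). The helper name `count_in_reads` is made up for illustration and is not part of any library. It assumes that overlapping occurrences are counted separately and that a match may never cross a read boundary, which is the result I would expect the index to reproduce.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical reference helper (not an sdsl function): counts occurrences
// of `pattern` inside each newline-separated read of `text`.
// Overlapping occurrences are counted; matches never span two reads
// because every read is searched on its own.
std::size_t count_in_reads(const std::string& text, const std::string& pattern) {
    if (pattern.empty()) return 0;  // assumption: an empty pattern counts as 0
    std::size_t total = 0;
    std::istringstream in(text);
    std::string read;
    while (std::getline(in, read)) {  // split the input on '\n'
        for (std::size_t pos = read.find(pattern);
             pos != std::string::npos;
             pos = read.find(pattern, pos + 1)) {  // step by 1 to allow overlaps
            ++total;
        }
    }
    return total;
}
```

For the three example reads above, calling `count_in_reads` with the pattern `GATC` should give 8 (3 + 3 + 2). This is only a correctness baseline for small inputs, not a replacement for the index.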