Closed ivan-aksamentov closed 3 years ago
This converts normal string sequences to a new custom string type, where every character is encoded using these tables:
https://github.com/neherlab/nextalign/blob/90840676937206feb9f39f72ebdf6681e5326bf4/packages/nextalign/src/alphabet/nucleotides.h#L8-L27
https://github.com/neherlab/nextalign/blob/90840676937206feb9f39f72ebdf6681e5326bf4/packages/nextalign/src/alphabet/aminoacids.h#L12-L42
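As a rough illustration of the encoding step, here is a minimal sketch. The alphabet `ACGTN-`, the `Code` type, and the names `makeEncodeTable`, `encode` and `decode` are assumptions made for this example only; the actual tables linked above may use a different order, size, and handling of unknown characters.

```cpp
#include <array>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical compact code type; the real project defines its own.
using Code = std::uint8_t;

// Hypothetical alphabet: the position of each character is its code.
// The actual tables in nucleotides.h may differ in order and content.
constexpr char kAlphabet[] = "ACGTN-";
constexpr Code kAlphabetSize = sizeof(kAlphabet) - 1;  // exclude trailing '\0'
constexpr Code kUnknown = 4;  // code of 'N', assumed fallback for unrecognized input

// Build a 256-entry table that maps every possible byte to its code.
inline std::array<Code, 256> makeEncodeTable() {
  std::array<Code, 256> table{};
  table.fill(kUnknown);
  for (Code i = 0; i < kAlphabetSize; ++i) {
    table[static_cast<unsigned char>(kAlphabet[i])] = i;
  }
  return table;
}

// Encode a whole sequence once, up front, so later lookups need no conversion.
inline std::vector<Code> encode(const std::string& seq) {
  static const std::array<Code, 256> table = makeEncodeTable();
  std::vector<Code> out;
  out.reserve(seq.size());
  for (char c : seq) {
    out.push_back(table[static_cast<unsigned char>(c)]);
  }
  return out;
}

// Decode back to ordinary characters, e.g. for output.
inline std::string decode(const std::vector<Code>& codes) {
  std::string out;
  out.reserve(codes.size());
  for (Code c : codes) {
    out.push_back(kAlphabet[c]);
  }
  return out;
}
```

In this sketch the conversion cost is paid once per sequence, and unrecognized characters are folded into a single fallback code, which is a design choice of the example rather than something stated in this PR.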
This places all used characters in a contiguous range, so character codes can be used directly as array indices when looking up match scores in these matrices:
https://github.com/neherlab/nextalign/blob/90840676937206feb9f39f72ebdf6681e5326bf4/packages/nextalign/src/matchNuc.cpp#L10-L30
https://github.com/neherlab/nextalign/blob/90840676937206feb9f39f72ebdf6681e5326bf4/packages/nextalign/src/matchAa.cpp#L10-L41
This should be much faster because it avoids a character conversion on every lookup. However, this still needs to be benchmarked.
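The lookup idea can be sketched as follows. Everything here is hypothetical: the 4-letter code assignment (A=0, C=1, G=2, T=3), the 1/0 scores, and the names `lookupMatchScore` and `scoreUngapped` are illustrative assumptions, not the contents of `matchNuc.cpp` or `matchAa.cpp`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical compact code type, assumed to hold values in [0, kSize).
using Code = std::uint8_t;
constexpr std::size_t kSize = 4;  // assumed codes: A=0, C=1, G=2, T=3

// Hypothetical score matrix: 1 for a match, 0 for a mismatch.
// The real matrices also cover ambiguous characters and gaps.
constexpr int kScores[kSize][kSize] = {
    {1, 0, 0, 0},
    {0, 1, 0, 0},
    {0, 0, 1, 0},
    {0, 0, 0, 1},
};

// Codes index the matrix directly; no per-lookup character conversion.
// No bounds check is done here: callers must pass codes below kSize.
inline int lookupMatchScore(Code x, Code y) {
  return kScores[x][y];
}

// Sum the scores of two already-encoded sequences, position by position,
// over their common length (no gaps are considered in this sketch).
inline int scoreUngapped(const std::vector<Code>& a, const std::vector<Code>& b) {
  const std::size_t n = std::min(a.size(), b.size());
  int total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    total += lookupMatchScore(a[i], b[i]);
  }
  return total;
}
```

Whether this direct indexing is measurably faster than converting characters on each lookup is exactly what the benchmark mentioned above would need to show.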