[ ] 2. split single large database sequences into multiple parts (https://github.com/eaasna/sliding-window/pull/27) and handle them in parallel (parallelize over all segments of one or more database sequences, in particular a single very large database sequence that could not be parallelized otherwise)
[ ] this will give, for each query_id, a set of database_id's
[ ] we need to convert that to a set of query_id's for each database_id (this is solved by the shopping cart queue)
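The inversion step above can be sketched as a plain map inversion; this is only an illustration of the grouping the shopping cart queue performs, not the actual queue implementation (the function name `invert` and the use of `std::map`/`std::set` are assumptions):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Sketch: turn a query_id -> {database_id} mapping into
// database_id -> {query_id}, so work can be grouped per database.
std::map<uint64_t, std::set<uint64_t>>
invert(std::map<uint64_t, std::set<uint64_t>> const & query_to_db)
{
    std::map<uint64_t, std::set<uint64_t>> db_to_query;
    for (auto const & [query_id, db_ids] : query_to_db)
        for (uint64_t db_id : db_ids)
            db_to_query[db_id].insert(query_id); // group queries by database
    return db_to_query;
}
```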
[ ] 4. change the "taxonomy" of what a database sequence is (it can be a database segment, i.e. database ID + start position + end position); for example, |AAAA|AAAA|A has 3 slightly overlapping segments of the same database sequence
(one large reference (e.g. 1MB) database sequence against a lot of small query sequences (short reads, e.g. 100B))
Note: this is a combination of Problems 2 and 3
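A minimal sketch of the segmentation described in item 4, emitting (database ID, start, end) triples with a fixed overlap between neighbouring segments (the struct and function names here are assumptions, not the repo's API):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A database segment: database ID plus half-open [start, end) positions.
struct segment
{
    std::size_t db_id;
    std::size_t start;
    std::size_t end;
};

// Split a sequence of length `len` into segments of at most `seg_len`,
// with `overlap` positions shared between neighbouring segments.
std::vector<segment> make_segments(std::size_t db_id, std::size_t len,
                                   std::size_t seg_len, std::size_t overlap)
{
    assert(seg_len > overlap); // otherwise the window would not advance
    std::vector<segment> segments;
    std::size_t const step = seg_len - overlap;
    for (std::size_t start = 0; start < len; start += step)
    {
        std::size_t end = std::min(start + seg_len, len);
        segments.push_back({db_id, start, end});
        if (end == len) // last segment reaches the end of the sequence
            break;
    }
    return segments;
}
```

With the toy parameters length 9, segment length 4, overlap 1, this reproduces the |AAAA|AAAA|A example: three segments covering [0,4), [3,7), [6,9).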
After that Long Term:
[ ] split query sequence into segments (slightly overlapping) and ask the IBF which segments would produce an eps-match; then build the SWIFT filter only for the interesting query segments.
(one large reference (e.g. 1MB) database sequence against a single large query sequence (e.g. 1MB))
Mind Example:
AAAAAAAAAAAAAAAA|AAAAA]AAAAAAAA|AAAA]AAA
DatabaseSegment1 -> Database1
DatabaseSegment2 -> Database1 (IBF could just say this one)
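The prefiltering idea above (keep only query segments the IBF flags) can be sketched as a filter over segments; the struct, the function name `prefilter`, and the `ibf_reports_match` predicate are stand-ins for the real IBF query, not the actual interface:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// A query segment with half-open [start, end) positions.
struct query_segment
{
    std::size_t start;
    std::size_t end;
};

// Keep only the segments for which the IBF reports a potential eps-match;
// the SWIFT filter would then be built for these segments only.
std::vector<query_segment>
prefilter(std::vector<query_segment> const & segments,
          std::function<bool(query_segment const &)> const & ibf_reports_match)
{
    std::vector<query_segment> interesting;
    for (auto const & seg : segments)
        if (ibf_reports_match(seg)) // stand-in for the IBF membership query
            interesting.push_back(seg);
    return interesting;
}
```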
Things to check:
sha256sum
verified, i.e. make datasets
Things to do:
raptor build
we might want to have dream_stellar build
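For the sha256sum check above, a hedged example of how dataset verification could look (the file names `dataset.fa` and `checksums.sha256` are assumptions for illustration):

```shell
# Record a checksum for a dataset, then verify it with sha256sum's check mode.
echo "ACGT" > dataset.fa
sha256sum dataset.fa > checksums.sha256
sha256sum -c checksums.sha256
```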