prophyle / prophex

ProPhex – an exact k-mer index using Burrows-Wheeler Transform
MIT License

Paired-end reads #17

Open simonepignotti opened 6 years ago

simonepignotti commented 6 years ago

Add support for paired-end reads to the query command.

Updated specification

Usage:   prophex query [options] <index_prefix> <in1.fq> [in2.fq]
...

Behavior

Each pair should be concatenated, with the two reads separated by an N character. The k-mers overlapping that position should carry a specific marker in the output, e.g. C (concatenation).

Example

k=4

in1.fq:

@read1/1
ACGT
+
!!!!
...

in2.fq:

@read1/2
TGCA
+
!!!!
...

Extended Kraken format

output:

U    read1    0    8    ref1:1 C:4 ref1:1
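To make the block counts above easier to check, here is a minimal Python sketch of the concatenation approach. It is not ProPhex code: `kmer_blocks`, the `lookup` callback and the toy dictionary index are hypothetical stand-ins for the real BWT index query, and using `0` as the label for unmatched k-mers is an assumption.

```python
def kmer_blocks(read1, read2, k, lookup):
    """Return Kraken-style 'label:count' blocks for a pair joined by 'N'.

    `lookup` is a hypothetical callback mapping a k-mer to a label;
    it stands in for the real index query.
    """
    joined = read1 + "N" + read2
    sep = len(read1)  # position of the separator N in the joined sequence
    blocks = []       # run-length encoded list of [label, count]
    for i in range(len(joined) - k + 1):
        if i <= sep < i + k:
            label = "C"  # k-mer overlaps the concatenation point
        else:
            label = lookup(joined[i:i + k])
        if blocks and blocks[-1][0] == label:
            blocks[-1][1] += 1
        else:
            blocks.append([label, 1])
    return " ".join(f"{label}:{count}" for label, count in blocks)


# Toy index (assumption): both k-mers of the example occur in ref1.
toy_index = {"ACGT": "ref1", "TGCA": "ref1"}
print(kmer_blocks("ACGT", "TGCA", 4, lambda kmer: toy_index.get(kmer, "0")))
# prints: ref1:1 C:4 ref1:1
```

With k=4 the joined sequence ACGTNTGCA has 6 k-mers: one from each read plus the 4 that overlap the N, which matches the `ref1:1 C:4 ref1:1` blocks in the example.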

Bitmask output format (#14)

The hit and coverage masks should not contain the concatenation k-mers, but the two reads should be separated by a pipe (|).

read1   ref1    8   2   8   1|1
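A rough sketch of how the pipe-separated masks could be built, computing each read's mask on its own so that no concatenation k-mers ever enter it. The function names and the set-based toy index are hypothetical, and the sketch makes no claim about the column layout defined in #14.

```python
def hit_mask(read, k, ref_kmers):
    """One character per k-mer: '1' if the k-mer is in the reference set."""
    return "".join(
        "1" if read[i:i + k] in ref_kmers else "0"
        for i in range(len(read) - k + 1)
    )


def coverage_mask(read, k, ref_kmers):
    """One character per base: '1' if any matching k-mer covers it."""
    covered = [False] * len(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in ref_kmers:
            for j in range(i, i + k):
                covered[j] = True
    return "".join("1" if c else "0" for c in covered)


def paired(mask_fn, read1, read2, k, ref_kmers):
    """Join the per-read masks with a pipe, as proposed for paired-end reads."""
    return mask_fn(read1, k, ref_kmers) + "|" + mask_fn(read2, k, ref_kmers)


# Toy reference k-mer set (assumption): both example k-mers occur in ref1.
ref1_kmers = {"ACGT", "TGCA"}
print(paired(hit_mask, "ACGT", "TGCA", 4, ref1_kmers))       # prints: 1|1
print(paired(coverage_mask, "ACGT", "TGCA", 4, ref1_kmers))  # prints: 1111|1111
```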

Alternative solutions

If there is a cleaner way to obtain the same result without concatenating the reads with N, we should consider it (e.g. querying the two reads independently).