lh3 / seqtk

Toolkit for processing sequences in FASTA/Q formats
MIT License

subseq ignores strand from BED file #71

Open RoyChaudhuri opened 8 years ago

RoyChaudhuri commented 8 years ago

seqtk subseq seems to ignore the strand information in the BED file. I would expect the reverse-complemented sequence to be returned for BED features on the - strand.

Observed behaviour:

$ cat test.ntd 
>id
ACTGACTGAC
$ cat test.bed
id      3       6       name    1000    +
id      3       6       name    1000    -
$ seqtk subseq test.ntd test.bed
>id:4-6
GAC
>id:4-6
GAC

Expected behaviour:

$ seqtk subseq test.ntd test.bed
>id:4-6
GAC
>id:complement(4-6)
GTC
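
For illustration, the expected behaviour can be sketched in Python. This is only a sketch of the requested semantics, not seqtk's implementation: the helper names reverse_complement and strand_aware_subseq are hypothetical, and it assumes that strand is the sixth BED column and that a missing strand column means +.

```python
# Hypothetical sketch of strand-aware subsequence extraction; not seqtk's code.
# Assumes BED coordinates are 0-based half-open and that column 6 holds the strand.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement; other characters (e.g. N) pass through."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_aware_subseq(sequences, bed_lines):
    """Return (header, sequence) pairs, one per BED feature.

    sequences maps a sequence name to its bases; bed_lines is an iterable
    of whitespace-separated BED records.
    """
    records = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, start, end = fields[0], int(fields[1]), int(fields[2])
        # Classic 3-column BED has no strand; treat that as the + strand.
        strand = fields[5] if len(fields) >= 6 else "+"
        sub = sequences[name][start:end]
        region = f"{start + 1}-{end}"  # 1-based inclusive, as in the output above
        if strand == "-":
            records.append((f"{name}:complement({region})", reverse_complement(sub)))
        else:
            records.append((f"{name}:{region}", sub))
    return records


sequences = {"id": "ACTGACTGAC"}
bed = ["id\t3\t6\tname\t1000\t+", "id\t3\t6\tname\t1000\t-"]
for header, seq in strand_aware_subseq(sequences, bed):
    print(f">{header}")
    print(seq)
```

Run on the test.ntd and test.bed data above, this prints the four lines listed under "Expected behaviour".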

tseemann commented 8 years ago

This is a good point. I guess the problem stems from classic BED being only 3 columns.

This currently behaves like samtools faidx ref.fa id:4-6.

lh3 commented 8 years ago

Fixed via e5c4fd9. Thanks.

Sorry, I closed the wrong thread.