shenwei356 / seqkit

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
https://bioinf.shenwei.me/seqkit
MIT License

Feature request: New switches for amplicon & stat? #310

Closed photocyte closed 2 years ago

photocyte commented 2 years ago

Describe your issue

Hi there, thanks again for creating seqkit, I always find it really useful.

I have two feature requests:

(1) A new switch for seqkit amplicon that puts the -F and -R primer strings into the description of each output FASTA record

seqkit amplicon -NEWSWITCH -s -F GGCTTGCGCAACATGAAGAA -R GCACGTGAGAAGTGAGGACA example.fasta outputs:

>record_id GGCTTGCGCAACATGAAGAA_and_GCACGTGAGAAGTGAGGACA_amplicon
GGCTTGCGCA...

(2) A setting for seqkit stat that puts the seqkit stat results into the description of each output FASTA record.

seqkit stat -NEWSWITCH example.fasta outputs:

>record_id sum_len=460
GGCTTGCGCA...

shenwei356 commented 2 years ago

(1) seqkit amplicon puts the string of -F and -R into the output FASTA record description

It's simple.
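
Until such a switch exists, one way to get the requested header is to post-process the FASTA that seqkit amplicon writes. The following is a minimal sketch, assuming a POSIX shell and awk are available. `tag_amplicons` is a hypothetical helper name, not part of seqkit, and the `_and_` / `_amplicon` naming is simply copied from the request above.

```shell
# Hypothetical helper (not a seqkit feature): append the primer pair to
# every FASTA header read from stdin.
tag_amplicons() {
    fwd=$1
    rev=$2
    awk -v tag="${fwd}_and_${rev}_amplicon" '
        /^>/ { print $0 " " tag; next }   # header line: add the tag as description
        { print }                         # sequence line: pass through unchanged
    '
}

# Stand-in for the output of the seqkit amplicon command from the request.
printf '>record_id\nGGCTTGCGCA\n' \
    | tag_amplicons GGCTTGCGCAACATGAAGAA GCACGTGAGAAGTGAGGACA
```

In real use, the `printf` line would be replaced by the `seqkit amplicon -s -F ... -R ... example.fasta` call from the request, piped into `tag_amplicons` with the same two primer strings.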

(2) seqkit stats puts the seqkit stat results into the output FASTA record description

It's not what stats should do. And it's simple to do:

$ echo -ne ">s\nactg\n"
>s
actg

$ echo -ne ">s\nactg\n" \
    | seqkit fx2tab -l -Q \
    | awk '{print $1" len="$3"\t"$2;}' \
    | seqkit tab2fx 
>s len=4
actg
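
The awk step above splits fields on any whitespace, so it assumes the header contains no spaces. As a hedged alternative, here is an awk-only sketch that keeps the full header and also sums wrapped (multi-line) sequences. `add_len` is a hypothetical helper name, not part of seqkit, and unlike tab2fx it writes each sequence on a single line.

```shell
# Hypothetical helper (not part of seqkit): append " len=N" to each FASTA
# header, where N is the total length of the sequence lines of that record.
add_len() {
    awk '
        # Print the buffered record, if any, with its length in the header.
        function emit() {
            if (header != "") {
                print header " len=" length(seq)
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }  # a new record starts
        { seq = seq $0 }                              # join wrapped sequence lines
        END { emit() }                                # print the last record
    '
}

printf '>s some description\nac\ntg\n' | add_len
```

Other per-record values could be added the same way, as long as they can be computed from the buffered sequence.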

photocyte commented 2 years ago

For (1), I found that isPcr (https://bioconda.github.io/recipes/ispcr/README.html) can do it. isPcr also reports all the amplicons, rather than only the longest one as seqkit amplicon does. For my application, I needed an in-silico PCR program that reports all the amplicons, so feature request (1) is no longer needed.

For (2), I agree: it would be a big change to the output style of seqkit stat, so I think (2) is no longer needed either. Please close this issue if you agree.