shenwei356 / seqkit

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
https://bioinf.shenwei.me/seqkit
MIT License
1.32k stars 159 forks source link

seqkit amplicon producing no output/stalling #194

Closed Rob-murphys closed 3 years ago

Rob-murphys commented 3 years ago

Prerequisites

seqkit/0.13.2

Describe your issue

seqkit amplicon output this single line on usage and generates no output: [INFO]ESC[0m 1 primer pair loaded

Input

cat bgc_diversity/sequence/RB110-1.fasta | seqkit amplicon -F GCSTACSYSATSTACACSTCSGG -R SASGTCVCCSGTSCGGTA -o bgc_diversity/sequence/seqkit_output/RB110-1.bed --bed

shenwei356 commented 3 years ago

I guess no sequences matched.

Rob-murphys commented 3 years ago

This would be very odd as they are degenerate primers for domains of NRPS biosynthetic gene clusters which I know to already exist in the sequence.

Is [INFO]ESC[0m 1 primer pair loaded a normally expected output line?

shenwei356 commented 3 years ago

What operating system and shell/terminal/console are you using? [INFO] 1 primer pair loaded is log message outputted in STDERR and it won't be written into the .bed file. The logging is compatible with Linux/macOS/Windows, bash/zsh, and it should be displayed like this

[INFO] 1 primer pair loaded

Please update seqkit to the latest version (v0.15.0) or pre-release of v0.15.1 below:

Seqkit recognizes degenerate bases. Could you please upload few sequences? so I can find out the reason.

Rob-murphys commented 3 years ago

This is from the STDERR output from the linux cluster I work on.

I just tried updating to version v0.15.0 with conda but get the same results as posted above:

(seqkit) [robmur@g-12-l0002 seq_extraction]$ conda list
# packages in environment at /home/projects/ku_00014/people/robmur/programmes/miniconda3/envs/seqkit:
#
# Name                    Version                   Build  Channel
seqkit                    0.15.0                        0    bioconda

Here is the sequence I am using: RB110-1.zip

shenwei356 commented 3 years ago

Every thing is right for me. If you do not want the log message, use flag --quiet.

And if you use specifiy the output file with flag -o xxx.bed, then there's no output to STDOUT, it's in the bed file.

Even if using a cluster job management system, normal output should in STDOUT file not STDERR.


$ seqkit amplicon -F GCSTACSYSATSTACACSTCSGG -R SASGTCVCCSGTSCGGTA  RB110-1.fasta  --bed --quiet -o a.bed

$ seqkit amplicon -F GCSTACSYSATSTACACSTCSGG -R SASGTCVCCSGTSCGGTA  RB110-1.fasta  --bed --quiet \
    | cut -f 1-6
tig00000002_np1212      610358  611093  .       0       +
tig00000004_np1212      628583  1390091 .       0       +
tig00000004_np1212      1605438 4766774 .       0       -
Rob-murphys commented 3 years ago

No .bed file output is generated for me in the specificed location.

I am passing a filepath to -o also: bgc_diversity/sequence/seqkit_output/RB110-1.bed

Is that allowed?

I notice you have the --bed flag before the -o flag, could that be causing the issue?

EDIT:

The order of the --bed flag and not using cat foo.fasta | seqkit amplicon seems it have fixed it! Now to understand the output, is there anywhere that details what each column is?

Contig, Start, End, unsure, unsure, Sense strand, Amplicon

This is what i assume they stand for?

shenwei356 commented 3 years ago

BED format. The 4th (name) and 5th column (score) is set to the default values "." and "0".

Positions of flags do not affect the result.

cat xxx.fa | seqkit amplicon should be the same with seqkit amplicon xxx.fa.