torognes / swarm

A robust and fast clustering method for amplicon-based studies
GNU Affero General Public License v3.0

Invalid numeric argument for option -t or --threads #173

Closed Gian77 closed 1 year ago

Gian77 commented 1 year ago

Hello,

I am trying to dereplicate reads using the latest version of swarm. I am running swarm on a cluster computer using 128 cores/threads and I am getting this error message:

Invalid numeric argument for option -t or --threads

This is how I tried to run the dereplication for each file I have:

swarm \
    -threads $cores \
    --differences 0 \
    -w $file \
    -o /dev/null ${file}_linear.temp

Thanks a lot! Gian

torognes commented 1 year ago

You need to write either -t $cores (single dash) or --threads $cores (double dash), not -threads $cores.
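To see why `-threads` leads to a complaint about a numeric argument, here is a minimal sketch using the shell's built-in `getopts`. It is not swarm's actual parser; it assumes swarm follows the usual getopt convention, where a single dash introduces a short option and the rest of the word is taken as its argument. The function name `parse_threads` is made up for this illustration.

```shell
# Illustration only: parse a short option -t that requires an argument.
# parse_threads is a hypothetical helper, not part of swarm.
parse_threads() {
    OPTIND=1                       # reset the parser on each call
    while getopts ":t:" opt "$@"; do
        case "$opt" in
            t) printf '%s\n' "$OPTARG"; return 0 ;;
            *) return 1 ;;
        esac
    done
    return 1
}

parse_threads -t 4         # prints: 4
parse_threads -threads 4   # prints: hreads
```

With `-threads`, the option is read as `-t` and its argument becomes `hreads`, which is not a number.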

frederic-mahe commented 1 year ago

commit e832af4b113558f77125af94eb69702577c17f3d tries to address that issue by outputting additional hints.

# missing space
swarm --threads 1-f

# or missing dash
swarm -threads 1

now produce the following error message:

Error: Invalid numeric argument for option -t or --threads.

Frequent causes are:
 - a missing space between an argument and the next option,
 - a long option name not starting with a double dash
   (swarm accepts '--help' or '-h', but not '-help')

Please see 'swarm --help' for more details.

frederic-mahe commented 1 year ago

I've pushed basic tests to cover that issue. Let me know what you think of this error message. In the meantime, I am going to close that issue.

torognes commented 1 year ago

I like it!

Gian77 commented 1 year ago

Hello @frederic-mahe @torognes,

Thanks for the fast reply. Oops... my bad, I missed the - ... Sorry!

More informative error messages are always helpful, but I know it is hard to implement these details... You are forgiven :P

So, I also linearized the sequences before running swarm as below and it worked.

for file in *.fasta
do
    echo "$file"
    awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' < \
    "$file" | tr "\t" "\n" | sed -e 's/\( \).*\(;.\)/\1\2/' | sed 's/ //' | sed 's/.$//' > \
    "$project_dir/outputs/10_dereplicateReads_usearch/${file%.*}_linear.temp"
done

and then

for file in *.temp
do
    conda activate swarm3
    swarm \
    --threads $cores \
    --differences 0 \
    -w $project_dir/outputs/10_dereplicateReads_usearch/${file//.temp/.fasta} \
    -z \
    -o /dev/null $file
    conda deactivate
done

Thanks a lot! Gian

frederic-mahe commented 1 year ago

Hello @Gian77, sequence linearization is not strictly necessary for swarm, so I should remove that section from the README file. What is important is dereplication. Swarm expects an abundance value in each sequence header of your fasta file. You'll get an error message if that is not the case.
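As a rough pre-flight check for the point about abundance values, the sketch below counts fasta headers that carry no abundance annotation. It assumes the usearch-style `;size=<integer>` layout, which is what the `-z` option in the command above selects as far as I understand; the function name `count_missing_abundance` and the sample file are hypothetical.

```shell
# Sketch: count FASTA headers lacking a usearch-style ";size=<integer>" field.
# The header layout is an assumption based on the -z option used above.
count_missing_abundance() {
    awk '/^>/ && $0 !~ /;size=[0-9]+(;|$)/ { n++ } END { print n + 0 }' "$1"
}

# Example on a small made-up file: the second header has no abundance.
tmp=$(mktemp)
printf '>seq1;size=12\nACGT\n>seq2\nACGA\n' > "$tmp"
count_missing_abundance "$tmp"
rm -f "$tmp"
```

A result of 0 means every header has an abundance field in that style; anything else points to headers worth fixing before running swarm.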

Gian77 commented 1 year ago

@frederic-mahe, good to know that linearization isn't needed; I needed to change the headers a little anyway. And yes, I noticed that swarm has a stricter dereplication compared to USEARCH (we have a license here, I have to use it - sorry :P). I think you can close this if it isn't closed yet. Best, Gian

Gian77 commented 1 year ago

@frederic-mahe Awesome, thanks again for the explanation - and for including the more interpretable error message.