shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License

Error rush: invalid option -- 'j' #69

Closed: Metag3nOmics closed this issue 1 year ago

Metag3nOmics commented 1 year ago

Hi

I got the following error when trying to do the filtering from the tutorial (section 3a, method 3):

rush: invalid option -- 'j'

Can you help me fix this?

shenwei356 commented 1 year ago

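The rush used in the tutorial is this one, while the one you're running is this one, a different program that happens to have the same name.

You need to download the former, rename it so it doesn't clash with the rush you already have, and run it under the new name.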
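A minimal sketch of the rename step, assuming the tutorial's rush binary has already been downloaded and extracted. The helper `install_renamed_rush`, the install directory `$HOME/bin` and the new name `rush2` are hypothetical choices for illustration, not names from the tutorial.

```shell
#!/bin/sh
# install_renamed_rush SRC DEST_DIR NEW_NAME  (hypothetical helper)
# Copies a downloaded rush binary (SRC) into DEST_DIR under NEW_NAME and makes
# it executable, so it does not collide with the other `rush` already on PATH.
install_renamed_rush() {
    src=$1
    dest_dir=$2
    new_name=$3
    if [ ! -f "$src" ]; then
        echo "install_renamed_rush: no such file: $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/$new_name" || return 1
    chmod a+x "$dest_dir/$new_name"
}

# Example (hypothetical paths and name):
#   install_renamed_rush ./rush "$HOME/bin" rush2
# then use `rush2` wherever the tutorial calls `rush`, e.g. `rush2 -j 4 ...`.
```

Renaming rather than overwriting leaves the existing `rush` untouched, in case something else on the system depends on it.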

Metag3nOmics commented 1 year ago

Hi shenwei356,

Thank you for your help and quick reply!

After switching to the rush from the tutorial, the error no longer occurs. However, the directory _unfold_blastdb_fa.sh cannot be found. I followed the tutorial, but the directory was never created. Do you have any suggestions on how to solve this? Should I just create a directory with that name?

shenwei356 commented 1 year ago

Should I just create a directory with that name?

Please create a file (not a directory) with that name using vi/vim/nano.

Its content is:

#!/bin/sh
perl -e 'BEGIN{ $/ = "\n>"; <>; } while(<>){s/>$//;  $i = index $_, "\n"; $h = substr $_, 0, $i; $s = substr $_, $i+1; if ($h !~ />/) { print ">$_"; next; }; $h = ">$h"; while($h =~ />([^ ]+ .+?) ?(?=>|$)/g){ $h1 = $1; $h1 =~ s/^\W+//; print ">$h1\n$s";} } '

Also make it executable:

chmod a+x _unfold_blastdb_fa.sh

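If an editor is inconvenient, the same file can be written from the shell with a here-document. This is a sketch, assuming you run it in the directory where the tutorial expects the script; `write_unfold_script` is a hypothetical helper name, and the script body is copied verbatim from the reply above.

```shell
#!/bin/sh
# write_unfold_script DIR  (hypothetical helper)
# Writes _unfold_blastdb_fa.sh into DIR (default: current directory) and makes
# it executable. The quoted 'EOF' delimiter stops the shell from expanding
# $/, $_, $i, ... inside the Perl code, so the file matches the reply exactly.
write_unfold_script() {
    dir=${1:-.}
    script="$dir/_unfold_blastdb_fa.sh"
    cat > "$script" <<'EOF'
#!/bin/sh
perl -e 'BEGIN{ $/ = "\n>"; <>; } while(<>){s/>$//;  $i = index $_, "\n"; $h = substr $_, 0, $i; $s = substr $_, $i+1; if ($h !~ />/) { print ">$_"; next; }; $h = ">$h"; while($h =~ />([^ ]+ .+?) ?(?=>|$)/g){ $h1 = $1; $h1 =~ s/^\W+//; print ">$h1\n$s";} } '
EOF
    chmod a+x "$script"
}

# Example: write_unfold_script .
```

Quoting the here-document delimiter is the key design choice: with an unquoted `EOF`, every `$` in the Perl code would need escaping.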
Metag3nOmics commented 1 year ago

Hi shenwei356,

It worked! Thanks :)

However, when I run the step that adds the taxid:

command:
time pigz -cd nr.$id.with-taxid.part$i.fa.gz \
    | seqkit replace -k $f -p "^([^-]+?) " -r "{kv}-\$1 " -K -U -o nr.$id.with-taxid.part$(($i+1)).fa.gz;
/bin/rm nr.$id.with-taxid.part$i.fa.gz
i=$(($i+1));

I get the error "flags -p (--pattern) needed".

Sorry for bothering you with this!

Kind regards.

shenwei356 commented 1 year ago

Notice that in the command you pasted above, some characters after -k differ from the tutorial's. Please recopy it from the tutorial page https://bioinf.shenwei.me/taxonkit/tutorial/#making-nr-blastdb-for-specific-taxids

Metag3nOmics commented 1 year ago

I recopied the command from the tutorial, but it shows the same error. GitHub changes something when I post the command, so I've attached a screenshot of the command I used.

[screenshot]

Metag3nOmics commented 1 year ago

time pigz -cd nr.newlich.with-taxid.part$i.fa.gz | seqkit replace -k $f -p "^([^\-]+?) " -r "{kv}-\$1 " -K -U -o nr.newlich.with-taxid.part_$(($i+1)).fa.gz;

shenwei356 commented 1 year ago

I see, please make sure the value of $f is not empty.
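This most likely explains the "-p needed" message: when `$f` is unquoted and empty, the shell drops it entirely, so `-k` takes the next word, `-p`, as its value and seqkit never sees a `-p` flag. The sketch below shows the word-splitting with a hypothetical helper `count_args`, plus a hypothetical guard `require_kv_file` that could run before seqkit; neither is part of the tutorial.

```shell
#!/bin/sh
# count_args: print how many arguments the function received.
count_args() {
    echo "$#"
}

# require_kv_file FILE: fail early if FILE is not given, missing, or zero-sized.
require_kv_file() {
    if [ -z "$1" ]; then
        echo "require_kv_file: no key-value file given (is \$f empty?)" >&2
        return 1
    fi
    if [ ! -s "$1" ]; then
        echo "require_kv_file: missing or empty file: $1" >&2
        return 1
    fi
}

f=""
# Unquoted and empty: $f vanishes, leaving -k -p PATTERN (3 words),
# so -k would receive "-p" as its value.
count_args -k $f -p "^([^\-]+?) "    # prints 3
# Quoted: the empty string survives as its own word (4 words).
count_args -k "$f" -p "^([^\-]+?) "  # prints 4
```

Inside the loop, a line such as `require_kv_file "$f" || break` before the seqkit call would replace the confusing flag error with a direct message about `$f`.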

Metag3nOmics commented 1 year ago

Which value should I set? The acc2taxid.txt file?

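shenwei356 commented 1 year ago

for f in $id.acc2taxid.txt.part_* ; do
    echo $f
    # $f maps accession -> taxid; turn each header's leading "ACC " into "TAXID-ACC "
    time pigz -cd nr.$id.with-taxid.part$i.fa.gz \
        | seqkit replace -k $f -p "^([^\-]+?) " -r "{kv}-\$1 " -K -U -o nr.$id.with-taxid.part$(($i+1)).fa.gz;
    # the output of this round becomes the input of the next one
    /bin/rm nr.$id.with-taxid.part$i.fa.gz
    i=$(($i+1));
done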

It should be a file matching $id.acc2taxid.txt.part_*.
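One more way `$f` can go wrong: if no file matches `$id.acc2taxid.txt.part_*`, POSIX sh leaves the pattern unexpanded, so the loop runs once with `f` set to the literal pattern string. A minimal check, assuming the split files are in the current directory; `list_kv_parts` is a hypothetical helper name.

```shell
#!/bin/sh
# list_kv_parts ID  (hypothetical helper)
# Print the files matching ID.acc2taxid.txt.part_*, one per line,
# or fail if none exist (an unmatched glob stays literal in sh).
list_kv_parts() {
    found=0
    for part in "$1".acc2taxid.txt.part_*; do
        [ -e "$part" ] || continue   # skip the literal, unexpanded pattern
        echo "$part"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "list_kv_parts: no files match $1.acc2taxid.txt.part_*" >&2
        return 1
    fi
}

# Example, before the loop: list_kv_parts "$id" || exit 1
```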

Metag3nOmics commented 1 year ago

Hi shenwei356,

I followed your suggestion. It worked, but I keep running from one error into another:

time pigz -cd nr.newlich.with-taxid.part$i.fa.gz | seqkit replace -k $f -p "^([^\-]+?) " -r "{kv}-\$1 " -K -U -o nr.newlich.with-taxid.part$(($i+1)).fa.gz;

[INFO] read key-value file: newlich.acc2taxid.txt.part_00
[INFO] 200000000 pairs of key-value loaded
[ERRO] fastx: invalid FASTA/Q format

I used the files from the previous steps, but it still does not work :/

shenwei356 commented 1 year ago

Please follow the error message and check the input file nr.newlich.with-taxid.part$i.fa.gz.
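A quick first check of that input file, sketched with plain gzip/head/cut rather than seqkit: confirm the gzip stream is intact and that the decompressed data starts with `>`. The helper name `check_fasta_gz` is hypothetical, and passing it does not prove every record is well formed.

```shell
#!/bin/sh
# check_fasta_gz FILE  (hypothetical helper): basic sanity checks on a gzipped FASTA file.
check_fasta_gz() {
    if [ ! -s "$1" ]; then
        echo "check_fasta_gz: missing or empty file: $1" >&2
        return 1
    fi
    # Integrity of the gzip stream (catches truncated or partially written files).
    if ! gzip -t "$1" 2>/dev/null; then
        echo "check_fasta_gz: not a valid gzip file: $1" >&2
        return 1
    fi
    # A FASTA file must start with a '>' header line.
    first=$(gzip -cd "$1" | head -n 1 | cut -c 1)
    if [ "$first" != ">" ]; then
        echo "check_fasta_gz: does not start with '>': $1" >&2
        return 1
    fi
}

# Example: check_fasta_gz nr.newlich.with-taxid.part$i.fa.gz
```

Note that in the loop above the `/bin/rm` line runs after `;`, so it executes even when seqkit fails; one failed round can therefore delete its input and hand a broken file to the next round.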