Metag3nOmics closed this issue 1 year ago
Hi shenwei356,
Thank you for your help and quick reply!
After switching to the old rush command, the error no longer occurs. However, the directory _unfold_blastdb_fa.sh cannot be found. I followed the tutorial, but the directory was never created. Do you have any suggestions on how to solve this? Should I just create a directory with that name?
Should I just create a directory with that name?
Please create a file with that name using vi/vim/nano.
The content is:
#!/bin/sh
perl -e 'BEGIN{ $/ = "\n>"; <>; } while(<>){s/>$//; $i = index $_, "\n"; $h = substr $_, 0, $i; $s = substr $_, $i+1; if ($h !~ />/) { print ">$_"; next; }; $h = ">$h"; while($h =~ />([^ ]+ .+?) ?(?=>|$)/g){ $h1 = $1; $h1 =~ s/^\W+//; print ">$h1\n$s";} } '
Also make it executable:
chmod a+x xxx.sh
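If an interactive editor is inconvenient, the same file can be written with a here-document. This is only a sketch of an alternative, assuming the script should live in the current directory; the script body is copied from the reply above. The quoted 'EOF' delimiter stops the shell from expanding $/, $_ and the other Perl variables while writing the file.

```shell
#!/bin/sh
# Write the helper script non-interactively.
# The quoted delimiter ('EOF') keeps the Perl code literal: no $-expansion happens here.
cat > _unfold_blastdb_fa.sh <<'EOF'
#!/bin/sh
perl -e 'BEGIN{ $/ = "\n>"; <>; } while(<>){s/>$//; $i = index $_, "\n"; $h = substr $_, 0, $i; $s = substr $_, $i+1; if ($h !~ />/) { print ">$_"; next; }; $h = ">$h"; while($h =~ />([^ ]+ .+?) ?(?=>|$)/g){ $h1 = $1; $h1 =~ s/^\W+//; print ">$h1\n$s";} } '
EOF

# Make it executable, as in the reply above.
chmod a+x _unfold_blastdb_fa.sh

# Confirm that it is a regular, executable file (not a directory).
ls -l _unfold_blastdb_fa.sh
```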
Hi shenwei356,
it worked! Thanks :)
However, after running the part where the taxid is added,
command:
time pigz -cd nr.$id.with-taxid.part$i.fa.gz \
| seqkit replace -k $f -p "^([^-]+?) " -r "{kv}-\$1 " -K -U -o nr.$id.with-taxid.part$(($i+1)).fa.gz;
/bin/rm nr.$id.with-taxid.part$i.fa.gz
i=$(($i+1));
the error "flags -p (--pattern) needed" occurs.
Sorry for bothering you with this!
Kind regards.
Note that in the command you pasted above, some characters after -k are different from the others. Please recopy it from the tutorial page: https://bioinf.shenwei.me/taxonkit/tutorial/#making-nr-blastdb-for-specific-taxids
I recopied this command from the tutorial; however, it shows the same error. When I post the command, GitHub changes something. See the attached screenshot of the command I used below.
time pigz -cd nr.newlich.with-taxid.part$i.fa.gz | seqkit replace -k $f -p "^([^\-]+?) " -r "{kv}-\$1 " -K -U -o nr.newlich.with-taxid.part_$(($i+1)).fa.gz; '
I see. Please make sure the value of $f is not empty.
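A minimal sketch of why an empty $f can end in a complaint about -p, assuming ordinary POSIX shell behaviour: an unquoted variable that is empty or unset expands to no word at all, so -k is left without a value and the option parser will most likely consume -p as that value. count_args is a hypothetical helper used only for illustration; it is not part of seqkit.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    echo "$#"
}

f=""    # simulates a session in which $f was never set

# Unquoted: the empty $f disappears, leaving only -k, -p and the pattern.
unquoted=$(count_args -k $f -p "PATTERN")

# Quoted: the empty string survives as an argument of its own.
quoted=$(count_args -k "$f" -p "PATTERN")

echo "unquoted: $unquoted arguments"
echo "quoted:   $quoted arguments"
```

Running `echo "f=[$f]"` just before the seqkit command is a quick way to see whether the variable is set in the current shell.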
Which value should I set? The acc2taxid.txt file?
for f in $id.acc2taxid.txt.part_* ; do
    echo $f
    time pigz -cd nr.$id.with-taxid.part$i.fa.gz \
        | seqkit replace -k $f -p "^([^\-]+?) " -r "{kv}-\$1 " -K -U -o nr.$id.with-taxid.part$(($i+1)).fa.gz;
    /bin/rm nr.$id.with-taxid.part$i.fa.gz
    i=$(($i+1));
done
It should be a file matching $id.acc2taxid.txt.part_*.
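Before starting the loop, it may help to confirm that the pattern really matches something. The following is a sketch only: list_kv_parts is a hypothetical helper, and it assumes the split files sit in the current directory. In a POSIX shell an unmatched glob stays as literal text, so the helper checks that each candidate exists and is non-empty.

```shell
#!/bin/sh
# list_kv_parts is a hypothetical helper: it prints every non-empty file
# matching <id>.acc2taxid.txt.part_* and returns non-zero if there is none.
list_kv_parts() {
    id=$1
    status=1
    for f in "$id".acc2taxid.txt.part_*; do
        # An unmatched glob is passed through literally, so test the file itself.
        [ -s "$f" ] || continue
        echo "$f"
        status=0
    done
    return $status
}
```

For example, `list_kv_parts newlich || echo "no key-value files found"` prints either the files that $f would take as values or a warning.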
Hi shenwei356,
I followed your suggestion. It worked, but I keep running from one error into another.
time pigz -cd nr.newlich.with-taxid.part$i.fa.gz | seqkit replace -k $f -p "^([^\-]+?) " -r "{kv}-\$1 " -K -U -o nr.newlich.with-taxid.part$(($i+1)).fa.gz;
[INFO] read key-value file: newlich.acc2taxid.txt.part_00
[INFO] 200000000 pairs of key-value loaded
[ERRO] fastx: invalid FASTA/Q format
I used the files from the previous steps; however, it does not work :/
Please follow the error info and check the input file nr.newlich.with-taxid.part$i.fa.gz.
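One possible way to check the input file is sketched below. check_fasta_gz is a hypothetical helper, and gzip is used in place of pigz purely for portability. It only verifies that the file exists, decompresses cleanly, and begins with '>', so it cannot catch every formatting problem.

```shell
#!/bin/sh
# check_fasta_gz is a hypothetical helper: a rough sanity check for a gzipped FASTA file.
check_fasta_gz() {
    file=$1

    # The file must exist and be non-empty.
    [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }

    # The gzip stream must be intact.
    gzip -t "$file" 2>/dev/null || { echo "not a valid gzip file: $file" >&2; return 1; }

    # A FASTA file is expected to start with '>'.
    first=$(gzip -cd "$file" | head -c 1)
    [ "$first" = ">" ] || { echo "does not start with '>': $file" >&2; return 1; }
}
```

It could be called as `check_fasta_gz "nr.newlich.with-taxid.part$i.fa.gz"`. Note that $i must also be set in the current shell, otherwise the file name expands to something other than intended.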
Hi,
I got the following error when trying to do the filtering from the tutorial (section 3a, method 3):
rush: invalid option -- 'j'
Can you help me fix this?