Open indexofire opened 4 years ago
Right now I use this revised (and nasty) mlst-download_pub_mlst script to grab PubMLST data:
#!/bin/bash
set -e
OUTDIR=pubmlst
mkdir -p "$OUTDIR"
wget --no-clobber -P "$OUTDIR" http://pubmlst.org/data/dbases.xml
for URL in $(grep '<url>' $OUTDIR/dbases.xml); do
    # echo $URL
    URL=${URL//<url>}
    URL=${URL//<\/url>}
    # echo ${URL: -4}
    if [ ${URL:(-4)} = "_csv" ]; then
        #PROFILE=$(basename $URL .txt)
        PROFILE=$(echo $URL | awk -F'_' '{print $2}')
        NUM=$(echo $URL | awk -F'/' '{if($7!=1)print "_"$7}')
        echo "# $PROFILE "
        PROFILEDIR="$OUTDIR/$PROFILE$NUM"
        echo "mkdir -p '$PROFILEDIR'"
        echo "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' -O '$PROFILE$NUM.txt')"
    elif [ ${URL:(-6)} = "_fasta" ]; then
        ALLELE=$(echo $URL | awk -F'/' '{print $7}')
        echo "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' -O '$ALLELE.tfa')"
    fi
done
# delete fungi schemes
echo rm -frv "$OUTDIR"/{afumigatus,blastocystis,calbicans,cglabrata,ckrusei}
echo rm -frv "$OUTDIR"/{ctropicalis,csinensis,kseptempunctata,sparasitica,tvaginalis}
Thank you so much, @indexofire!
Any chance we'll see these updates reflected in the repo @tseemann?
Grateful for all of your hard work and dedication. I know we've all got a lot on our plates right now.
To make sure 'mlst-make_blast_db' functions correctly, one suggestion I would make here is to alter:
echo "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' -O '$ALLELE')"
to
echo "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' -O '$ALLELE.tfa')"
Cheers!
The nasty script still does not create the same scheme names in the pubmlst folder as the original one, because of the change to dbases.xml. Hope that's OK for users.
Hi, for me the script didn't work. The subfolders were not created and the files were not downloaded. I had to change the echo commands into eval commands. Any explanations for an old DOS user?
Here are the lines I changed:
    eval "mkdir -p '$PROFILEDIR'"
    eval "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' --output-document='$PROFILE$NUM.txt')"
elif [ ${URL:(-6)} = "_fasta" ]; then
    ALLELE=$(echo $URL | awk -F'/' '{print $7}')
    eval "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' --output-document='$ALLELE.tfa')"
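On the echo versus eval question: as posted, the script only prints the mkdir and wget commands as text, so nothing happens unless that output is handed to a shell (for example by piping it to bash, which is probably how it was meant to be used, though that is an assumption). eval runs the same text directly in the current shell. A minimal sketch of the difference, using a hypothetical emit_mkdir helper and a temporary directory rather than the real download paths:

```shell
#!/bin/bash
# Hypothetical helper: prints a command as text instead of running it.
emit_mkdir() {
    echo "mkdir -p '$1'"
}

workdir=$(mktemp -d)

# 1) Printing only: the command appears on stdout, nothing is created.
emit_mkdir "$workdir/printed_only"

# 2) Piping the printed text to a shell executes it.
emit_mkdir "$workdir/via_pipe" | bash

# 3) eval executes the same text in the current shell.
eval "$(emit_mkdir "$workdir/via_eval")"

ls "$workdir"
```

Only via_pipe and via_eval end up on disk. Either approach works; printing first has the small advantage that the commands can be inspected before anything is downloaded.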
This change works for me! Also, thank you all for this fix!
Hi everyone. It looks like PubMLST has changed something again, so a new fix is needed: there are now two different URL layouts for downloading schemes and sequences. The script should look like this:
#!/bin/bash
set -e
OUTDIR=pubmlst
mkdir -p "$OUTDIR"
wget --no-clobber -P "$OUTDIR" http://pubmlst.org/data/dbases.xml
for URL in $(grep '<url>' $OUTDIR/dbases.xml); do
    # echo $URL
    URL=${URL//<url>}
    URL=${URL//<\/url>}
    # echo ${URL: -4}
    if [ ${URL:(-4)} = "_csv" ]; then
        #PROFILE=$(basename $URL .txt)
        PROFILE=$(echo $URL | awk -F'_' '{print $2}')
        if [ $(echo $URL | awk -F'/' '{print $3}') = "rest.pubmlst.org" ]; then
            NUM=$(echo $URL | awk -F'/' '{if($7!=1) print "_"$7}')
        else
            NUM=$(echo $URL | awk -F'/' '{if($8!=1) print "_"$8}')
        fi
        echo "# $PROFILE "
        PROFILEDIR="$OUTDIR/$PROFILE$NUM"
        eval "mkdir -p '$PROFILEDIR'"
        eval "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' -O '$PROFILE$NUM.txt')"
    elif [ ${URL:(-6)} = "_fasta" ]; then
        if [ $(echo $URL | awk -F'/' '{print $3}') = "rest.pubmlst.org" ]; then
            ALLELE=$(echo $URL | awk -F'/' '{print $7}')
        else
            ALLELE=$(echo $URL | awk -F'/' '{print $8}')
        fi
        eval "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' -O '$ALLELE.tfa')"
    fi
done
# delete fungi schemes
echo rm -frv "$OUTDIR"/{afumigatus,blastocystis,calbicans,cglabrata,ckrusei}
echo rm -frv "$OUTDIR"/{ctropicalis,csinensis,kseptempunctata,sparasitica,tvaginalis}
I hope this saves others some time and suffering.
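The script above picks field 7 or field 8 depending on the host, which breaks again whenever a URL gains or loses a path component. A possible alternative, sketched here under assumptions, is to look for the component that follows a fixed marker in the path. The component_after helper is hypothetical, and the marker names ("schemes", "loci") and both example URLs are illustrative guesses at the layout, not values checked against dbases.xml:

```shell
#!/bin/bash
# Hypothetical helper: print the path component that follows /marker/
# in a URL, regardless of how many components come before it.
component_after() {
    local url=$1 marker=$2
    # Drop everything up to and including the first "/marker/".
    local rest=${url#*/"$marker"/}
    # If nothing was removed, the marker is not in the URL.
    if [ "$rest" = "$url" ]; then
        return 1
    fi
    # Keep only the next component.
    echo "${rest%%/*}"
}

# Illustrative URLs (assumed shapes, one short path and one longer path).
component_after 'https://rest.pubmlst.org/db/pubmlst_demo_seqdef/schemes/2/profiles_csv' schemes   # prints 2
component_after 'https://example.org/api/db/pubmlst_demo_seqdef/loci/abcZ/alleles_fasta' loci      # prints abcZ
```

This uses only bash parameter expansion, so it avoids spawning awk for every URL and does not need a per-host branch.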
dbases.xml has changed the path suffixes for profiles and sequences: they are now csv and fasta instead of txt and tfa. The mlst-download_pub_mlst script should be updated.
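The scripts in this thread already handle that change by saving `_csv` URLs as .txt files and `_fasta` URLs as .tfa files. That mapping could be kept in one place, as in this small sketch; local_ext is a hypothetical helper name, and the example paths are made up:

```shell
#!/bin/bash
# Hypothetical helper: choose the local file extension from the URL suffix.
local_ext() {
    case $1 in
        *_csv)   echo txt ;;   # profile tables
        *_fasta) echo tfa ;;   # allele sequences
        *)       return 1 ;;   # unknown suffix: let the caller decide
    esac
}

local_ext 'schemes/1/profiles_csv'    # prints txt
local_ext 'loci/abcZ/alleles_fasta'   # prints tfa
```

If the suffixes change again, only the case patterns need editing.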