tseemann / mlst

Scan contig files against PubMLST typing schemes
GNU General Public License v2.0

dbases.xml updated suffix #98

Open indexofire opened 4 years ago

indexofire commented 4 years ago

dbases.xml now uses different URL suffixes for profiles and sequences: csv and fasta instead of txt and tfa. The mlst-download_pub_mlst script should be updated.
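A rough sketch of how a script could tell the two kinds of URL apart by the new suffixes. The function name url_kind and the example URLs are made up for illustration; they are not part of mlst and are not copied from dbases.xml.

```shell
#!/bin/bash
# Sketch only: url_kind is a hypothetical helper, not part of mlst.

# Classify a URL by its trailing suffix.
url_kind() {
  case "$1" in
    *_csv)   echo profile ;;   # scheme profile table
    *_fasta) echo alleles ;;   # allele sequences for one locus
    *)       echo other ;;     # anything else is ignored
  esac
}

# Illustrative URLs (assumed layout, placeholder host)
url_kind 'https://example.org/db/demo/schemes/1/profiles_csv'
url_kind 'https://example.org/db/demo/loci/geneA/alleles_fasta'
```

A case pattern avoids the substring arithmetic, so it also behaves sensibly on strings shorter than the suffix.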

indexofire commented 4 years ago

Right now I use this revised (and nasty) version of the mlst-download_pub_mlst script to grab the PubMLST data.

#!/bin/bash
# Prints the download commands instead of running them;
# pipe the output to a shell (e.g. "| bash") to execute them.

set -e

OUTDIR=pubmlst
mkdir -p "$OUTDIR"
wget --no-clobber -P "$OUTDIR" http://pubmlst.org/data/dbases.xml

for URL in $(grep '<url>' "$OUTDIR/dbases.xml"); do
  # strip the surrounding <url>...</url> tags
  URL=${URL//<url>}
  URL=${URL//<\/url>}
  if [ "${URL:(-4)}" = "_csv" ]; then
    # profile URL: the scheme name is the second "_"-separated token
    PROFILE=$(echo "$URL" | awk -F'_' '{print $2}')
    # append "_N" for scheme numbers other than 1
    NUM=$(echo "$URL" | awk -F'/' '{if($7!=1)print "_"$7}')
    echo "# $PROFILE "
    PROFILEDIR="$OUTDIR/$PROFILE$NUM"
    echo "mkdir -p '$PROFILEDIR'"
    echo "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' -O '$PROFILE$NUM.txt')"
  elif [ "${URL:(-6)}" = "_fasta" ]; then
    # allele URL: the locus name is the 7th "/"-separated field
    ALLELE=$(echo "$URL" | awk -F'/' '{print $7}')
    echo "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' -O '$ALLELE.tfa')"
  fi
done

# delete fungi schemes
echo rm -frv "$OUTDIR"/{afumigatus,blastocystis,calbicans,cglabrata,ckrusei}
echo rm -frv "$OUTDIR"/{ctropicalis,csinensis,kseptempunctata,sparasitica,tvaginalis}
tdcollingsworth commented 4 years ago

Thank you so much, @indexofire!

Any chance we'll see these updates reflected in the repo @tseemann?

Grateful for all your hard work and dedication; I know we've all got a lot on our plates right now.

tdcollingsworth commented 4 years ago

To make sure 'mlst-make_blast_db' functions correctly, one suggestion I would make here is to alter:

echo "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' -O '$ALLELE')"

to

echo "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' -O '$ALLELE.tfa')"

Cheers!

indexofire commented 4 years ago

The nasty script still does not create the same scheme names in the pubmlst folder as the original one, because of the change to dbases.xml. I hope that's OK for users.

safrye commented 4 years ago

Hi, for me the script didn't work: the subfolders were not created and the files were not downloaded. I had to change the echo commands into eval commands. Any explanation for an old DOS user?

Here are the lines I changed:

    eval "mkdir -p '$PROFILEDIR'"
    eval "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' --output-document='$PROFILE$NUM.txt')"
  elif [ ${URL:(-6)} = "_fasta" ]; then
    ALLELE=$(echo $URL | awk -F'/' '{print $7}')
    eval "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' --output-document='$ALLELE.tfa')"
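On the echo-versus-eval question: echo only writes the command text to standard output, so the posted script does nothing by itself unless that output is handed to a shell. eval runs the same text immediately. A minimal sketch of the difference, using hypothetical helper names (print_cmd, run_cmd) and a throwaway temporary directory rather than the real pubmlst folder:

```shell
#!/bin/bash
# Sketch only: print_cmd and run_cmd are hypothetical helpers, not part of mlst.
set -e

WORKDIR=$(mktemp -d)

# echo writes the command text to stdout; nothing is executed.
print_cmd() {
  echo "mkdir -p '$1'"
}

# eval passes the same text to the current shell, which runs it.
run_cmd() {
  eval "mkdir -p '$1'"
}

print_cmd "$WORKDIR/printed"        # shows the mkdir command, creates nothing
run_cmd "$WORKDIR/evaled"           # creates the directory
print_cmd "$WORKDIR/piped" | sh     # creates it too: sh runs the printed text
```

So an echo-based script of this kind has to be run with its output piped into a shell, while the eval variant works when run directly.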

javiertognarelli commented 3 years ago

The eval change above works for me! Thank you all for this fix!

javiertognarelli commented 3 years ago

Hi everyone. It looks like PubMLST has changed something again, so a new fix is needed: there are now two different URL formats for downloading schemes and sequences, so the script should look like this:

#!/bin/bash

set -e

OUTDIR=pubmlst
mkdir -p "$OUTDIR"
wget --no-clobber -P "$OUTDIR" http://pubmlst.org/data/dbases.xml

for URL in $(grep '<url>' "$OUTDIR/dbases.xml"); do
  # strip the surrounding <url>...</url> tags
  URL=${URL//<url>}
  URL=${URL//<\/url>}
  if [ "${URL:(-4)}" = "_csv" ]; then
    # profile URL: the scheme name is the second "_"-separated token
    PROFILE=$(echo "$URL" | awk -F'_' '{print $2}')
    # the scheme number is field 7 on rest.pubmlst.org and field 8 on other hosts;
    # append "_N" for scheme numbers other than 1
    if [ "$(echo "$URL" | awk -F'/' '{print $3}')" = "rest.pubmlst.org" ]; then
        NUM=$(echo "$URL" | awk -F'/' '{if($7!=1) print "_"$7}')
    else
        NUM=$(echo "$URL" | awk -F'/' '{if($8!=1) print "_"$8}')
    fi
    echo "# $PROFILE "
    PROFILEDIR="$OUTDIR/$PROFILE$NUM"
    eval "mkdir -p '$PROFILEDIR'"
    eval "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' -O '$PROFILE$NUM.txt')"
  elif [ "${URL:(-6)}" = "_fasta" ]; then
    # allele URL: the locus name sits in the same host-dependent field
    if [ "$(echo "$URL" | awk -F'/' '{print $3}')" = "rest.pubmlst.org" ]; then
        ALLELE=$(echo "$URL" | awk -F'/' '{print $7}')
    else
        ALLELE=$(echo "$URL" | awk -F'/' '{print $8}')
    fi
    eval "(cd '$PROFILEDIR' && echo "$URL" && wget -q '$URL' -O '$ALLELE.tfa')"
  fi
done

# delete fungi schemes
# (these two lines only print the rm commands; remove the leading echo to actually delete)
echo rm -frv "$OUTDIR"/{afumigatus,blastocystis,calbicans,cglabrata,ckrusei}
echo rm -frv "$OUTDIR"/{ctropicalis,csinensis,kseptempunctata,sparasitica,tvaginalis}

I hope this saves others some time and suffering.
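The host check in the script above appears twice, once for the scheme number and once for the locus name. As a sketch, it could be pulled into one helper. The names id_field and scheme_suffix are hypothetical, and the URLs below are illustrative stand-ins for the two layouts (the second host and its extra path segment are an assumption, not taken from dbases.xml):

```shell
#!/bin/bash
# Sketch only: id_field and scheme_suffix are hypothetical helpers, not part of mlst.

# Print the "/"-separated field holding the scheme number or locus name:
# field 7 for rest.pubmlst.org, field 8 for any other host (as in the script above).
id_field() (
  url=$1
  host=$(printf '%s\n' "$url" | awk -F'/' '{print $3}')
  if [ "$host" = "rest.pubmlst.org" ]; then idx=7; else idx=8; fi
  printf '%s\n' "$url" | awk -F'/' -v i="$idx" '{print $i}'
)

# Print "_N" for scheme numbers other than 1, and nothing for scheme 1.
scheme_suffix() (
  n=$(id_field "$1")
  if [ "$n" != "1" ]; then echo "_$n"; fi
)

# Illustrative URLs (assumed layouts)
scheme_suffix 'https://rest.pubmlst.org/db/pubmlst_demo_seqdef/schemes/2/profiles_csv'
id_field 'https://example.org/api/db/pubmlst_demo_seqdef/loci/geneA/alleles_fasta'
```

With helpers like these, the loop body would only need NUM=$(scheme_suffix "$URL") and ALLELE=$(id_field "$URL"), and a third URL layout would mean changing one function.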