rdpstaff / RDPTools

Collection of commonly used RDP Tools for easy building
49 stars, 52 forks

rdpclassifiertraindata download times out and breaks the build #10

Open EricDeveaud opened 8 years ago

EricDeveaud commented 8 years ago

hello,

While trying to build a Docker image for RDPTools, I have a problem: the download of the classifier training set times out.
see:

download-traindata:
      [get] Getting: http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
      [get] To: /local/gensoft2/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz
    [untar] Expanding: /local/gensoft2/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz into /local/gensoft2/src/RDPTools/RDPTools-2.0.2/classifier/build/classes

BUILD FAILED
/local/gensoft2/src/RDPTools/RDPTools-2.0.2/classifier/build.xml:112: Error while expanding /local/gensoft2/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz
java.io.EOFException: Unexpected end of ZLIB input stream
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)

wget of the same URL gives:

wget http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
--2016-02-25 11:43:38--  http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
Resolving rdp.cme.msu.edu... 35.8.164.79
Connecting to rdp.cme.msu.edu|35.8.164.79|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 149530230 (143M) [application/x-gzip]
Saving to: 'data.tgz'

61% [================================>                     ] 91,435,408   255KB/s   in 3m 30s 

2016-02-25 11:47:08 (425 KB/s) - Connection closed at byte 91435408. Retrying.

It seems to me that the get method used by the build honours neither a timeout nor retries.
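The `Unexpected end of ZLIB input stream` error at the untar step is consistent with the `get` step leaving a truncated data.tgz behind, as wget's "Connection closed at byte 91435408" suggests. A small local sketch of that failure mode, assuming a POSIX shell with tar, gzip, seq, head, and mktemp available; all file names are made up for illustration:

```shell
#!/bin/sh
set -eu
# Work in a throwaway directory (hypothetical file names).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Build a gzip-compressed tarball, then keep only its first half to mimic a
# transfer that was closed early ("Connection closed at byte ...").
seq 1 100000 > payload.txt
tar -czf full.tgz payload.txt
size=$(wc -c < full.tgz)
head -c "$((size / 2))" full.tgz > truncated.tgz

# gzip -t tests integrity without extracting and exits non-zero on a cut-off stream.
if gzip -t full.tgz; then full_ok=yes; else full_ok=no; fi
if gzip -t truncated.tgz 2>/dev/null; then truncated_ok=yes; else truncated_ok=no; fi
echo "full.tgz passes gzip -t: $full_ok"
echo "truncated.tgz passes gzip -t: $truncated_ok"
```

Running `gzip -t` on the downloaded data.tgz before the build would therefore give an early, explicit signal that the download was incomplete, instead of a Java stack trace from the untar step.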

best regards

Eric

rdpstaffmsu commented 8 years ago

Hi, Eric,

We tried but were not able to replicate this problem using computers at our locations. We will look into adjustments that might remedy this situation. In the meantime, if the problem persists, would you mind downloading this file manually and adding it to the folder? Thank you.

Benli


RDP Staff Ribosomal Database Project Center for Microbial Ecology Michigan State University 567 Wilson Rd. Room 2225 A East Lansing, MI 48824 (517) 353-3842

EricDeveaud commented 8 years ago

For now, I was able to build using the following workaround:

1) download data.tgz externally (with wget);
2) host data.tgz on a local web server;
3) patch classifier/build.xml to use localhost instead of rdp.cme.msu.edu:

sed -i -e 's,http://rdp.cme.msu.edu/download,http://localhost,' classifier/build.xml

It's more or less what you suggested.
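The substitution only rewrites the `http://rdp.cme.msu.edu/download` prefix, so the local web server has to serve the file at `http://localhost/rdpclassifiertraindata/data.tgz`. A minimal sketch of the effect on a hypothetical `<get>` line (its attributes are assumed from the build log, not copied from classifier/build.xml); note that an `s` command uses the same delimiter in all three positions, here `,`:

```shell
#!/bin/sh
set -eu
# Hypothetical line modelled on the URL and destination in the build log;
# the real attributes in classifier/build.xml may differ.
line='<get src="http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz" dest="build/classes/data.tgz"/>'

# Same substitution as the workaround, with "," as the s-command delimiter throughout.
patched=$(printf '%s\n' "$line" | sed -e 's,http://rdp.cme.msu.edu/download,http://localhost,')
printf '%s\n' "$patched"
```

Piping a single line through sed like this is a cheap way to check an expression before running it with `-i` on the real build.xml.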

Two suggestions to fix the build process:
1) (hard way) check the get method used during the build to see whether it can handle timeouts;
2) (easy way) what you suggested: remove the training data download from the build process and document that users must download the files on their own.

regards

Eric

rdpstaffmsu commented 8 years ago

Hi, Eric,

Thank you for the suggestions. We will look into the options to get it fixed.

Benli


EricDeveaud commented 8 years ago

Back at this.

I had to do a fresh install of RDPTools. Here is some output from wget:

make[1]: Entering directory `/inst/RDPTools/RDPTools-2.0.2'
# java builder/installer tries to download data file and times out
# download externally
test -d /src/RDPTools/RDPTools-2.0.2/classifier/build/classes || mkdir -m 2775  -p /src/RDPTools/RDPTools-2.0.2/classifier/build/classes
test -f /src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz || \
        wget --tries=5 -c  http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz -O /src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz
--2016-07-07 18:02:02--  http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
Resolving rdp.cme.msu.edu... 35.8.164.79
Connecting to rdp.cme.msu.edu|35.8.164.79|:80... connected.
HTTP request sent, awaiting response... No data received.
Retrying.

--2016-07-07 18:03:33--  (try: 2)  http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
Connecting to rdp.cme.msu.edu|35.8.164.79|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 181332714 (173M) [application/x-gzip]
Saving to: `/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz'

 0% [                                                                                        ] 302,632      101K/s   in 47s     

2016-07-07 18:04:22 (6.26 KB/s) - Connection closed at byte 302632. Retrying.

--2016-07-07 18:04:24--  (try: 3)  http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz
Connecting to rdp.cme.msu.edu|35.8.164.79|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 181332714 (173M), 181030082 (173M) remaining [application/x-gzip]
Saving to: `/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz'

100%[=======================================================================================>] 181,332,714 5.31M/s   in 36s     

2016-07-07 18:05:00 (4.81 MB/s) - `/src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz' saved [181332714/181332714]
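One caveat with the `test -f ... ||` guard in the recipe above: it also skips the download when an earlier, interrupted run left a truncated data.tgz behind, so the build would fail again at the untar step. A sketch of a stricter guard, assuming GNU wget and gzip are on the PATH; the function name `fetch_traindata` is hypothetical:

```shell
#!/bin/sh
set -eu

# Hypothetical helper: put the classifier training data at $1, fetched from $2,
# unless a complete (gzip-valid) archive is already there.
fetch_traindata() {
    dest=$1
    url=$2
    mkdir -p "$(dirname "$dest")"
    # Skip the download only if the existing file decompresses cleanly end to end.
    if [ -f "$dest" ] && gzip -t "$dest" 2>/dev/null; then
        echo "using existing $dest"
        return 0
    fi
    # --tries retries failed attempts; --continue resumes a partial file
    # (the server answered "206 Partial Content" on the resumed attempt above).
    wget --tries=5 --continue -O "$dest" "$url"
    # Fail here rather than hand a truncated archive to the build.
    gzip -t "$dest"
}
```

With the paths from the log it would be called as `fetch_traindata /src/RDPTools/RDPTools-2.0.2/classifier/build/classes/data.tgz http://rdp.cme.msu.edu/download/rdpclassifiertraindata/data.tgz` before starting the build.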

davidvilanova commented 4 years ago

I cannot download the training data either. Can you copy it somewhere or fix the URL?

cebercoto commented 2 months ago

Same problem here as of 14/05/2024