waldronlab / BugSigDB

A microbial signatures database
https://bugsigdb.org

Incomplete CSV downloads #234

Closed jwokaty closed 2 months ago

jwokaty commented 3 months ago

I noticed that when I download the CSV files from https://bugsigdb.org/Help:Export, the files I get vary in size. I've tried this in my browser as well as with wget and curl. I also asked someone on a different network, who confirmed that they received files of different sizes too. The resulting file, such as the studies export, may not contain all the studies in BugSigDB, even though the number of studies is currently under the limit (5000) specified in the configuration for the CSV. Perhaps more time should be allowed to download the files?

fm@hal4:~/Work/WaldronLab/BugSigDbExports$ curl -L -O "https://bugsigdb.org/Special:Ask/-5B-5BCategory:Studies-5D-5D/-3FStudy-20design/-3FPMID/-3FDOI/-3FURL/-3FAuthors-20list/-3FTitle/-3FJournal/-3FYear/-3FAbstract/-3FKeyword-20list%3DKeywords/-3FState/-3FReviewer/mainlabel%3DStudy-20page-20name/limit%3D5000/order%3Dasc/sort%3DPage-20sort-20number/offset%3D0/format%3Dcsv/searchlabel%3DDownload-20all-20Studies-20(CSV)/filename%3Dstudies.csv"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1680k    0 1680k    0     0   150k      0 --:--:--  0:00:11 --:--:--  433k
fm@hal4:~/Work/WaldronLab/BugSigDbExports$ curl -L -O "https://bugsigdb.org/Special:Ask/-5B-5BCategory:Studies-5D-5D/-3FStudy-20design/-3FPMID/-3FDOI/-3FURL/-3FAuthors-20list/-3FTitle/-3FJournal/-3FYear/-3FAbstract/-3FKeyword-20list%3DKeywords/-3FState/-3FReviewer/mainlabel%3DStudy-20page-20name/limit%3D5000/order%3Dasc/sort%3DPage-20sort-20number/offset%3D0/format%3Dcsv/searchlabel%3DDownload-20all-20Studies-20(CSV)/filename%3Dstudies.csv"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2343k    0 2343k    0     0   147k      0 --:--:--  0:00:15 --:--:--  641k
fm@hal4:~/Work/WaldronLab/BugSigDbExports$ curl -L -O "https://bugsigdb.org/Special:Ask/-5B-5BCategory:Studies-5D-5D/-3FStudy-20design/-3FPMID/-3FDOI/-3FURL/-3FAuthors-20list/-3FTitle/-3FJournal/-3FYear/-3FAbstract/-3FKeyword-20list%3DKeywords/-3FState/-3FReviewer/mainlabel%3DStudy-20page-20name/limit%3D5000/order%3Dasc/sort%3DPage-20sort-20number/offset%3D0/format%3Dcsv/searchlabel%3DDownload-20all-20Studies-20(CSV)/filename%3Dstudies.csv"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2343k  100 2343k    0     0  2115k      0  0:00:01  0:00:01 --:--:-- 2115k
fm@hal4:~/Work/WaldronLab/BugSigDbExports$ curl -L -O "https://bugsigdb.org/Special:Ask/-5B-5BCategory:Studies-5D-5D/-3FStudy-20design/-3FPMID/-3FDOI/-3FURL/-3FAuthors-20list/-3FTitle/-3FJournal/-3FYear/-3FAbstract/-3FKeyword-20list%3DKeywords/-3FState/-3FReviewer/mainlabel%3DStudy-20page-20name/limit%3D5000/order%3Dasc/sort%3DPage-20sort-20number/offset%3D0/format%3Dcsv/searchlabel%3DDownload-20all-20Studies-20(CSV)/filename%3Dstudies.csv"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2343k  100 2343k    0     0  2362k      0 --:--:-- --:--:-- --:--:-- 2362k
fm@hal4:~/Work/WaldronLab/BugSigDbExports$ curl -L -O "https://bugsigdb.org/Special:Ask/-5B-5BCategory:Studies-5D-5D/-3FStudy-20design/-3FPMID/-3FDOI/-3FURL/-3FAuthors-20list/-3FTitle/-3FJournal/-3FYear/-3FAbstract/-3FKeyword-20list%3DKeywords/-3FState/-3FReviewer/mainlabel%3DStudy-20page-20name/limit%3D5000/order%3Dasc/sort%3DPage-20sort-20number/offset%3D0/format%3Dcsv/searchlabel%3DDownload-20all-20Studies-20(CSV)/filename%3Dstudies.csv"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2343k  100 2343k    0     0  2351k      0 --:--:-- --:--:-- --:--:-- 2350k
fm@hal4:~/Work/WaldronLab/BugSigDbExports$ curl -L -O "https://bugsigdb.org/Special:Ask/-5B-5BCategory:Studies-5D-5D/-3FStudy-20design/-3FPMID/-3FDOI/-3FURL/-3FAuthors-20list/-3FTitle/-3FJournal/-3FYear/-3FAbstract/-3FKeyword-20list%3DKeywords/-3FState/-3FReviewer/mainlabel%3DStudy-20page-20name/limit%3D5000/order%3Dasc/sort%3DPage-20sort-20number/offset%3D0/format%3Dcsv/searchlabel%3DDownload-20all-20Studies-20(CSV)/filename%3Dstudies.csv"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1280k    0 1280k    0     0  95657      0 --:--:--  0:00:13 --:--:--  375k
fm@hal4:~/Work/WaldronLab/BugSigDbExports$ curl -L -O "https://bugsigdb.org/Special:Ask/-5B-5BCategory:Studies-5D-5D/-3FStudy-20design/-3FPMID/-3FDOI/-3FURL/-3FAuthors-20list/-3FTitle/-3FJournal/-3FYear/-3FAbstract/-3FKeyword-20list%3DKeywords/-3FState/-3FReviewer/mainlabel%3DStudy-20page-20name/limit%3D5000/order%3Dasc/sort%3DPage-20sort-20number/offset%3D0/format%3Dcsv/searchlabel%3DDownload-20all-20Studies-20(CSV)/filename%3Dstudies.csv"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1600k    0 1600k    0     0   141k      0 --:--:--  0:00:11 --:--:--  395k
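
The repeated curl runs above can be turned into a scripted check. The following is a minimal sketch, not part of BugSigDB: `fetch` and `summarize_downloads` are hypothetical helper names, and the 120-second timeout is an assumed value. It downloads nothing by itself; it only compares payloads that have already been fetched, by size and by SHA-256 digest, so that a short or differing download stands out.

```python
import hashlib
from urllib.request import urlopen


def fetch(url: str, timeout: float = 120.0) -> bytes:
    """Download the full response body (hypothetical helper; timeout is an assumption)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def summarize_downloads(payloads: list[bytes]) -> dict:
    """Report each payload's size and whether all payloads are byte-identical."""
    sizes = [len(p) for p in payloads]
    digests = {hashlib.sha256(p).hexdigest() for p in payloads}
    return {
        "sizes": sizes,                      # one entry per download, in order
        "distinct_payloads": len(digests),   # 1 means every download matched
        "consistent": len(digests) <= 1,
        "largest": max(sizes, default=0),    # the most complete candidate
    }
```

One way to use it would be `summarize_downloads([fetch(url) for _ in range(5)])` with the export URL from the transcript. Identical digests do not prove the file is complete, only that the downloads agree with each other.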

tosfos commented 3 months ago

There is a data rebuild in progress. Please check back in around 10 hours and see if this has improved. Actually, let me verify this.

jwokaty commented 2 months ago

@tosfos Thanks for verifying. Yesterday and today I also tried downloading via R on one of our servers, which is more powerful than my laptop. The results are better: I generally get all the studies. But I still don't seem to get all the records I expect for signatures, which should have 5k+ rows; I get around 2k+.
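
A row-count comparison like the one described here can be sketched as follows. This is an illustration under assumptions, not the project's own tooling: `count_records` and `is_short` are hypothetical names, and the expected minimum is left as a parameter because the thread gives only approximate figures. A real CSV parser is used because fields such as abstracts may contain newlines inside quotes, which would inflate a plain line count.

```python
import csv
import io


def count_records(csv_text: str) -> int:
    """Count data rows (excluding the header) with a CSV parser, so a quoted
    field that spans several lines still counts as one record."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    rows = [row for row in reader if row]  # ignore completely blank lines
    return max(len(rows) - 1, 0)           # subtract the header row


def is_short(csv_text: str, expected_min: int) -> bool:
    """Return True if the export has fewer records than expected_min."""
    return count_records(csv_text) < expected_min
```

For example, `is_short(text, 5000)` would flag a signatures export of roughly 2,000 rows, where `text` is the decoded download and 5000 is an assumed threshold chosen by the caller.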

tosfos commented 2 months ago

Closing as a duplicate. Continuing in https://github.com/waldronlab/BugSigDB/issues/221

tosfos commented 2 months ago

Noting that a download today contained the correct 5,333 records.