Closed katrinleinweber closed 4 years ago
thanks for the report @katrinleinweber - having a look
Thank you :-)
Shortly afterwards, I also started seeing `Error in solr_error(res) : (504) Gateway Timeout - The gateway server did not receive a timely response`, and colleagues of mine also get the 504 error on the web interface they use.
Nothing is written about this on https://status.datacite.org/ as of now, but I hear that a database upgrade/migration is being conducted. I presume the 504 error is related to that, not to the above-described GBIF issue.
cc @mfenner
looks like the xml field in the output can be very very large, causing some issues
so the error is on Datacite side, not GBIF, correct?
Depends ;-) If GBIF exceeded some limit when submitting that "meta"data, one could argue it was an error on their side. I'm seeing that error when using `rdatacite::dc_works()`, though, so the download is coming from DataCite.org in that moment.
@katrinleinweber I couldn't replicate the exact error you had, but I did get an error I think is related to your problem. Anyway, I added a parameter `discard_xml` in `dc_works` to delete the xml field before returning it to the console. The problem, I think, is that the very long base64-encoded xml string attempts to be printed by the base R method `print.data.frame`, and apparently there is some limit on how long a string can be for that method.
A way to make the data.frame output more readable is e.g.,

```r
z <- dc_works("prefix:10.15468", rows = 15)
z$data <- tibble::as_tibble(z$data)
z
```
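A minimal sketch of using the `discard_xml` parameter mentioned above (assuming it behaves as described, i.e. drops the xml field before the result is returned; requires a version of rdatacite that includes it):

```r
library(rdatacite)

# Drop the very long base64-encoded xml strings before printing,
# so print.data.frame does not choke on them
z <- dc_works("prefix:10.15468", rows = 15, discard_xml = TRUE)
z$data
```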
also, the max `rows` setting I think is 1000, added that to the docs
Thank you :-)
> [...] max rows setting I think is 1000, added that to the docs
I bisected my way to `99999L`. Seemed to work. `100000L` & higher resulted in `403 Forbidden` errors.
weird, 403 is an authorization error, hmmm
@katrinleinweber and @sckott we retired our Solr service last Thursday, completing the transition to Elasticsearch. The Solr API that `rdatacite` is using was officially retired in January 2019, and we made multiple announcements in the past.
@sckott let me know if you need help transitioning to the DataCite REST API.
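For reference, a rough sketch of querying the DataCite REST API directly, as an alternative to the retired Solr endpoint. The endpoint and parameters below follow the public REST API conventions (JSON:API responses under a `data` element); treat the exact parameter names as assumptions to check against the current API docs:

```r
library(jsonlite)

# Fetch DOIs for a prefix from the DataCite REST API
url <- "https://api.datacite.org/dois?prefix=10.15468&page[size]=15"
res <- fromJSON(url)

# DOIs live under the JSON:API "data" element
res$data$attributes$doi
```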
thanks @mfenner - will do
this fxn is now gone in the refactor branch - closing
I'm running into a problem when downloading GBIF's metadata records:
I'm guessing that's because they submitted a very large file encoded in their JSON/XML upload to DataCite. Is there a more elegant way of finding out which DOI is the problematic one than bisecting with the `rows` parameter combined with a given order and `offset = row + 1`?
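The bisection hinted at here can be automated with a small binary search over offsets. A minimal sketch, assuming exactly one problematic record in the range; `fails` is a hypothetical predicate that would wrap the failing call (e.g. something like `dc_works("prefix:10.15468", start = lo, rows = hi - lo + 1)` — the offset parameter name is an assumption):

```r
# Binary-search for the single row whose oversized xml field
# triggers the error. fails(lo, hi) returns TRUE when fetching
# rows lo..hi errors out.
find_bad_row <- function(lo, hi, fails) {
  while (lo < hi) {
    mid <- (lo + hi) %/% 2
    if (fails(lo, mid)) {
      hi <- mid       # bad row is in the lower half
    } else {
      lo <- mid + 1   # lower half is clean; search upper half
    }
  }
  lo
}

# Example with a fake backend where row 7 is the problematic one:
fails_fake <- function(lo, hi) lo <= 7 && 7 <= hi
find_bad_row(1, 15, fails_fake)  # 7
```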
Session Info
```r
─ Session info ──────────────────────────────────────────────────────
 setting  value
 version  R version 3.6.2 (2019-12-12)
 os       macOS Catalina 10.15.2
 system   x86_64, darwin15.6.0
 ui       RStudio
 language en
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Copenhagen
 date     2019-12-15

─ Packages ──────────────────────────────────────────────────────────
 ! package    * version date       lib source
   assertthat   0.2.1   2019-03-21 [1] CRAN (R 3.6.0)
   backports    1.1.5   2019-10-02 [1] CRAN (R 3.6.0)
   callr        3.4.0   2019-12-09 [1] CRAN (R 3.6.0)
   cli          2.0.0   2019-12-09 [1] CRAN (R 3.6.0)
   colorspace   1.4-1   2019-03-18 [1] CRAN (R 3.6.0)
   crayon       1.3.4   2017-09-16 [1] CRAN (R 3.6.0)
   crul         0.9.0   2019-11-06 [1] CRAN (R 3.6.0)
   curl         4.3     2019-12-02 [1] CRAN (R 3.6.0)
   desc         1.2.0   2018-05-01 [1] CRAN (R 3.6.0)
   devtools   * 2.2.1   2019-09-24 [1] CRAN (R 3.6.0)
   digest       0.6.23  2019-11-23 [1] CRAN (R 3.6.0)
   dplyr      * 0.8.3   2019-07-04 [1] CRAN (R 3.6.0)
   ellipsis     0.3.0   2019-09-20 [1] CRAN (R 3.6.0)
   fansi        0.4.0   2018-10-05 [1] CRAN (R 3.6.0)
 R fd         * 0.1.0
```