plazi / arcadia-project


batch uploading of publications to BLR #239

Open myrmoteras opened 11 months ago

myrmoteras commented 11 months ago

We need to find a solution to batch upload publications to BLR, and to be able to edit published articles in BLR.

  1. Articles with clean metadata, plus a check that the article is not already in BLR and does not already have a DOI.
    Input: PDFs with a table including metadata, see e.g. DOI_RPaleo_40_1_2021.xlsx

Marcus used Lycophron; @gsautter is using his own tool to upload articles to BLR.

Who can be in charge of this?
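A rough sketch of the pre-upload check described in item 1, in Python since Lycophron is Python-based. It assumes the metadata table has been exported to CSV with hypothetical `title`, `doi`, and `pdf` columns, and that the titles already in BLR are available as a plain list; neither assumption comes from this thread, and nothing here talks to BLR itself.

```python
import csv
import io


def normalize(title):
    """Lower-case and collapse whitespace so near-identical titles compare equal."""
    return " ".join(title.lower().split())


def screen_rows(rows, known_titles):
    """Split metadata rows into (to_upload, skipped).

    A row is skipped if it already carries a DOI, or if its normalized
    title matches a known title or an earlier row in the same batch.
    """
    known = {normalize(t) for t in known_titles}
    to_upload, skipped = [], []
    for row in rows:
        doi = (row.get("doi") or "").strip()
        title = normalize(row.get("title") or "")
        if doi:
            skipped.append((row, "already has a DOI"))
        elif title in known:
            skipped.append((row, "title already known"))
        else:
            to_upload.append(row)
            known.add(title)  # also catches duplicates within the batch
    return to_upload, skipped


# Hypothetical sample data; column names and values are placeholders.
SAMPLE_CSV = (
    "title,doi,pdf\n"
    "Paper A,,a.pdf\n"
    "Paper B,10.1234/example,b.pdf\n"
    "paper  a,,a2.pdf\n"
    "Paper C,,c.pdf\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
to_upload, skipped = screen_rows(rows, known_titles=["Paper C"])
for row, reason in skipped:
    print(f"skip {row['pdf']}: {reason}")
```

Title matching is only a stand-in here; a real check would probably need a stronger key than the title.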

gsautter commented 11 months ago

> We need to find a solution to batch upload publications to BLR, and to be able to edit published articles in BLR.
>
> 1. Articles with clean metadata, plus a check that the article is not already in BLR and does not already have a DOI.
>    Input: PDFs with a table including metadata, see e.g.
>    [DOI_RPaleo_40_1_2021.xlsx](https://github.com/plazi/arcadia-project/files/13649124/DOI_RPaleo_40_1_2021.xlsx)
>
> Marcus used Lycophron; @gsautter is using his own tool to upload articles to BLR.
>
> * https://github.com/plazi/lycophron
>
> * https://github.com/plazi/O3RT
>
> * https://github.com/plazi/drosophilid-upload which is used to upload https://github.com/plazi/drosophilid-data
>
> Who can be in charge of this?

As to the server, that sure is my purview, but what needs changing there? The whole thing is event-driven and forwards both new uploads and later updates as they occur, so that, in conjunction with QC making sure of the DOI part, should tick the boxes for now ...

As to Lycophron, since it's written in Python, which I'm not really familiar with, I cannot say.

myrmoteras commented 11 months ago

We need to be able to upload batches of publications, such as the Drosophilid, bats, and IUCN SSC batches.

They are not initially uploaded through GGI, unless we have templates, and they are often scanned, with bad or no OCR, etc.

gsautter commented 11 months ago

> We need to be able to upload batches of publications, such as the Drosophilid, bats, and IUCN SSC batches.
>
> They are not initially uploaded through GGI, unless we have templates, and they are often scanned, with bad or no OCR, etc.

I'm well aware of that, and that Lycophron is intended for exactly that ... all I tried to state above is that, since it is written in Python, I cannot really take care of developing it.

As to "uploading through GGI", it is actually only a specific component in the TreatmentBank back-end that does the upload, not GGI proper ... and the API interaction relies to a good extent on server infrastructure provided by other components ... pretty hard to make into a standalone tool, especially when it comes to duplicate prevention, which requires a central database ...
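To illustrate why duplicate prevention wants a single shared database, here is a minimal Python sketch. SQLite merely stands in for whatever central database the TreatmentBank back-end actually uses, and the `uploads` table, the `claim` helper, and the checksum-based key are all hypothetical. The point is that the uniqueness constraint lives in one place, so two uploaders cannot both register the same document.

```python
import hashlib
import sqlite3


def open_registry(path=":memory:"):
    """Open (or create) the hypothetical central registry of uploaded documents."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS uploads (doc_key TEXT PRIMARY KEY)"
    )
    return conn


def pdf_key(data):
    """Derive a document key from the PDF bytes (assumption: checksum identity)."""
    return hashlib.sha256(data).hexdigest()


def claim(conn, doc_key):
    """Return True if doc_key was newly registered, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO uploads (doc_key) VALUES (?)", (doc_key,))
        return True
    except sqlite3.IntegrityError:
        # The primary-key constraint rejected a second registration.
        return False


conn = open_registry()
key = pdf_key(b"%PDF-1.4 placeholder bytes")
first = claim(conn, key)   # newly registered
second = claim(conn, key)  # rejected as a duplicate
print(first, second)
```

A standalone tool with only a local registry would miss uploads made by other tools, which is the difficulty described above.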