plazi / lycophron

Batch uploader to Zenodo

upload bat files from sandbox to production #22

Closed myrmoteras closed 9 months ago

myrmoteras commented 1 year ago

Hi Donat and @flsimoes ,

Last week Manuel finished the Sandbox upload of the bats collection (~230 records) from the Google Sheet that Felipe and Juliana shared in the last Arcadia sprint.

Some pending action items before we go ahead with uploading to production:

Let me know if you have any questions. I’ll be off on holiday next week, but we can re-route to the rest of the team so they can help.

Cheers, Alex

PS: I couldn’t find Juliana’s email address, so feel free to forward this or add her in the loop.

myrmoteras commented 1 year ago

https://sandbox.zenodo.org/record/1179813#.ZC2SGXZBz8A

myrmoteras commented 1 year ago

https://sandbox.zenodo.org/record/1179813#.ZC2S03ZBz8B

Having tags (e.g. <i>) in the title seems to be a recurring issue

(Boa constrictor orophias)

remove tags in title

myrmoteras commented 1 year ago

@slint can you add the files of this bat upload NOT to the bats_project community, but to the biosyslit, coviho and globalbioticinteractions communities?

slint commented 1 year ago

Regarding communities, yes we can add both the biosyslit and coviho communities.

For the GloBi community though, it's Jorrit who is curating it, in which case he will have to accept all the record inclusion requests (we can also do this on our side quickly for all of them if he agrees).

myrmoteras commented 1 year ago

@slint @Donat Agosti please feel free to automatically submit the docs to the globalbioticinteractions zenodo community. We can auto-accept if needed.

myrmoteras commented 1 year ago

via @slint

Regarding Lycophron, I agree we can proceed with pushing to production. There are still a couple of clean-up tasks, though (some HTML tags in titles, missing DOIs, etc.), that we can’t do on our side without the domain knowledge. We can revive the GitHub issue and ping the right people. Afterwards Manuel can wrap it up and make the run (and also automatically add the records to the GloBi community).

@flsimoes can you tell @juwingert to look into this issue?

flsimoes commented 1 year ago

I do not have access to the spreadsheet in question

juwingert commented 1 year ago

I do not have access to the spreadsheet in question

Me neither.

slint commented 1 year ago

I'm sorry, can you try again here: https://docs.google.com/spreadsheets/d/19T3S6kKJyVJpe7lgd5GKYKLfZXMCTZhgo6d0N8EhfIk/edit?usp=sharing

myrmoteras commented 1 year ago

@slint is it just a matter of cleaning up column A with the article titles?

juwingert commented 1 year ago

@slint @myrmoteras I changed the i tag to em in column A and added some missing DOIs, is that it?

slint commented 1 year ago

flsimoes commented 1 year ago

  • Regarding the HTML tags, they need to be completely removed from the titles, since they're not actually supported, and are rendered as regular text on the record page

Ah ok, I don't recall why, but when Marcus started development on Lycophron he asked us to fill the HTML tags.

juwingert commented 1 year ago

  • Regarding the HTML tags, they need to be completely removed from the titles, since they're not actually supported, and are rendered as regular text on the record page

OK, I removed them.

  • For the missing DOIs, if there is no known corresponding DOI, then they don't have to be filled-in and we will register a Zenodo DOI for them

I completed two or three DOIs, only the ones I was certain of.

slint commented 1 year ago

Ah ok, I don't recall why, but when Marcus started development on Lycophron he asked us to fill the HTML tags.

For the description column that is good, since we render the HTML.
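The title clean-up described above could also be scripted rather than done by hand. The following is only a minimal sketch: the column names `title` and `description` are assumptions about the sheet layout, and `strip_tags` and `clean_row` are hypothetical helpers, not part of Lycophron.

```python
import html
import re

# Matches anything that looks like an HTML tag, e.g. <i>, </em>, <br/>.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def strip_tags(title: str) -> str:
    """Remove HTML tags from a title and normalise whitespace."""
    without_tags = TAG_RE.sub("", title)
    # Decode entities such as &amp; that may accompany the markup.
    decoded = html.unescape(without_tags)
    return " ".join(decoded.split())


def clean_row(row: dict) -> dict:
    """Return a copy of a spreadsheet row with a plain-text title.

    The "title" column name is an assumption. Other columns, including
    the description (whose HTML is rendered), are left untouched.
    """
    cleaned = dict(row)
    cleaned["title"] = strip_tags(row["title"])
    return cleaned
```

Only the title is touched; the italics for taxonomic names survive in the description, where they are rendered.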


Thanks a lot for having a look and fixing the entries @juwingert and @flsimoes, we are ready to upload it to production now. The team is a bit swamped this week, but I think we can have it up by the end of next week. I'll respond here with links to the records.

myrmoteras commented 1 year ago

@slint very good. Let us know. We will have a second part of the bat publications for this Covid-related bat co-roosting project by next week, so we can prepare them for the next upload.

Also, please make sure that the upload goes to the BLR and coviho communities, and that we have admin rights, because we will eventually process those files to move them into biodivPMC at SIBiLS as well.

tx

myrmoteras commented 1 year ago

For the description column that is good, since we render the HTML.

@slint the reason the emphasis tags are in the text is that these are taxonomic names. Maybe later this could be used to format taxonomic names in the title of the deposit?

myrmoteras commented 1 year ago

@slint can we move this now into production, so we have this done? tx

slint commented 1 year ago

@myrmoteras, @alejandromumo has uploaded the records to production! We've managed to narrow them down using this search query, in case you want to share.

Two things missing:

myrmoteras commented 1 year ago

@alejandromumo @slint thanks for this. Is there a way to get this upload selected and/or viewed, that is, give me all the articles of this batch upload? Or would we need to add a special keyword or similar to it?

This is a question that will be raised tomorrow afternoon when I show it to my covid colleagues...

myrmoteras commented 1 year ago

@slint I understand, that we also could add keywords, or custom keywords. Is this right?

myrmoteras commented 1 year ago

Another question: do we have a list (CSV or similar) in which the Zenodo ID has been added to the bibliographic records?

slint commented 1 year ago

Is there a way to get this upload selected and/or viewed, that is, give me all the articles of this batch upload? Or would we need to add a special keyword or similar to it?

Right now, the search query I shared above was basically "open access articles from 2023-08-01 that are in BLR, COVIHO, and GloBi", which narrows it down to the exact set. But I agree, that's not a very trivial differentiating factor. That's why originally we proposed placing these either in a new Zenodo community (as if they were a specific "collection" of sorts), or maybe marking them with a keyword (or even better if there was a DarwinCore custom field that could describe them all).
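To make the shape of such a query concrete, here is a hedged sketch that builds a search URL for the Zenodo records API. The `/api/records` endpoint and its `q` parameter exist, but the field names used below (`communities`, `access_right`, `created`) are assumptions that should be checked against the current Zenodo search guide, and the community identifiers are simply the ones mentioned earlier in this thread.

```python
from urllib.parse import urlencode


def build_query(communities, since):
    """Build a query string for open records in all given communities.

    The field names are assumptions, not confirmed Zenodo search fields.
    """
    clauses = [f"communities:{name}" for name in communities]
    clauses.append("access_right:open")
    # Open-ended range: everything created on or after `since`.
    clauses.append(f"created:[{since} TO *]")
    return " AND ".join(clauses)


def build_search_url(communities, since, base="https://zenodo.org/api/records"):
    """Return a URL that could be fetched to list the matching records."""
    params = {"q": build_query(communities, since), "size": 25}
    return base + "?" + urlencode(params)


url = build_search_url(
    ["biosyslit", "coviho", "globalbioticinteractions"], "2023-08-01"
)
```

Because the query relies on date and community membership only, any later upload to the same three communities would also match, which is the weakness noted above.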

@slint I understand, that we also could add keywords, or custom keywords. Is this right?

Yes, all of the Zenodo metadata fields are supported in the input CSV.
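As an illustration of marking a batch with a keyword, the sketch below appends one keyword to every row of an input CSV. The `keywords` column name and the `"; "` separator are assumptions about the input format, not confirmed Lycophron conventions, and `add_batch_keyword` is a hypothetical helper.

```python
import csv
import io


def add_batch_keyword(csv_text, keyword, column="keywords", separator="; "):
    """Return CSV text in which every row carries `keyword`.

    The column name and separator are assumptions; adjust them to the
    actual input format. Rows that already have the keyword are unchanged.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if column not in fieldnames:
        fieldnames.append(column)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        raw = row.get(column) or ""
        existing = [k.strip() for k in raw.split(separator.strip()) if k.strip()]
        if keyword not in existing:
            existing.append(keyword)
        row[column] = separator.join(existing)
        writer.writerow(row)
    return out.getvalue()
```

A keyword chosen once per run would then single out exactly that batch in a search, independent of dates or communities.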

Another question: do we have a list (CSV or similar) in which the Zenodo ID has been added to the bibliographic records?

I think @alejandromumo keeps the Zenodo ID in the Lycophron database file, so we could make an export.
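Assuming such an export is available as CSV, writing the IDs back into the source sheet amounts to a join on a shared column. The sketch below is hypothetical throughout: the `title` join key, the `zenodo_id` column, and the export layout are all assumptions, and the join only works if titles are unique within the batch.

```python
import csv
import io


def write_back_ids(source_csv, results_csv, key="title", id_column="zenodo_id"):
    """Add the Zenodo ID from `results_csv` to each row of `source_csv`.

    Rows are matched on `key`, compared case-insensitively. Returns the
    new CSV text and the key values of rows with no matching result.
    """
    ids_by_key = {}
    for row in csv.DictReader(io.StringIO(results_csv)):
        value = (row.get(key) or "").strip().lower()
        if value:
            ids_by_key[value] = row.get(id_column) or ""

    reader = csv.DictReader(io.StringIO(source_csv))
    fieldnames = list(reader.fieldnames or [])
    if id_column not in fieldnames:
        fieldnames.append(id_column)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    unmatched = []
    for row in reader:
        value = (row.get(key) or "").strip().lower()
        row[id_column] = ids_by_key.get(value, "")
        if not row[id_column]:
            unmatched.append(row.get(key) or "")
        writer.writerow(row)
    return out.getvalue(), unmatched
```

Joining on the DOI instead would miss the records that had no DOI before upload, which is why the title is used as the default key here.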

alejandromumo commented 1 year ago

@myrmoteras @flsimoes @slint

Find below the sheet with the results of the run in prod: link

It is currently private so I will accept your access requests as soon as you send them.

myrmoteras commented 1 year ago

@alejandromumo can we share the link with the COVID task force group to show them? Or maybe just make it open for viewing?

alejandromumo commented 1 year ago

@alejandromumo can we share the link with the COVID task force group to show them? Or maybe just make it open for viewing?

If it makes it easier to share, I can make it public for viewing. Is there any particular data they might be interested in? Otherwise, since we'd be making it public, we can show just some public fields in the sheet (e.g. DOI and Zenodo URL).

myrmoteras commented 1 year ago

I would just make it public as is for viewing. It is interesting for this group to look at the structure and granularity.

myrmoteras commented 1 year ago

@alejandromumo would it be possible to get the Zenodo deposition ID linked to the original XLS file, which includes the bibliographic data?

myrmoteras commented 1 year ago

@alejandromumo @slint

We are running a publisher workshop in Carouge on Sept 14/15, where Lars will participate. Would it be possible to have a version of Lycophron there, including the write-back of the Zenodo ID into the original source file, so we can refer to it?

I would bring to the attention of the audience that authors or publishers could just upload all their articles to BLR so that they have all of them

  1. with a DOI if not available
  2. ready for further processing
  3. processed through TB, either directly through the webhook or via TB, and thus also have the data there in FAIR form. Point 3 is essentially the next step of TNA in BiCIKL

Thanks