iodepo / odis-arch

Development of the Ocean Data and Information System (ODIS) architecture
https://book.odis.org/

Harvest Pensoft Journals #460

Open teodorgeorgiev opened 2 months ago

teodorgeorgiev commented 2 months ago

Sitemap links:

https://bdj.pensoft.net/sitemap/marine-articles-index.xml
https://zookeys.pensoft.net/sitemap/marine-articles-index.xml
https://phytokeys.pensoft.net/sitemap/marine-articles-index.xml
https://biorisk.pensoft.net/sitemap/marine-articles-index.xml
https://neobiota.pensoft.net/sitemap/marine-articles-index.xml
https://natureconservation.pensoft.net/sitemap/marine-articles-index.xml
https://zse.pensoft.net/sitemap/marine-articles-index.xml
https://nl.pensoft.net/sitemap/marine-articles-index.xml
https://riojournal.com/sitemap/marine-articles-index.xml
https://italianbotanist.pensoft.net/sitemap/marine-articles-index.xml
https://rethinkingecology.pensoft.net/sitemap/marine-articles-index.xml
https://mbmg.pensoft.net/sitemap/marine-articles-index.xml
https://biss.pensoft.net/sitemap/marine-articles-index.xml
https://jor.pensoft.net/sitemap/marine-articles-index.xml
https://travaux.pensoft.net/sitemap/marine-articles-index.xml
https://neotropical.pensoft.net/sitemap/marine-articles-index.xml
https://herpetozoa.pensoft.net/sitemap/marine-articles-index.xml
https://vcs.pensoft.net/sitemap/marine-articles-index.xml
https://plecevo.eu/sitemap/marine-articles-index.xml
https://aquaticinvasions.arphahub.com/sitemap/marine-articles-index.xml

JSON-LD is embedded in each article (e.g. https://bdj.pensoft.net/article/128431).
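
For anyone testing a harvest by hand, the two steps implied here (list the article URLs in a sitemap, then pull the embedded JSON-LD out of each landing page) can be sketched roughly as below. This is a minimal standard-library sketch, not how Gleaner does it. The function and class names are hypothetical, and fetching the documents over HTTP is left out, so both functions take text that has already been downloaded.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace used by the sitemaps.org protocol, in ElementTree's {uri} form.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_locs(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL found in a sitemap (or sitemap index) document."""
    root = ET.fromstring(sitemap_xml)
    return [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]


class _JsonLdCollector(HTMLParser):
    """Collect the text of each <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._chunks))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    """Parse every embedded JSON-LD block in an HTML page into Python objects."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    return [json.loads(block) for block in collector.blocks]
```

Feeding each URL from `sitemap_locs` to an HTTP client and passing the response body to `extract_jsonld` would give the list of JSON-LD documents for that journal.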

pbuttigieg commented 2 months ago

@teodorgeorgiev many thanks!

Notes to ODIS team

@jmckenna @fils - this is a link to Pensoft journals, who have created marine subsets of their assets for ODIS.

@jmckenna - you'll note they use the @ScholarlyArticle type, which is a subtype of @Article. This would need a new facet on the frontend and likely some SOLR config if that hasn't been automated yet.

First-pass review of content

I'm taking the first example to review: https://bdj.pensoft.net/article/969

In the Schema.org validator, we get: https://validator.schema.org/#url=https%3A%2F%2Fbdj.pensoft.net%2Farticle%2F969

Some comments:

Escaped characters

Escaped forward slashes, as in "@context": "https:\/\/schema.org", and stray \n newline characters should be removed. These can cause compatibility issues downstream.
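
One possible way to clean this up is to round-trip the document through a JSON library. The `\/` sequence is a legal JSON escape, so a conforming parser decodes it to a plain slash, and Python's `json.dumps` does not re-escape it on output. The sketch below also trims leading and trailing whitespace (including newlines) from string values. The helper names are hypothetical, and trimming every string is an assumption that may be too aggressive for some fields.

```python
import json


def _strip_strings(value):
    """Recursively trim leading/trailing whitespace from all string values."""
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, list):
        return [_strip_strings(item) for item in value]
    if isinstance(value, dict):
        return {key: _strip_strings(item) for key, item in value.items()}
    return value


def clean_jsonld(raw: str) -> str:
    """Decode a JSON-LD string and re-serialise it without escaped slashes."""
    doc = _strip_strings(json.loads(raw))
    # json.dumps leaves "/" unescaped; ensure_ascii=False keeps non-ASCII readable.
    return json.dumps(doc, ensure_ascii=False, indent=2)
```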

@id

The @id of the whole document could be the URL to the landing page with the JSON-LD embedded in it. That would let parsers like gleaner know where to get the JSON-LD.

URL

    "url": "https:\/\/bdj.pensoft.net\/",

This should rather be the URL to the landing page of the paper, the same as your mainEntityOfPage value.
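
The two suggestions above (@id and url both pointing at the landing page) could be applied in one pass, roughly as sketched here. This assumes mainEntityOfPage already holds the landing page URL, either as a plain string or as an object with an @id or url key. The function names are hypothetical.

```python
def landing_page(doc: dict):
    """Return the landing page URL from mainEntityOfPage, or None if absent."""
    page = doc.get("mainEntityOfPage")
    if isinstance(page, dict):
        return page.get("@id") or page.get("url")
    return page if isinstance(page, str) else None


def align_page_urls(doc: dict) -> dict:
    """Return a copy of doc whose @id and url both point at the landing page."""
    fixed = dict(doc)
    page = landing_page(doc)
    if page is not None:
        fixed["@id"] = page
        fixed["url"] = page
    return fixed
```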

Keywords and newlines

    "keywords": "\n  Lapland, faunistics, mayflies, aapamires, ponds,\n",

This should look like:

    "keywords": [ "Lapland", "faunistics", "mayflies", "aapamires", "ponds"]
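
A small sketch of that conversion, assuming the keywords always arrive as one comma-separated string (the function name is hypothetical):

```python
def split_keywords(raw: str) -> list[str]:
    """Split a comma-separated keyword string into a clean list of keywords."""
    # Strip whitespace and newlines around each part and drop empty parts,
    # such as the one left behind by a trailing comma.
    return [part.strip() for part in raw.split(",") if part.strip()]


keywords = split_keywords("\n  Lapland, faunistics, mayflies, aapamires, ponds,\n")
# keywords == ["Lapland", "faunistics", "mayflies", "aapamires", "ponds"]
```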

Semantically qualified identifiers

There's nothing "wrong" with this:

    "identifier": {
        "@type": "PropertyValue",
        "@propertyID": "DOI",
        "value": "10.3897\/BDJ.1.e969"
    },

but it would be better as:

    "identifier": {
        "@type": "PropertyValue",
        "propertyID": "https://registry.identifiers.org/registry/doi",
        "url": "https://doi.org/10.3897/BDJ.1.e969",
        "value": "10.3897/BDJ.1.e969"
    }
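
If the DOIs are available as plain strings, the suggested stanza could be generated with a helper along these lines. This is a sketch only: the function name and the list of prefixes it tolerates are assumptions.

```python
# Prefixes that are sometimes stored together with the DOI itself (assumed list).
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def doi_identifier(doi: str) -> dict:
    """Build a schema.org PropertyValue stanza for a DOI."""
    bare = doi.strip()
    for prefix in DOI_PREFIXES:
        if bare.lower().startswith(prefix):
            bare = bare[len(prefix):]
            break
    return {
        "@type": "PropertyValue",
        "propertyID": "https://registry.identifiers.org/registry/doi",
        "url": f"https://doi.org/{bare}",
        "value": bare,
    }
```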

Content location

@jmckenna @fils - this is the superproperty of what we usually harvest spatial data from (i.e. spatialCoverage): "The location depicted or described in the content. For example, the location in a photograph or painting."

This is valid, and the ODIS stack may need to make some tweaks to accommodate all subproperties of the expected properties in its processing.
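
As a rough illustration of the kind of tweak meant here (this is not the ODIS stack's actual code, and the names are hypothetical), a processing step could look for spatial information under either property:

```python
# spatialCoverage is what is usually harvested; contentLocation is its superproperty.
SPATIAL_PROPERTIES = ("spatialCoverage", "contentLocation")


def spatial_values(doc: dict) -> list:
    """Collect location values from spatialCoverage and contentLocation."""
    found = []
    for prop in SPATIAL_PROPERTIES:
        value = doc.get(prop)
        if value is None:
            continue
        # A property may hold a single value or a list of values.
        found.extend(value if isinstance(value, list) else [value])
    return found
```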

Price

@teodorgeorgiev you may wish to add the priceCurrency property to stanzas like this one.

    "offers": {
        "@type": "Offer",
        "price": "5.10",
        "availability": "https:\/\/schema.org\/InStock",
        "description": "Order Printed version"
    },
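
A quick check for this could be run over the harvested records. The sketch below, with a hypothetical function name, only reports offers that state a price without a priceCurrency; it does not guess which currency applies.

```python
def offers_missing_currency(doc: dict) -> list[dict]:
    """Return the offers in doc that have a price but no priceCurrency."""
    offers = doc.get("offers", [])
    if isinstance(offers, dict):
        offers = [offers]  # a single Offer object rather than a list
    return [
        offer
        for offer in offers
        if isinstance(offer, dict) and "price" in offer and "priceCurrency" not in offer
    ]
```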

Pensoft as an Org

I would add some more information about Pensoft in these stanzas, like the website, address, etc.

@teodorgeorgiev it may be worth 1) creating an Organization-typed JSON-LD doc for Pensoft and 2) using the @id JSON-LD keyword to link to it via a PID (DOI, W3ID, etc). This way, you can abbreviate:

    "publisher": {
        "@type": "Organization",
        "name": "Pensoft Publishers",
        "logo": {
            "@type": "ImageObject",
            "url": "https:\/\/pensoft.net\/new_images\/pensoft_logo.svg"
        }
    },

to

    "publisher": {
        "@id": "https://pid-of-choice/pid-of-Pensoft-JSON-LD-Doc"
    },

You can do that for any repetitive elements if you are sure to issue persistent identifiers that are stable. If that's not the case, it's fine to embed the information in each JSON-LD doc as you've done.
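
Both options can be sketched with one hypothetical helper: either reduce the publisher to a bare @id reference, or keep the embedded details and add the @id alongside them. The identifier passed in is whatever stable PID you choose to issue.

```python
def link_publisher(doc: dict, publisher_id: str, keep_embedded: bool = True) -> dict:
    """Return a copy of doc whose publisher carries (or is reduced to) an @id."""
    linked = dict(doc)
    embedded = doc.get("publisher")
    if keep_embedded and isinstance(embedded, dict):
        # Keep name, logo, etc. and add the identifier next to them.
        linked["publisher"] = {**embedded, "@id": publisher_id}
    else:
        # Abbreviated form: rely on the identifier alone.
        linked["publisher"] = {"@id": publisher_id}
    return linked
```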

pbuttigieg commented 2 months ago

@teodorgeorgiev

PS: if the articles mention or link out to sources like OBIS, GBIF, INSDC, or others, you can add identifiers of those records using the citation property.

You may also want to add the creditText property to your records, which would contain the recommended citation text of the asset described by the JSON-LD. e.g.

    "creditText": "Salmela J, Savolainen E (2013) New records of Paraleptophlebia werneri Ulmer, 1920 and P. strandii (Eaton, 1901) from Finland (Ephemeroptera, Leptophlebiidae). Biodiversity Data Journal 1: e969. https://doi.org/10.3897/BDJ.1.e969"

teodorgeorgiev commented 2 months ago

@pbuttigieg I have resolved your comments except the one about the Organization. I prefer to leave it as it is.

pbuttigieg commented 2 months ago

Thanks @teodorgeorgiev

Perhaps the last optimisation is to create a sitemap index (itself a sitemap that points to other sitemaps) so you can then maintain a single ODISCat entry for Pensoft Journals.

This sitemap index would include all the links in your original post and can be changed on your side with few or no additional steps on the IODE side.

Here's an example from Pacific Data Hub:

https://pacificdata.org/organization/sitemap.xml
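
For reference, a sitemap index in the sitemaps.org format can be generated from a list of sitemap URLs with a few lines of standard-library Python. This is a minimal sketch with a hypothetical function name, and it omits optional elements such as lastmod.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls) -> bytes:
    """Serialise a sitemap index that points at each of the given sitemaps."""
    # Register the sitemap namespace as the default so tags carry no prefix.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in sitemap_urls:
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Passing in the journal sitemap links from the original post would yield a single document that can be registered once and extended later by adding URLs to the list.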

teodorgeorgiev commented 2 months ago

@pbuttigieg Sure, here it is: https://pensoft.net/marine-sitemap.xml

pbuttigieg commented 2 months ago

Thanks @teodorgeorgiev - please create (if you haven't already) OceanExpert entries for yourself and for Pensoft, plus an ODISCat entry, to initiate the harvest. You can use the Pensoft OceanExpert entry as an identifier value in your Organization stanzas for Pensoft. The steps are described in: https://book.odis.org/gettingStarted.html

This was an exemplary implementation path @jmckenna - to be used for training / coaching in future

teodorgeorgiev commented 2 months ago

@pbuttigieg just did:

https://oceanexpert.org/expert/71704
https://oceanexpert.org/institution/24685

I have added this identifier to the "Organization"; however, I would also like to embed the information in each JSON-LD for consistency (the journals can have a different publisher than Pensoft).

pbuttigieg commented 2 months ago

> @pbuttigieg just did:
>
> https://oceanexpert.org/expert/71704
> https://oceanexpert.org/institution/24685

Great - give the validation process a couple of days and then you can use those to log in to ODISCat and register something like "Pensoft Journals - marine content source".

> I have added this identifier to the "Organization"; however, I would also like to embed the information in each JSON-LD for consistency (the journals can have a different publisher than Pensoft).

Yes, that makes sense. The two can reinforce one another.

teodorgeorgiev commented 2 months ago

@pbuttigieg please check https://catalogue.odis.org/view/3312

Once it is live, our PR officer Iva Boyadzhieva (i.boyadzhieva@pensoft.net) will prepare a press release about this integration. It would be great if we could coordinate this with your PR department. Do you know who we should contact about it?

pbuttigieg commented 2 months ago

@teodorgeorgiev I'll link you to our team via email - we have some press materials and can coordinate announcements.