plazi / treatments-xml


add material citation UUID to treatment export #11

Closed myrmoteras closed 9 months ago

myrmoteras commented 1 year ago

@gsautter can you please add the materialCitation UUID to the XML export to treatments-xml?

This is relevant for solving this use case https://github.com/plazi/names_LOD/issues/133

myrmoteras commented 1 year ago

@retog @nleanba if this is necessary to get the UUID, all the treatments need to be pushed again.

Will it double the data volume in GitHub?!

Worthwhile?

myrmoteras commented 1 year ago

the PURL for material citation is https://tb.plazi.org/GgServer/dwcaRecords/03E96421FFE05F47BF1BFC24FA76CC28.mc.3B28DF6AFFE05F48BF85FBECFD49CEDF

myrmoteras commented 1 year ago

@retog if you insert URLs like this, then you need to ask Guido to enable resolving the mc in Frankfurt, so that it resolves to mat-cit.plazi.org

nleanba commented 1 year ago

I don't think this will actually double the amount of data GitHub stores, as I expect them to make use of (some) de-duplication / diffs.

nleanba commented 1 year ago

Once the UUIDs are here, the URI for a material citation is:

https://tb.plazi.org/GgServer/dwcaRecords/{treatment uuid}.mc.{material citation uuid}

(clickable links (synospecies) can/should link to https://treatment.plazi.org/id/{treatment uuid}#{material citation uuid})

~TODO @nleanba handle this in gg2rdf XSLT~ https://github.com/plazi/gg2rdf/releases/tag/v1.5 implements these resource URLs (if the id is present; otherwise it falls back to the current frankenstein-urls)
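
To make the two URL patterns above concrete, here is a minimal Python sketch. It is not the gg2rdf implementation; the function names are hypothetical, and the assumption that UUIDs are 32 uppercase hex characters is taken only from the example PURL earlier in this thread.

```python
import re

# Assumption: UUIDs are 32 uppercase hex characters, as in the example PURL
# posted earlier in this thread. Other forms are not handled here.
_UUID = r"[0-9A-F]{32}"
_MC_URI = re.compile(
    rf"^https://tb\.plazi\.org/GgServer/dwcaRecords/({_UUID})\.mc\.({_UUID})$"
)


def material_citation_uri(treatment_uuid: str, mc_uuid: str) -> str:
    """Build the resource URI: {treatment uuid}.mc.{material citation uuid}."""
    return f"https://tb.plazi.org/GgServer/dwcaRecords/{treatment_uuid}.mc.{mc_uuid}"


def material_citation_link(treatment_uuid: str, mc_uuid: str) -> str:
    """Build the clickable link: treatment page plus a fragment for the citation."""
    return f"https://treatment.plazi.org/id/{treatment_uuid}#{mc_uuid}"


def split_material_citation_uri(uri: str) -> tuple[str, str]:
    """Return (treatment uuid, material citation uuid) from a resource URI."""
    match = _MC_URI.match(uri)
    if match is None:
        raise ValueError(f"not a material citation URI: {uri!r}")
    return match.group(1), match.group(2)
```

For the example above, `material_citation_uri("03E96421FFE05F47BF1BFC24FA76CC28", "3B28DF6AFFE05F48BF85FBECFD49CEDF")` reproduces the PURL posted earlier, and `split_material_citation_uri` recovers the two UUIDs from it.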

nleanba commented 1 year ago

@gsautter

It would be much appreciated if you could reconfigure the export to treatments-xml (this repo) so as to include the UUIDs of the MaterialsCitations for all future exports.

Preferably without a full re-export, so as not to overload the transformation.

myrmoteras commented 1 year ago

@gsautter can we finish this issue, maybe by adding it to the batch? For now it would be helpful to do this for new treatments, as well as for the Flora der Schweiz: https://tb.plazi.org/GgServer/dioStats/stats?outputFields=doc.articleUuid+bib.author+bib.title&groupingFields=doc.articleUuid+bib.author+bib.title&FP-bib.author=%22%25Hirzel%25%22&FP-bib.title=%22Flora%20der%20Schweiz%25%22&format=HTML or in JSON

tx
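
For readers who want to assemble a stats query like the one above programmatically, the following is a rough sketch. The parameter names (`outputFields`, `groupingFields`, `FP-…`, `format`) are copied from the URL in this comment; the function name is hypothetical, and nothing here is confirmed against the GgServer documentation.

```python
from urllib.parse import urlencode

STATS_URL = "https://tb.plazi.org/GgServer/dioStats/stats"


def stats_query_url(fields, filters, fmt="HTML"):
    """Assemble a dioStats URL shaped like the ones posted in this thread.

    fields:  field names used for both outputFields and groupingFields
    filters: mapping of field name -> pattern with % wildcards (assumed to be
             SQL-LIKE style, judging only from the URLs above)
    """
    params = [
        ("outputFields", " ".join(fields)),
        ("groupingFields", " ".join(fields)),
    ]
    # Each filter becomes FP-<field>="<pattern>", with the pattern in double quotes.
    params += [(f"FP-{name}", f'"{pattern}"') for name, pattern in filters.items()]
    params.append(("format", fmt))
    # urlencode encodes spaces as "+", quotes as %22 and "%" as %25.
    return STATS_URL + "?" + urlencode(params)


url = stats_query_url(
    ["doc.articleUuid", "bib.author", "bib.title"],
    {"bib.author": "%Hirzel%", "bib.title": "Flora der Schweiz%"},
)
```

The result differs from the URL above only in encoding spaces inside the title filter as `+` rather than `%20`; standard form decoding treats the two the same, but whether the server does is an assumption.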

gsautter commented 1 year ago

The exporter is adjusted to export the annotation UUIDs, and a re-export for "Flora der Schweiz" is running now ... let's see how much havoc this mass update wreaks on the Git repo.

gsautter commented 1 year ago

Seems to be working, see for example https://github.com/plazi/treatments-xml/blob/main/data/00/69/E4/0069E4811D69FE49F29FFB1B2DB3B2C0.xml
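
A quick way to check whether an exported file carries the UUIDs is sketched below. It assumes the material citations are `materialsCitation` elements without a namespace and that the UUID sits in an `id` attribute; both are assumptions to verify against an actual file, and the sample XML is hypothetical.

```python
import xml.etree.ElementTree as ET


def material_citation_ids(xml_text):
    """Return (ids, missing): UUIDs found, and the count of citations without one.

    Assumptions (not confirmed by this thread): the element is named
    'materialsCitation' and its UUID is stored in the 'id' attribute.
    """
    root = ET.fromstring(xml_text)
    ids, missing = [], 0
    for element in root.iter("materialsCitation"):
        mc_id = element.get("id")
        if mc_id:
            ids.append(mc_id)
        else:
            missing += 1
    return ids, missing


# Hypothetical, heavily trimmed sample; real exports contain far more markup.
sample = (
    '<document><treatment>'
    '<materialsCitation id="A1">first</materialsCitation>'
    '<materialsCitation>second, exported before the change</materialsCitation>'
    '<materialsCitation id="B2">third</materialsCitation>'
    '</treatment></document>'
)
ids, missing = material_citation_ids(sample)
```

A non-zero `missing` count would point to a treatment that still needs a re-export.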

gsautter commented 1 year ago

Which other treatments (apart from "Flora der Schweiz") should be force-updated?

retog commented 1 year ago

> Which other treatments (apart from "Flora der Schweiz") should be force-updated?

I think all treatments where the XML used to be incomplete. @nleanba ?

myrmoteras commented 1 year ago

HBMW https://tb.plazi.org/GgServer/dioStats/stats?outputFields=doc.articleUuid+bib.author+bib.title+bib.source+bib.volume&groupingFields=doc.articleUuid+bib.author+bib.title+bib.source+bib.volume&FP-bib.source=%22%25Handbook%20of%20the%20Ma%25%22&format=HTML or https://tb.plazi.org/GgServer/dioStats/stats?outputFields=doc.articleUuid+bib.author+bib.title+bib.source+bib.volume&groupingFields=doc.articleUuid+bib.author+bib.title+bib.source+bib.volume&FP-bib.source=%22%25Handbook%20of%20the%20Ma%25%22&format=CSV&separator=%2C

myrmoteras commented 1 year ago

https://tb.plazi.org/GgServer/html/8B88FAA8AA092E66D4A43721811A6260

https://tb.plazi.org/GgServer/dioStats/stats?outputFields=doc.articleUuid+bib.author+bib.title+bib.source+bib.volume&groupingFields=doc.articleUuid+bib.author+bib.title+bib.source+bib.volume&FP-bib.source=%22%25ammal%20Species%25%22&format=HTML or https://tb.plazi.org/GgServer/dioStats/stats?outputFields=doc.articleUuid+bib.author+bib.title+bib.source+bib.volume&groupingFields=doc.articleUuid+bib.author+bib.title+bib.source+bib.volume&FP-bib.source=%22%25ammal%20Species%25%22&format=CSV&separator=%2C

nleanba commented 9 months ago

As far as I can tell, this is done(?)