plazi / treatmentBank

Repository devoted to house keeping of treatmentBank

removing of MC from TB in GBIF? #95

Open myrmoteras opened 1 year ago

myrmoteras commented 1 year ago

@gsautter in this match, I get this return for the material citation key in GBIF: image

Is this a unique issue, or does this occur whenever we change an MC in a treatment?

Also in this match

gsautter commented 1 year ago

Well, it happens if the occurrenceID in the DwCA changes. That ID is composed of the treatment UUID and the materials citation UUID, and both of these UUIDs are bound to the positions of their start and end words, so changing annotation boundaries changes the respective UUID. That's the downside of well-defined UUIDs; the advantage is reproducibility, and thus duplicate prevention, which was a problem in the past and to some degree still is with our XML-only documents.
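To illustrate why a boundary change produces a new key while an attribute change does not, here is a minimal sketch of a position-derived identifier. It is not the actual TreatmentBank algorithm: the namespace, the hashed string layout, the `.mc.` separator, and the function names `annotation_uuid` and `occurrence_id` are all assumptions made for illustration. The only property taken from the comment above is that the ID depends on the annotation's start and end word positions.

```python
import uuid

# Hypothetical namespace; the real scheme is not shown in this thread.
NAMESPACE = uuid.UUID(int=0)


def annotation_uuid(doc_id: str, ann_type: str, start_word: int, end_word: int) -> str:
    """Derive a reproducible ID from the annotation's position only.

    Attributes of the annotation are deliberately not part of the input,
    so setting or changing them cannot alter the result.
    """
    name = f"{doc_id}:{ann_type}:{start_word}-{end_word}"
    return uuid.uuid5(NAMESPACE, name).hex.upper()


def occurrence_id(treatment_uuid: str, mc_uuid: str) -> str:
    """Combine both IDs; the '.mc.' separator is an assumption."""
    return f"{treatment_uuid}.mc.{mc_uuid}"


doc = "example-document"  # placeholder document identifier
treatment = annotation_uuid(doc, "treatment", 100, 900)

# Same boundaries give the same ID on every run (reproducible, no duplicates).
mc_before = annotation_uuid(doc, "materialsCitation", 250, 310)
mc_again = annotation_uuid(doc, "materialsCitation", 250, 310)

# Moving the end word by one position yields a different ID.
mc_moved = annotation_uuid(doc, "materialsCitation", 250, 311)

print(occurrence_id(treatment, mc_before) == occurrence_id(treatment, mc_again))
print(occurrence_id(treatment, mc_before) == occurrence_id(treatment, mc_moved))
```

Under these assumptions, re-exporting an unchanged treatment keeps every occurrenceID stable, whereas editing a boundary looks to a consumer such as GBIF like one record disappearing and a new one appearing.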

However, looking at the version history of the underlying treatment, it doesn't look like any annotation boundaries changed at all, since external link write-back doesn't do that, only ever setting attributes: image

It looks more like the occurrences got filtered from the DwCA for QC reasons: https://tb.plazi.org/GgServer/pdsStats/stats?outputFields=doc.docUuid+doc.name+doc.doi+doc.uploadUser+doc.uploadDateTime+doc.updateUser+doc.updateDateTime+docTransits.detailId+docTransits.detailLabel+docTransits.source+docTransits.dest+docTransits.result+docTransits.probCount&groupingFields=doc.docUuid+doc.name+doc.doi+doc.uploadUser+doc.uploadDateTime+doc.updateUser+doc.updateDateTime+docTransits.detailId+docTransits.detailLabel+docTransits.source+docTransits.dest+docTransits.result+docTransits.probCount&FP-doc.docUuid=FF9EFFD42F71FFCD8259E753C4769279&FP-docTransits.dest=%22DwCA%25%22&format=HTML
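For anyone who wants to run the same check against another document, the query above can be assembled from its parts. This is a sketch only: it reuses the parameter names and values visible in that URL and assumes the `+` separators are URL-encoded spaces between field names. The helper name `transit_stats_url` is made up for illustration, and the code only builds the URL; it does not contact the server.

```python
from urllib.parse import urlencode

BASE = "https://tb.plazi.org/GgServer/pdsStats/stats"

# Field names copied from the URL above.
FIELDS = [
    "doc.docUuid", "doc.name", "doc.doi",
    "doc.uploadUser", "doc.uploadDateTime",
    "doc.updateUser", "doc.updateDateTime",
    "docTransits.detailId", "docTransits.detailLabel",
    "docTransits.source", "docTransits.dest",
    "docTransits.result", "docTransits.probCount",
]


def transit_stats_url(doc_uuid: str, dest_pattern: str = '"DwCA%"') -> str:
    """Build the stats query for one document UUID (hypothetical helper)."""
    params = {
        "outputFields": " ".join(FIELDS),
        "groupingFields": " ".join(FIELDS),
        "FP-doc.docUuid": doc_uuid,          # filter: this document only
        "FP-docTransits.dest": dest_pattern,  # filter: DwCA export destinations
        "format": "HTML",
    }
    # urlencode turns spaces into '+' and percent-encodes the quotes and '%'.
    return f"{BASE}?{urlencode(params)}"


url = transit_stats_url("FF9EFFD42F71FFCD8259E753C4769279")
print(url)
```

Swapping in a different document UUID gives the corresponding transit report for that document.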

A look at https://tb.plazi.org/GgServer/xml/03A787AC2F64FFD882D1E3DDC0989A71 also confirms that the materials citation in question is there and is fine ... it only got filtered from the DwCA, which will be reverted as soon as the materials citation issues listed in the error protocol are dealt with (fixed or marked as false positives) ... there are only 4 of them, so this shouldn't take long.

myrmoteras commented 1 year ago

@flsimoes can you freeze x the issues and let me know. I then go back to the respective matcit and see whether I can link them. I already decided they don't match...

gsautter commented 1 year ago

Turns out the underlying article is a Phytotaxa whose treatments have somewhat chaotic structure, and the materials citations in question were ones marked in treatment citations, not in regular "materials examined" sections ... need to properly QC those as well.

flsimoes commented 1 year ago

> @flsimoes can you freeze x the issues and let me know. I then go back to the respective matcit and see whether I can link them. I already decided they don't match...

Will work on it. I'm guessing you mean "fix the issues".

gsautter commented 1 year ago

> Turns out the underlying article is a Phytotaxa whose treatments have somewhat chaotic structure, and the materials citations in question were ones marked in treatment citations, not in regular "materials examined" sections ... need to properly QC those as well.

IMF fixed, the occurrences are visible in GBIF again, with their original keys.