Closed jhpoelen closed 1 year ago
@jhpoelen can you please explain what this long string means and where the error is?
fixed
@jhpoelen can you please explain what the difference between issues 189-192 is?
> @jhpoelen can you please explain what this long string means and where the error is?
In cut:zip:hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197!/treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml!/b576-597

- cut: is a prefix indicating that the last part (e.g., b576-597) specifies a byte location or range. It is compatible with the POSIX/Linux cut command.
- zip: is a prefix indicating that a zip entry is addressed within the content. In this case, that entry is: treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml
- hash: is followed by the hash type (e.g., sha256) and the hash value (e.g., 56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197). The hash, the long hexadecimal sequence, is the unique digital fingerprint of the version of treatments-xml that was retrieved from GitHub.

In other words, the long string points to the exact location of the text segment that was the subject of this annotation (e.g., Barbastello leucomelas)
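
As a rough illustration of the structure described above, the reference id can be pulled apart with plain shell parameter expansion. This is only a sketch: the variable names are made up for this example, and it assumes the id always has the shape cut:zip:hash://TYPE/HEX!/ENTRY!/bSTART-END.

```shell
#!/bin/sh
# Reference id copied from the annotation above.
ref='cut:zip:hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197!/treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml!/b576-597'

# Byte range: everything after the last "!/b".
byte_range=${ref##*!/b}
start=${byte_range%-*}
end=${byte_range#*-}

# Drop the byte-range suffix, then split on the first "!/".
rest=${ref%!/b*}
zip_entry=${rest#*!/}
content_id=${rest%%!/*}
content_id=${content_id#cut:zip:}

# The content id has the form hash://<type>/<hex value>.
hash_type=${content_id#hash://}
hash_type=${hash_type%%/*}
hash_value=${content_id##*/}

echo "hash type:  $hash_type"
echo "hash value: $hash_value"
echo "zip entry:  $zip_entry"
echo "bytes:      $start to $end"
```

The last line of output gives the byte positions (576 to 597) inside the zip entry named on the line before it.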
sorry, so far we do not deal with code in the QC department. Can you please explain in a human readable format which name is wrong and in which treatment?
In this case, the suspicious name is "Barbastello leucomelas",
and the related treatment is C30587A9A562FF93FF2F350F875ED55F.
As far as I can tell, all these information elements are currently present in the title and body of the issues.
> @jhpoelen can you please explain what the difference between issues 189-192 is?
The difference between the annotations associated with
https://github.com/plazi/community/issues/189 https://github.com/plazi/community/issues/190 https://github.com/plazi/community/issues/191 https://github.com/plazi/community/issues/192
is the location of the subject in the treatment. In the table below, you'll find that the suffixes of the reference ids differ. They point to different text segments of a specific version of the treatment with UUID C30587A9A562FF93FF2F350F875ED55F.
subjectReferenceId | seeAlso |
---|---|
cut:zip:hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197!/treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml!/b576-597 | https://github.com/plazi/community/issues/189 |
cut:zip:hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197!/treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml!/b3054-3075 | https://github.com/plazi/community/issues/190 |
cut:zip:hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197!/treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml!/b3550-3571 | https://github.com/plazi/community/issues/191 |
cut:zip:hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197!/treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml!/b3714-3735 | https://github.com/plazi/community/issues/192 |
So, it appears that the same subject (e.g., "Barbastello leucomelas") occurs four times in the treatment.
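
One quick sanity check on the four byte ranges: each should span exactly as many bytes as the name itself. The sketch below does that arithmetic in shell. It assumes the ranges are 1-indexed and inclusive (as with cut -b) and that the name is plain ASCII, so that characters and bytes coincide.

```shell
#!/bin/sh
name='Barbastello leucomelas'
name_length=${#name}   # character count; equals the byte count for ASCII text

lengths=''
for range in 576-597 3054-3075 3550-3571 3714-3735; do
  start=${range%-*}
  end=${range#*-}
  length=$((end - start + 1))   # inclusive range
  lengths="$lengths $length"
  echo "b$range spans $length bytes"
done

echo "name is $name_length bytes long"
```

All four ranges come out at 22 bytes, which matches the length of the name.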
> fixed
Thanks for addressing the issue.
Can you please point to the location / version of the treatment that includes the fix?
The latest version in the treatment-xml repo, in TB, gbif, etc.
> So, it appears that the same subject (e.g., "Barbastello leucomelas") occurs four times in the treatment.
Can you please show me this in the treatment?
> The latest version in the treatment-xml repo, in TB, gbif, etc.
Ok, can you please be more specific? I am assuming that latest versions can change in the future.
> So, it appears that the same subject (e.g., "Barbastello leucomelas") occurs four times in the treatment. Can you please show me this in the treatment?
Sure!
In each case, the reference shows that the text segments originate from the treatment with UUID C30587A9A562FF93FF2F350F875ED55F, as published on GitHub in treatments-xml with content id hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197. This version of your corpus can be found at:
Plazi Community. (2022). Plazi Treatments XML Archive hash://sha256/3cfd60b8b19e76d208377537835de92efdb5b945a6a71765b74ed2fe22298b42 hash://md5/594923284e3eb9965b8cbad149c76cd0f (hash://sha256/3cfd60b8b19e76d208377537835de92efdb5b945a6a71765b74ed2fe22298b42) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7443343
as well as a copy on my local hard disk.
you can make your own copy using instructions embedded in https://doi.org/10.5281/zenodo.7443343
preston clone https://zenodo.org/record/7443343/files
with that, you can retrieve the associated treatment (the specific version) via:
preston cat 'zip:hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197!/treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml'
see also attached C30587A9A562FF93FF2F350F875ED55F.xml.txt
then, you'll find the four occurrences of the text fragment using the suffix of the reference id
cut:zip:hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197!/treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml!/b576-597
cut:zip:hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197!/treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml!/b3054-3075
cut:zip:hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197!/treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml!/b3550-3571
cut:zip:hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197!/treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml!/b3714-3735
or, in short, b576-597, b3054-3075, b3550-3571 and b3714-3735.
You'll find that all four text segments contain instances of "Barbastello leucomelas".
But perhaps I should ask the question: what would you expect to use to highlight the annotated text segments? How do you annotate (and reference!) text segments in your annotated treatment texts?
You can retrieve the text segments using:
preston cat 'cut:zip:hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197!/treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml!/b576-597'
yielding
Barbastello leucomelas
or
preston cat 'zip:hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197!/treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml' | cut -z -b576-597
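
To try the byte-range idea without preston or the archive, here is a small sketch that applies the same 1-indexed, inclusive addressing to a made-up three-line snippet. The helper extract_bytes and the sample text are hypothetical. It uses head -c and tail -c instead of cut -z -b, which gives the same bytes without relying on the GNU-specific -z flag.

```shell
#!/bin/sh
# extract_bytes START END: print bytes START..END (1-indexed, inclusive) of stdin.
# Hypothetical helper, not part of preston.
extract_bytes() {
  head -c "$2" | tail -c "+$1"
}

# Made-up sample; the real treatment XML is much longer.
# "<a>" plus newline is 4 bytes, "<title>" is 7 bytes, so the name starts at byte 12.
sample=$(printf '<a>\n<title>Barbastello leucomelas</title>\n</a>')

segment=$(printf '%s' "$sample" | extract_bytes 12 33)
echo "$segment"
```

Because the range is counted over the whole byte stream, it works across line breaks, which is the reason the cut variant above needs -z.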
I can imagine that some fancy (web) UI can be hacked on top of this to make it visually a little more intuitive.
Thanks for being patient in resolving the annotations reported by @kephelps, @qgroom and you.
Errors are fixed in TB.
For this, I need to know from you docId, docVersion, taxonomicName id
or else, Kendra could ask: in this treatment https://tb.plazi.org/GgServer/html//C30587A9A562FF93FF2F350F875ED55F the name should be Barbastella
thanks for the explanation and showing yet another way to look at and interoperate with our data liberated from publications.
> For this, I need to know from you docId, docVersion, taxonomicName id
I believe that the requested information can be found in the header of the associated treatments xml file as retrieved via
preston cat 'zip:hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197!/treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml'\
| head -n1\
| sed 's/"[ ]/"\n/g'
yielding:
<document ID-DOI="http://doi.org/10.5281/zenodo.7353060"
ID-ISBN="1-56098-217-9"
ID-Zenodo-Dep="7353060"
approvalRequired="4"
approvalRequired_for_document="2"
approvalRequired_for_matCits="1"
approvalRequired_for_originalDoi="1"
checkinTime="1667534087130"
checkinUser="GgServerImporter"
docAuthor="Karl F. Koopman"
docDate="1993"
docId="C30587A9A562FF93FF2F350F875ED55F"
docLanguage="en"
docName="MammalSpeciesofTheWorld.1993.Chiroptera.137-241.pdf.imd"
docOrigin="Mammal Species of the World (2 nd Edition), Washington and London: Smithsonian Institution Press"
docTitle="Barbastello leucomelas"
docType="treatment"
docVersion="3"
lastPageNumber="199"
masterDocId="3F3CFFD1A55CFFADFFDC34768324D72D"
masterDocTitle="Order Chiroptera"
masterLastPageNumber="241"
masterPageNumber="137"
pageNumber="199"
updateTime="1669256093699"
updateUser="ExternalLinkService">
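
If only docId and docVersion are needed, they can be picked out of the header line directly. The sketch below runs on a shortened copy of the header shown above. The helper name attr is made up for this example, and it assumes attribute values never contain a double quote.

```shell
#!/bin/sh
# Shortened copy of the <document ...> header line shown above.
header='<document docId="C30587A9A562FF93FF2F350F875ED55F" docLanguage="en" docTitle="Barbastello leucomelas" docType="treatment" docVersion="3">'

# attr NAME: print the value of attribute NAME from $header (hypothetical helper).
attr() {
  printf '%s\n' "$header" | sed -n "s/.* $1=\"\([^\"]*\)\".*/\1/p"
}

doc_id=$(attr docId)
doc_version=$(attr docVersion)

echo "docId=$doc_id docVersion=$doc_version"
```

Against the real file, the header variable would be filled from the first line of the preston cat output instead of a literal string.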
Please confirm that this is the information you need to reference the content that contains the fix proposed by @kephelps.
PS: I'm not quite sure what you mean by taxon id. Can you please provide an example?
@myrmoteras @flsimoes after review, I found that there are still two occurrences of the suspicious name "Barbastello leucomelas".
A copy of plazi/treatments-xml was retrieved on 2022-12-14T15:15:28.064Z and contained two instances of the suspicious name on lines 52 and 58:
via
preston cat 'zip:hash://sha256/c37add56f855607de5cbcd9d47d96346454e203559de66a15c30427e1fe172fb!/treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml'\
| grep -n "Barbastello leucomelas"
52:<treatmentCitation author="Cretzschmar" authority="(Cretzschmar, 1826)" baseAuthorityName="Cretzschmar" baseAuthorityYear="1826" class="Mammalia" family="Vespertilionidae" genus="Barbastello" kingdom="Animalia" order="Chiroptera" page="73" pageId="62" pageNumber="199" phylum="Chordata" rank="species" species="leucomelas" title="Barbastello leucomelas" volumeTitle="In Ruppell, Atlas Reise Nordl. Afr., Zool. Saugeth." year="1826">
58:<bibCitation author="Cretzschmar" pageId="62" pageNumber="199" pagination="73" title="Barbastello leucomelas" volumeTitle="In Ruppell, Atlas Reise Nordl. Afr., Zool. Saugeth." year="1826">
So, I have evidence to suggest that the recommendation by @kephelps has only been partially applied.
Please confirm.
PS Please note that the recommended name did occur twice in the treatment of interest, C30587A9A562FF93FF2F350F875ED55F, on line 1 and line 54:
preston cat 'zip:hash://sha256/c37add56f855607de5cbcd9d47d96346454e203559de66a15c30427e1fe172fb!/treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml'\
| grep -n "Barbastella leucomelas"
1:<document ID-DOI="http://doi.org/10.5281/zenodo.7353060" ID-GBIF-Dataset="54c9783b-6624-4034-96d9-d09afe43b319" ID-ISBN="1-56098-217-9" ID-Zenodo-Dep="7353060" approvalRequired="4" approvalRequired_for_document="2" approvalRequired_for_matCits="1" approvalRequired_for_originalDoi="1" checkinTime="1667534087130" checkinUser="GgServerImporter" docAuthor="Karl F. Koopman" docDate="1993" docId="C30587A9A562FF93FF2F350F875ED55F" docLanguage="en" docName="MammalSpeciesofTheWorld.1993.Chiroptera.137-241.pdf.imd" docOrigin="Mammal Species of the World (2 nd Edition), Washington and London: Smithsonian Institution Press" docTitle="Barbastella leucomelas" docType="treatment" docVersion="6" lastPageNumber="199" masterDocId="3F3CFFD1A55CFFADFFDC34768324D72D" masterDocTitle="Order Chiroptera" masterLastPageNumber="241" masterPageNumber="137" pageNumber="199" updateTime="1673356398203" updateUser="valdenar">
54:<emphasis box="[243,606,377,416]" italics="true" pageId="62" pageNumber="199">Barbastella leucomelas</emphasis>
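
The two grep runs can be combined into a single check that counts lines with the old and the new spelling side by side. The sketch below runs on a heavily abbreviated, made-up stand-in for the treatment file (four lines, not the real content), so the counts only illustrate the method.

```shell
#!/bin/sh
# Abbreviated stand-in for the treatment XML; NOT the real file content.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<document docTitle="Barbastella leucomelas">
<treatmentCitation title="Barbastello leucomelas">
<emphasis>Barbastella leucomelas</emphasis>
<bibCitation title="Barbastello leucomelas">
EOF

# grep -c counts matching lines, not individual matches.
old_lines=$(grep -c 'Barbastello leucomelas' "$tmp")
new_lines=$(grep -c 'Barbastella leucomelas' "$tmp")

echo "lines with old spelling: $old_lines"
echo "lines with new spelling: $new_lines"
rm -f "$tmp"
```

A fully applied fix would bring the first count down to zero.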
can you please translate this into a format we use to work: please open the html version of the treatment and point us at the two wrong taxonomic names.
@myrmoteras thanks for being patient with me as we are working our way through this tedious annotation adventure.
In reviewing https://tb.plazi.org/GgServer/html//C30587A9A562FF93FF2F350F875ED55F as accessed just now, it appears that the undesired occurrences of "Barbastello leucomelas" are not rendered in the HTML view of the treatment. However, in the treatments-xml version, they do appear on lines 52 and 58. So I cannot point you to the text segments in the HTML version of the treatment, because that text is not carried over into the HTML view.
Perhaps easier to point to a specific line in a github version at:
line 52 :
and line 58
My question remains -
What kind of specific reference will help us to point exactly to text segments that are subject of the annotation?
I can confirm that this specific occurrence of Barbastello leucomelas was updated as suggested by @kephelps.
For more information see https://github.com/jhpoelen/msw-plazi/commit/905e30326b6123c09e867f8752dea3d065d63e37 .
on 2022-12-14, Kendra claimed that [ Barbastello leucomelas ] should be replaced with [ Barbastella leucomelas ] in [ cut:zip:hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197!/treatments-xml-main/data/C3/05/87/C30587A9A562FF93FF2F350F875ED55F.xml!/b576-597 ]