plazi / arcadia-project


Meeting: Sofia QC, Template, MC-markup #58

Open myrmoteras opened 5 years ago

myrmoteras commented 5 years ago

Organize a meeting in Sofia in July (Guido visiting Sofia) with the following goals:

gsautter commented 5 years ago

I just checked flights, and there is next to no difference between the weeks of Jul 8 and Jul 15, and most likely the same for the week of Jul 22, so we should see to it that we navigate around any holidays of Pensoft people ... @teodorgeorgiev let me know which week suits you guys best, and then I'll book asap before prices go up.

teodorgeorgiev commented 5 years ago

@gsautter all of us (Teodor, Veselin, Angel) will be here on 8-12 Jul. Please select any two days during that week that are convenient for you.

gsautter commented 5 years ago

I think Wednesday and Thursday would make the most sense. I could arrive Tuesday night and depart Friday afternoon, which will give us two full days of working, plus Friday morning if required. The additional two nights are way less expensive than flying in early (which wouldn't be before noon anyway) and leaving late at night.

teodorgeorgiev commented 5 years ago

Absolutely! Let's do it. Please book your hotel as soon as possible (https://www.booking.com/hotel/bg/silver-house.en-gb.html). Please let me know if you need any assistance with the hotel.

gsautter commented 5 years ago

OK, all set, got my flight and hotel. Had to divert to the Suites, though, as Silver House was already booked. See you next week.

teodorgeorgiev commented 5 years ago

See you soon Guido! T.

myrmoteras commented 5 years ago

@teodorgeorgiev @gsautter can you please discuss this, and maybe find a solution for it too? This matters because if the number is included, it would show up in GBIF as well.

Hi Donat,

Hope you are enjoying summer! I saw that some mycological treatments are coming in, and then I noticed that the MycoBank numbers are actually absent from the treatments. E.g. this one: http://treatment.plazi.org/id/227FFB707E611951B2A89ACBCA1E07D6 is missing the MycoBank numbers given in https://mycokeys.pensoft.net/article/35857/

I think this would be quite an important improvement, since MycoBank is the Nomenclator for mycology, and thereby provides the "official" ID for the taxon name.

All the best,
Thomas Stjernegaard Jeppesen

myrmoteras commented 5 years ago

notes: https://docs.google.com/document/d/1i_YAwtbC842aL1eEbLME5U4Ua7riyGwTCr3XGVN5AqI/edit?ts=5d25aec4

myrmoteras commented 5 years ago

@teodorgeorgiev @gsautter how is the progress in Sofia? What works, what not?

gsautter commented 5 years ago

Progress is really good, actually:

See also https://docs.google.com/document/d/1i_YAwtbC842aL1eEbLME5U4Ua7riyGwTCr3XGVN5AqI/edit?ts=5d25aec4#

lyubomirpenev commented 5 years ago

It is good news, nice to hear that! I hope you also had some beers together?

teodorgeorgiev commented 5 years ago

@gsautter: here is a list of all the articles that have treatments (217 of 265): https://docs.google.com/spreadsheets/d/1C5spD2xrBqffjvT3oiEXOas-V17qtQmI46h08i6Gxis/edit#gid=0 Could you please run batch server processing (including type materials) on only those 217 PDFs? Ves will start QC of the results next week.

teodorgeorgiev commented 5 years ago

Hi @gsautter: As a result from our meeting in Sofia we have implemented in TaxPub XMLs the following changes:

We have added UUIDs for article, treatment, figure, and supplementary file elements. You can adopt the article and treatment UUIDs in Plazi.

Article UUID:

XPath: //article-meta/article-id[@pub-id-type="other"], where the value starts with "urn:lsid:arphahub.com:pub:". Example:

<article-id pub-id-type="other">urn:lsid:arphahub.com:pub:fdd04d4d-3c9c-52d6-8a65-92275a98f097</article-id>
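A minimal sketch of how an importer might read this article UUID, assuming Python's standard xml.etree.ElementTree and that JATS elements such as article-id carry no namespace. The function name article_uuid and the sample document are hypothetical; the XPath and the URN prefix are the ones given above.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Prefix that marks the ARPHA article UUID, as described above.
ARPHA_PUB_PREFIX = "urn:lsid:arphahub.com:pub:"


def article_uuid(xml_text: str) -> Optional[str]:
    """Return the bare ARPHA article UUID from a TaxPub XML string, or None."""
    root = ET.fromstring(xml_text)
    # ElementTree understands this XPath subset: descendant search plus an
    # attribute predicate. Several article-id elements may share the type,
    # so the URN prefix is what actually identifies the ARPHA one.
    for el in root.iterfind(".//article-meta/article-id[@pub-id-type='other']"):
        value = (el.text or "").strip()
        if value.startswith(ARPHA_PUB_PREFIX):
            return value[len(ARPHA_PUB_PREFIX):]
    return None


# Hypothetical minimal document; only the second article-id is the ARPHA one.
SAMPLE = """<article><front><article-meta>
  <article-id pub-id-type="other">some-other-identifier</article-id>
  <article-id pub-id-type="other">urn:lsid:arphahub.com:pub:fdd04d4d-3c9c-52d6-8a65-92275a98f097</article-id>
</article-meta></front></article>"""

print(article_uuid(SAMPLE))  # fdd04d4d-3c9c-52d6-8a65-92275a98f097
```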

Treatment UUID:

XPath: //tp:taxon-treatment/tp:nomenclature/tp:taxon-name/object-id[@content-type="arpha"]

Example:

<tp:taxon-treatment>
    <tp:nomenclature>
        <tp:taxon-name>
            <object-id content-type="arpha">c0ba557b-ca86-5da2-b771-3b049a0609a0</object-id>
            ...
        </tp:taxon-name>
    </tp:nomenclature>
    ...
</tp:taxon-treatment>
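The treatment path mixes the namespaced tp: elements with the un-namespaced JATS object-id. A sketch of one way to follow it with ElementTree, assuming that "http://www.plazi.org/taxpub" is the namespace URI bound to tp: (check the xmlns:tp declaration of the actual files). The function name treatment_uuids and the sample are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Assumption: the namespace URI that TaxPub binds to the "tp:" prefix.
NS = {"tp": "http://www.plazi.org/taxpub"}

# Path relative to each tp:taxon-treatment; object-id itself has no namespace,
# and ElementTree treats unprefixed names as "no namespace" here.
UUID_PATH = "tp:nomenclature/tp:taxon-name/object-id[@content-type='arpha']"


def treatment_uuids(xml_text: str) -> List[Optional[str]]:
    """Return one ARPHA UUID (or None) per tp:taxon-treatment, in document order."""
    root = ET.fromstring(xml_text)
    result: List[Optional[str]] = []
    for treatment in root.iterfind(".//tp:taxon-treatment", NS):
        oid = treatment.find(UUID_PATH, NS)
        result.append(oid.text.strip() if oid is not None and oid.text else None)
    return result


SAMPLE = """<article xmlns:tp="http://www.plazi.org/taxpub"><body>
  <tp:taxon-treatment>
    <tp:nomenclature>
      <tp:taxon-name>
        <object-id content-type="arpha">c0ba557b-ca86-5da2-b771-3b049a0609a0</object-id>
      </tp:taxon-name>
    </tp:nomenclature>
  </tp:taxon-treatment>
</body></article>"""

print(treatment_uuids(SAMPLE))  # ['c0ba557b-ca86-5da2-b771-3b049a0609a0']
```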

For figures, we have also added a UUID, a DOI, and a URL:

The figure URLs have the format https://binary.pensoft.net/fig/318267, which opens the original file.

You can also add a suffix to get different sizes, e.g.:
https://binary.pensoft.net/fig/318267/big
https://binary.pensoft.net/fig/318267/singlefig
https://binary.pensoft.net/fig/318267/singlefigmini

If the requested image size is missing or the suffix is wrong, you will get the original.
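A small sketch of building these figure URLs on the client side. The base URL and the three suffixes are the ones listed above; the function name figure_url is hypothetical, and rejecting unknown suffixes locally is a design choice of this sketch, not server behavior.

```python
from typing import Optional

BASE_URL = "https://binary.pensoft.net/fig/"

# Size suffixes listed in the comment above.
KNOWN_SIZES = ("big", "singlefig", "singlefigmini")


def figure_url(fig_id: int, size: Optional[str] = None) -> str:
    """Build a Pensoft figure URL; size=None addresses the original file."""
    if size is None:
        return f"{BASE_URL}{fig_id}"
    # The server silently falls back to the original for an unknown suffix,
    # so reject typos here rather than receive the wrong size unnoticed.
    if size not in KNOWN_SIZES:
        raise ValueError(f"unknown size suffix: {size!r}")
    return f"{BASE_URL}{fig_id}/{size}"


for size in (None,) + KNOWN_SIZES:
    print(figure_url(318267, size))
```

Because a wrong suffix still returns an image (the original), the mistake would otherwise only show up as unexpectedly large downloads.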

<fig id="F5256540" position="float" orientation="portrait">
    <object-id content-type="arpha">c2e8cf93-7d4f-5907-88af-92017451a5c2</object-id>
    <object-id content-type="doi">10.3897/BDJ.7.e37569.figure1</object-id>
    <label>Figure 1.</label>
    <caption>
        <p>..........</p>
    </caption>
    <graphic xlink:href="bdj-07-e37569-g001.png" position="float" id="oo_318267.png" orientation="portrait" xlink:type="simple">
        <uri content-type="original_file">https://binary.pensoft.net/fig/318267</uri>
    </graphic>
</fig>
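A sketch of collecting the three new figure identifiers from a fig element like the one above, using ElementTree. The function name figure_record and the dictionary keys are hypothetical; the xlink namespace is declared in the sample only so the fragment parses on its own. Following the later discussion in this thread, the image URL is the part to embed and the DOI the part to cite.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Trimmed copy of the example above; xlink is declared so the fragment parses.
FIG_XML = """<fig xmlns:xlink="http://www.w3.org/1999/xlink" id="F5256540" position="float" orientation="portrait">
  <object-id content-type="arpha">c2e8cf93-7d4f-5907-88af-92017451a5c2</object-id>
  <object-id content-type="doi">10.3897/BDJ.7.e37569.figure1</object-id>
  <label>Figure 1.</label>
  <graphic xlink:href="bdj-07-e37569-g001.png" id="oo_318267.png" xlink:type="simple">
    <uri content-type="original_file">https://binary.pensoft.net/fig/318267</uri>
  </graphic>
</fig>"""


def figure_record(fig: ET.Element) -> Dict[str, Optional[str]]:
    """Collect the UUID, DOI and original-file URL attached to a <fig>."""
    ids = {oid.get("content-type"): (oid.text or "").strip()
           for oid in fig.iterfind("object-id")}
    uri = fig.find("graphic/uri[@content-type='original_file']")
    return {
        "fig_id": fig.get("id"),
        "uuid": ids.get("arpha"),
        "doi": ids.get("doi"),  # for the figure citation
        "image_url": uri.text.strip() if uri is not None and uri.text else None,  # for embedding
    }


print(figure_record(ET.fromstring(FIG_XML)))
```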

You can test the new changes with the following articles: https://bdj.pensoft.net/article/37569/ and https://zookeys.pensoft.net/article/31755/

Please let me know if you have any questions.

gsautter commented 5 years ago

Just looked at the XML (https://zookeys.pensoft.net/article/31755/download/xml/), and this looks pretty good to me.

I wonder why you put the treatment UUID inside the taxon name (treatment-meta would have been more in line with my intuition), but that's OK - we do have a specific path, and that's enough.

gsautter commented 5 years ago

Guess I'll go with the URI for the figures, as the DOI resolves to the article and selects the figure, whereas we want the plain figure in TB.

teodorgeorgiev commented 5 years ago

> Just looked at the XML (https://zookeys.pensoft.net/article/31755/download/xml/), and this looks pretty good to me.

> I wonder why you put the treatment UUID inside the taxon name (treatment-meta would have been more in line with my intuition), but that's OK - we do have a specific path, and that's enough.

This was the best place I could find. The content model does not allow me to add it to treatment-meta, which can only contain: ((contrib-group,kwd-group,permissions?),mixed-citation?). Anyway, it should be sufficient.

> Guess I'll go with the URI for the figures, as the DOI resolves to the article and selects the figure, whereas we want the plain figure in TB.

Sure, please use the URIs for the plain figures in TB, and include the DOI in the figure citation.

gsautter commented 5 years ago

Done, and done. Works nicely now.

The only troublemaker left is the article UUID ... We used to use the MD5 hash of the article URL in that capacity, which facilitates filtering out previously imported articles right from the RSS feed, without even loading the whole XML. I'm left to wonder how I could emulate this filtering functionality while still using the newly added article UUID as the document ID ...
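A sketch of the old URL-hash filtering, plus one possible way (purely an assumption, not what the importer does) to keep it while the article UUID becomes the document ID: a side index from URL hash to article UUID. ImportIndex, record, unseen and the UUID placeholder are all hypothetical; only the MD5-of-URL idea comes from the comment above.

```python
import hashlib
from typing import Dict, Iterable, List


def url_key(article_url: str) -> str:
    """MD5 hex digest of the article URL: the old document-ID scheme."""
    return hashlib.md5(article_url.encode("utf-8")).hexdigest()


class ImportIndex:
    """Hypothetical side index mapping URL hash -> ARPHA article UUID.

    The article UUID serves as the document ID; the URL hash is kept only
    so that new feed items can be filtered without fetching their XML.
    """

    def __init__(self) -> None:
        self._uuid_by_url_key: Dict[str, str] = {}

    def record(self, article_url: str, article_uuid: str) -> None:
        self._uuid_by_url_key[url_key(article_url)] = article_uuid

    def unseen(self, feed_urls: Iterable[str]) -> List[str]:
        # Decided from the RSS item URLs alone; no XML download needed.
        return [u for u in feed_urls if url_key(u) not in self._uuid_by_url_key]


index = ImportIndex()
index.record("https://zookeys.pensoft.net/article/31755/", "hypothetical-uuid")
print(index.unseen([
    "https://zookeys.pensoft.net/article/31755/",
    "https://bdj.pensoft.net/article/37569/",
]))  # ['https://bdj.pensoft.net/article/37569/']
```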

gsautter commented 5 years ago

Got it deployed now, let's let it run for a few days and see if any problems occur. If not, the next step is to devise a strategy and retro-apply the newly added functionality to all 15,000+ previously imported articles ...

tcatapano commented 5 years ago

@teodorgeorgiev @gsautter: With regard to placement of treatment UUID (https://github.com/plazi/arcadia-project/issues/58#issuecomment-512355419), treatment-meta/mixed-citation/object-id should be used, for example:


<tp:taxon-treatment>
    <tp:treatment-meta>
        <kwd-group>
            <label>Taxon classification</label>
            <kwd>
                <named-content content-type="kingdom" xlink:type="simple">Animalia</named-content>
            </kwd>
            <kwd>
                <named-content content-type="order" xlink:type="simple">Diptera</named-content>
            </kwd>
            <kwd>
                <named-content content-type="family" xlink:type="simple">Tipulidae</named-content>
            </kwd>
        </kwd-group>
        <mixed-citation>
            <object-id content-type="zoobank" xlink:type="simple">
                http://zoobank.org/E14D124D-1789-4326-9C08-7B896BE3CA95
            </object-id>
        </mixed-citation>
    </tp:treatment-meta>
    ...
</tp:taxon-treatment>

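
Since either placement works as long as it is well defined, a consumer could check both. A sketch under two assumptions: that the UUID would carry content-type="arpha" in the proposed treatment-meta placement (the example above shows a ZooBank ID there), and that "http://www.plazi.org/taxpub" is the tp: namespace URI. The function treatment_uuid and the all-ones sample UUID are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: the namespace URI that TaxPub binds to the "tp:" prefix.
NS = {"tp": "http://www.plazi.org/taxpub"}

# Most preferred first: the proposed treatment-meta placement (with an assumed
# content-type="arpha"), then the taxon-name placement currently in use.
CANDIDATE_PATHS = (
    "tp:treatment-meta/mixed-citation/object-id[@content-type='arpha']",
    "tp:nomenclature/tp:taxon-name/object-id[@content-type='arpha']",
)


def treatment_uuid(treatment: ET.Element) -> Optional[str]:
    """Return the first non-empty ARPHA UUID found along CANDIDATE_PATHS."""
    for path in CANDIDATE_PATHS:
        oid = treatment.find(path, NS)
        if oid is not None and oid.text and oid.text.strip():
            return oid.text.strip()
    return None


SAMPLE = """<body xmlns:tp="http://www.plazi.org/taxpub">
  <tp:taxon-treatment>
    <tp:treatment-meta><mixed-citation>
      <object-id content-type="arpha">11111111-1111-1111-1111-111111111111</object-id>
    </mixed-citation></tp:treatment-meta>
  </tp:taxon-treatment>
  <tp:taxon-treatment>
    <tp:nomenclature><tp:taxon-name>
      <object-id content-type="arpha">c0ba557b-ca86-5da2-b771-3b049a0609a0</object-id>
    </tp:taxon-name></tp:nomenclature>
  </tp:taxon-treatment>
</body>"""

root = ET.fromstring(SAMPLE)
print([treatment_uuid(t) for t in root.iterfind("tp:taxon-treatment", NS)])
```
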
gsautter commented 5 years ago

Looks like a somewhat more intuitive placement than the current one, yes.

But all we ultimately need is some well-defined placement of the UUIDs, so I'm fine with the current one, too.

myrmoteras commented 5 years ago

@teodorgeorgiev @gsautter when do you expect the links to images to be embedded in the ZooKeys treatments so that they also appear in GBIF (as they do for the Plazi-processed treatments)?


gsautter commented 5 years ago

I deployed the updated TaxPub importer earlier this week. Looking at http://tb.plazi.org/GgServer/html/395E9C89C9BC4E4DAD3A5A854A1F7789, for instance, this seems to work as intended, including passing through the Pensoft-minted treatment UUIDs.