plazi / treatmentBank

Repository devoted to house keeping of treatmentBank

access to treatment data: updates #37

Open myrmoteras opened 2 years ago

myrmoteras commented 2 years ago

Hi Donat,

DA: We have an issue with the export of treatments to Synospecies via the ZIP. The treatments in it are not updated, so when we (Jeremy and I) make changes to get the treatment citations to do what they should in Synospecies, those changes are not reflected. The last change to these treatments was on Jan. 7.

GS: In the full dump, that's true. But the "monthly" dump should include everything changed since then, the "weekly" dump everything changed since the last "monthly" dump, and the "daily" dump everything changed since the last "weekly" dump ... If you import those four dumps (full, monthly, weekly, daily) in chronological order, always replacing any earlier file, you get a complete snapshot of the overall status of the collection at 3am that morning (when dump packing starts).
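The import order described above can be sketched roughly as follows. This is only a minimal illustration, not the dump handler tool itself: it assumes each dump is a plain ZIP archive and that a changed treatment keeps the same file name inside every archive. The function name `overlay_dumps` is hypothetical.

```python
import zipfile
from pathlib import Path


def overlay_dumps(dump_paths, target_dir):
    """Extract ZIP dumps in the given order into one directory.

    Files from later archives overwrite same-named files from earlier
    ones, so the paths must be passed oldest first
    (full, monthly, weekly, daily).
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for dump in dump_paths:
        with zipfile.ZipFile(dump) as archive:
            # extractall() replaces any file already present under the same name
            archive.extractall(target)
    return target
```

A call might look like `overlay_dumps(["full.zip", "monthly.zip", "weekly.zip", "daily.zip"], "snapshot")`, where the archive names are placeholders rather than the actual file names on the dumps page. The sketch only overlays files; it does not handle treatments that were removed between dumps.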

The incremental dumps are the only sensible way of handling the sheer amount of data we're dealing with by now ... there is no point in packing 10+GB of full dumps every night just because a few hundred treatments have changed (I hope you get that), which is why I devised the incremental dumps in December.

What's more, there is a dump handler tool available for download from the same page as the dumps proper (https://tb.plazi.org/dumps/). That tool does exactly what is described above, i.e., it reconstructs an up-to-date full dump from the four incremental dumps.

Puneet is actually using that mechanism quite successfully for Zenodeo.

import all treatments individually.

Hope that helps.

Best, Guido

myrmoteras commented 2 years ago

@gsautter you use the gatekeeper to control the export of treatment RDF to Synospecies.

Does this also affect the individual download of lodRdf, as well as the dump?

gsautter commented 2 years ago

The gatekeeper does not (and cannot possibly) get involved in downloads of LOD-RDF via the web front-end. If we want that, I'll have to do something very similar to the approach we use for the DwC-As: export from the back-end and then make the files accessible via a dedicated servlet in the front-end. This is very much possible, albeit with a little effort on my part ... we could have this up and running in a day or two.