plazi / gg2rdf

A tool to transform golden gate XML to RDF turtle
MIT License

Replace XSLT with TS #11

Closed nleanba closed 4 months ago

nleanba commented 4 months ago

Open todos:

nleanba commented 4 months ago

manually checked all treatments until 006E64249F0BFFC1A6E1FAC0364EF948 (going alphabetically by ID)

they are all fine*

A notable change is that it turns a bunch of treatsTaxonName into definesTaxonName; as far as I can tell, this seems correct

some changes that I consider irrelevant or even improvements:

nleanba commented 4 months ago

Weird change in 0129163AB112602AFCCAFBD9FE4DF90D: the publication uri is changed to the one provided as docSource, which seems not to actually point to the article itself (unlike the uri generated by XSLT which is from ID-DOI). Is this a bug in the XML or in the XSLT-replacement?

nleanba commented 4 months ago

Checked until 016512208D171293F4B2C381B36B1F22

All fine notwithstanding changes mentioned above

I think it can replace the xslt like this, fixing potential other errors when they are noticed.

@retog opinions?

retog commented 4 months ago

Why is outputProperties only used in makePublication?

nleanba commented 4 months ago

Why is outputProperties only used in makePublication?

outputProperties was the first way I did it; everything else has been updated to use Subject, but I didn't think changing it for makePublication would be worth it.

retog commented 4 months ago

I've added fish to the container but I still can't execute the script

root@fbc1a93c34b7:/workspaces/gg2rdf# ./test_noxslt.fish ./ex.xml 
File ./ex.ttl doesn't exist! Aborting
root@fbc1a93c34b7:/workspaces/gg2rdf# ls ex.xml 
ex.xml
root@fbc1a93c34b7:/workspaces/gg2rdf# ./test_noxslt.fish `pwd`/ex.xml 
File /workspaces/gg2rdf/ex.ttl doesn't exist! Aborting

retog commented 4 months ago

I didn't think changing it for makePublication would be worth it.

I tend to disagree. It makes the NOTES at the top of the file inaccurate, and even if that is fixed it still increases complexity. I think the additional time needed when making future changes far outweighs the tedious but relatively small amount of work to make it consistent now.

nleanba commented 4 months ago

I've added fish to the container but I still can't execute the script

root@fbc1a93c34b7:/workspaces/gg2rdf# ./test_noxslt.fish ./ex.xml 
File ./ex.ttl doesn't exist! Aborting
root@fbc1a93c34b7:/workspaces/gg2rdf# ls ex.xml 
ex.xml
root@fbc1a93c34b7:/workspaces/gg2rdf# ./test_noxslt.fish `pwd`/ex.xml 
File /workspaces/gg2rdf/ex.ttl doesn't exist! Aborting

The error occurs because it checks for {$ttlReferenceDir}/ex.ttl, which probably does not exist in the docker container.

./test_noxslt.fish was never intended to be run in the container; it is only for manual testing. If you wish to use it in a container, you should already be running an interactive shell in the container, so install fish from there.

If you wish to simply check what output gg2rdf.ts produces, just call deno run --allow-read --allow-write ./gg2rdf.ts -i <xml filename> -o <ttl filename>

nleanba commented 4 months ago

I didn't think changing it for makePublication would be worth it.

I tend to disagree. It makes the NOTES at the top of the file inaccurate, and even if that is fixed it still increases complexity. I think the additional time needed when making future changes far outweighs the tedious but relatively small amount of work to make it consistent now.

i have changed this now

retog commented 4 months ago

Weird that it didn't! I committed from the working dev container.

On February 29, 2024 8:18:01 AM GMT+01:00, nleanba @.***> wrote:

@nleanba commented on this pull request.

On Dockerfile:

i have removed it again, so that the container actually builds again

retog commented 4 months ago

Experimenting with 000040332F2853C295734E7BD4190F05. I see the current ttl version has 106 triples; the one generated by the new transformer has 118.

These are the additional triples:

 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://plazi.org/vocab/treatment#hasTaxonName> <http://taxon-name.plazi.org/id/Animalia/Saigona> .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://rs.tdwg.org/dwc/terms/class> "Insecta" .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://rs.tdwg.org/dwc/terms/family> "Dictyopharidae" .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://rs.tdwg.org/dwc/terms/genus> "Saigona" .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://rs.tdwg.org/dwc/terms/kingdom> "Animalia" .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://rs.tdwg.org/dwc/terms/order> "Hemiptera" .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://rs.tdwg.org/dwc/terms/phylum> "Arthropoda" .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://rs.tdwg.org/dwc/terms/rank> "genus" .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://rs.tdwg.org/dwc/terms/scientificNameAuthorship> "Matsumura, 1910" .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://filteredpush.org/ontologies/oa/dwcFP#TaxonConcept> .
43a54
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_baiseensis_Zheng_2021> <http://rs.tdwg.org/dwc/terms/scientificNameAuthorship> "Zheng & Chen, 2021" .
100a112
 <http://treatment.plazi.org/id/000040332F2853C295734E7BD4190F05> <http://purl.org/dc/elements/1.1/title> "Saigona baiseensis Zheng & Chen 2021, sp. nov." .

The additional triples look sound; the previously missing title matches the one on https://treatment.plazi.org/id/000040332F2853C295734E7BD4190F05
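
A comparison like the one above can also be done without `diff`, by treating each file as a set of triples. The following is only a minimal sketch, assuming both outputs have already been converted to N-Triples (one triple per line, no blank nodes); `parseTriples` and `diffTriples` are hypothetical helper names, not part of gg2rdf.

```typescript
// Sketch: set-based comparison of two N-Triples documents.
// Assumption: one triple per line and no blank nodes, so that
// plain string equality is enough to compare triples.
function parseTriples(nt: string): Set<string> {
  return new Set(
    nt
      .split("\n")
      .map((line) => line.trim())
      // skip empty lines and N-Triples comments
      .filter((line) => line !== "" && !line.startsWith("#")),
  );
}

interface TripleDiff {
  added: string[]; // triples only present in the new output
  removed: string[]; // triples only present in the old output
}

function diffTriples(oldNt: string, newNt: string): TripleDiff {
  const before = parseTriples(oldNt);
  const after = parseTriples(newNt);
  return {
    added: Array.from(after).filter((t) => !before.has(t)).sort(),
    removed: Array.from(before).filter((t) => !after.has(t)).sort(),
  };
}
```

Because sets ignore line order, this only reports triples that were really added or removed, which plain `diff` on unsorted files does not guarantee.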

I do see two error messages:

Error: Invalid Authority for <http://taxon-concept.plazi.org/id/Animalia/fulgoroidesINVALID>
Error: Invalid Authority for <http://taxon-concept.plazi.org/id/Animalia/fulgoroidesINVALID>

retog commented 4 months ago

Regarding

`output(...)` should not be assumed to run synchronous,
  and all data passed to it should still be valid under reordering of calls.

In turtle the order plays a role with regard to base and prefix, so I think this requirement should be dropped.
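
To illustrate why order matters here, the following toy sketch expands prefixed names strictly in document order, the way a Turtle parser applies `@prefix` directives only to what follows them. It is not a real Turtle parser and not code from gg2rdf; `expandInOrder` is a hypothetical name, and the sketch assumes whitespace-separated tokens and literals without spaces.

```typescript
// Toy illustration: prefix declarations only affect the lines after them.
// Assumptions: one statement per line, tokens separated by whitespace,
// literals contain no spaces. Not a conforming Turtle parser.
const PREFIX_DECL = /^@prefix\s+(\w+):\s+<([^>]*)>\s*\.$/;
const PREFIXED_NAME = /^(\w+):(\w+)$/;

function expandInOrder(lines: string[]): string[] {
  const prefixes = new Map<string, string>();
  const expanded: string[] = [];
  for (const line of lines) {
    const decl = PREFIX_DECL.exec(line.trim());
    if (decl) {
      // remember the namespace for all following lines
      prefixes.set(decl[1], decl[2]);
      continue;
    }
    const tokens = line.trim().split(/\s+/).map((token) => {
      const name = PREFIXED_NAME.exec(token);
      if (!name) return token; // full IRIs, literals and "." pass through
      const ns = prefixes.get(name[1]);
      if (ns === undefined) {
        throw new Error(`prefix "${name[1]}:" used before its declaration`);
      }
      return `<${ns}${name[2]}>`;
    });
    expanded.push(tokens.join(" "));
  }
  return expanded;
}
```

Passing the same two lines with the `@prefix` line second makes the function throw, which is the reordering problem: a statement that uses a prefix is only valid after the declaration has been written.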

nleanba commented 4 months ago

I do see two error messages:

Error: Invalid Authority for <http://taxon-concept.plazi.org/id/Animalia/fulgoroidesINVALID>
Error: Invalid Authority for <http://taxon-concept.plazi.org/id/Animalia/fulgoroidesINVALID>

Those errors indicate that it found mentions of these taxa but could not figure out their authorities to turn them into citations (augments or deprecates). The rdf output matches the behaviour of the xslt, but with more insight into why it was generated the way it was.

nleanba commented 4 months ago

@retog did you rebase this into main?

this is rather inelegant, as now all my commits are marked as unverified.

i think a merge commit would have been better

retog commented 4 months ago

@retog did you rebase this into main?

this is rather inelegant, as now all my commits are marked as unverified.

i think a merge commit would have been better

I did. I didn't know about this implication.