traitecoevo / APD

The Australian Plant Traits Dictionary
https://traitecoevo.github.io/APD/

Convert data to triples #4

Closed ehwenk closed 1 year ago

ehwenk commented 1 year ago
dfalster commented 1 year ago

Nice work! I can no longer comment on whether this meets requirements, as I'm not literate enough. Hopefully @cboettig can approve!

ehwenk commented 1 year ago

@cboettig

I've made all the above changes except changing the format of the namespace declaration, since I wasn't sure whether this was meant to replace what I currently have or to be added to the code that generates the JSON-LD representation.

Also, after making the other changes, line 372, true_triples <- read_nquads("docs/ADP.nq"), now yields a long list of errors that I'm afraid I can't interpret. I'm guessing it is a mistake I made in adding a full URI for xsd:double and removing all references to xsd:string.

cboettig commented 1 year ago

@ehwenk no worries, just taking a quick look at this now. Yeah, for some reason the data.frame triples_df has four columns instead of three. I think the fourth column holds the xsd type definitions, but recall those should be tacked on to the "object" column using ^^. It looks like all the string-type definitions are being encoded as literal NAs, so those inscrutable error messages are really about the extra "NA" after the object, which the parser doesn't understand, e.g.

<http://cerrado.linkeddata.es/ecology/ccon/#Recruitment> <http://www.w3.org/2000/01/rdf-schema#label> "recruitment" NA .

should just be:

<http://cerrado.linkeddata.es/ecology/ccon/#Recruitment> <http://www.w3.org/2000/01/rdf-schema#label> "recruitment"  .

(Recall that in the "quads" format there is a fourth position, reserved for the graph label, but we leave it empty and simply end each line with . because we want these things to be part of the same "graph".)
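As a rough illustration of the fix described above, the base R sketch below folds a separate datatype column into the object column with ^^ and leaves plain strings untouched, so no stray NA reaches the output. The column names, the second row, and the http://example.org URIs are assumptions made for the example, not taken from the APD code.

```r
# Hypothetical four-column data frame mimicking the problem described above;
# column names and the example.org URIs are assumptions for illustration.
triples_df <- data.frame(
  subject   = c("<http://cerrado.linkeddata.es/ecology/ccon/#Recruitment>",
                "<http://example.org/trait>"),
  predicate = c("<http://www.w3.org/2000/01/rdf-schema#label>",
                "<http://example.org/max>"),
  object    = c("\"recruitment\"", "\"12000\""),
  datatype  = c(NA, "<http://www.w3.org/2001/XMLSchema#double>"),
  stringsAsFactors = FALSE
)

# Append the datatype to the object with ^^ only where one is present;
# plain strings (NA datatype) keep their bare quoted literal.
has_type <- !is.na(triples_df$datatype)
triples_df$object[has_type] <- paste0(
  triples_df$object[has_type], "^^", triples_df$datatype[has_type]
)

# Drop the fourth column so only subject, predicate and object remain.
triples_df$datatype <- NULL

# One statement per line, each terminated by " ."
nq_lines <- paste(triples_df$subject, triples_df$predicate,
                  triples_df$object, ".")
writeLines(nq_lines)
```

The key design choice is to treat a missing datatype as "no suffix" rather than pasting the NA into the line.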

cboettig commented 1 year ago

ok digging a bit further:

The extra column is being introduced by some error in reformatted_categorical data.frame, looks like all the other data.frames in the bind_rows are fine.
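One way to track down which input adds the extra column is to compare each data frame's column names against the expected three before binding. This is only a sketch: reformatted_categorical is the name used in the thread, while reformatted_numeric, the column names, and the placeholder values are hypothetical.

```r
# Hypothetical stand-ins for the data frames passed to bind_rows();
# only reformatted_categorical is named in the thread.
frames <- list(
  reformatted_numeric = data.frame(
    subject = "s", predicate = "p", object = "o"
  ),
  reformatted_categorical = data.frame(
    subject = "s", predicate = "p", object = "o", datatype = NA
  )
)

# Column names assumed for a well-formed triples table.
expected <- c("subject", "predicate", "object")

# For each frame, list any columns beyond the expected three.
extra_cols <- lapply(frames, function(df) setdiff(names(df), expected))

# Names of the frames that carry at least one unexpected column.
offenders <- names(extra_cols)[lengths(extra_cols) > 0]
print(offenders)
```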

A few other things I spotted: looks like the double type is missing its duck feet symbol (or whatever the heck ^^ is :-) ):

e.g. I see

"12000"<https://www.w3.org/2001/XMLSchema#double>

instead of:

"12000"^^<https://www.w3.org/2001/XMLSchema#double>

Also looks like <xsd:date> wasn't expanded into a full URI.
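Both remaining issues (the missing ^^ and the unexpanded prefix) could be handled by a small helper along the following lines. This is a sketch in base R, not the APD code: expand_curie and typed_literal are hypothetical names. Note that the prefix table uses the standard XML Schema namespace, which begins with http://, whereas the examples quoted in the thread show https://; datatype URIs are compared as exact strings, so the scheme matters.

```r
# Prefix table; the xsd entry is the standard XML Schema namespace.
prefixes <- c(xsd = "http://www.w3.org/2001/XMLSchema#")

# Hypothetical helper: expand "<prefix:local>" or "prefix:local" into
# "<full URI>". Terms whose prefix is not in the table (including terms
# that are already full URIs) are returned unchanged.
expand_curie <- function(term, prefixes) {
  bare <- gsub("^<|>$", "", term)
  parts <- regmatches(
    bare, regexec("^([A-Za-z][A-Za-z0-9]*):(.+)$", bare)
  )[[1]]
  if (length(parts) == 3 && parts[2] %in% names(prefixes)) {
    return(paste0("<", prefixes[[parts[2]]], parts[3], ">"))
  }
  term
}

# Hypothetical helper: build a typed literal, joining the quoted value
# and the expanded datatype URI with ^^.
typed_literal <- function(value, datatype, prefixes) {
  paste0("\"", value, "\"^^", expand_curie(datatype, prefixes))
}

print(expand_curie("<xsd:date>", prefixes))
print(typed_literal("12000", "xsd:double", prefixes))
```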