uwlib-cams / rml

Using the RML Mapper (https://github.com/RMLio/rmlmapper-java) to convert RDA data to BIBFRAME.
Creative Commons Zero v1.0 Universal

Add formal specification of Kiegel syntax #45

Open nichtich opened 2 years ago

nichtich commented 2 years ago

The specification of the Kiegel syntax is hidden in a PDF. There should be a formal specification somewhere else that can be referenced. I'm also curious about a parser; it should be somewhere in this repository, right?

gerontakos commented 2 years ago

Apologies for the delay in responding! We adopted the Kiegel syntax because the profession lacks a widely used syntax. The discipline of ontology matching offers some options; for example, François Scharffe's work on ontology matching design patterns includes a syntax for representing alignments. Unfortunately, we considered this too difficult. We already had an in-house syntax (without a specification) created by Joe Kiegel that was very simple; in addition, it looked like the same kind of mapping syntax other librarians were using when working with BIBFRAME. So we refined the syntax, created only a human-readable document (the PDF) explaining how to read it, and moved on. I understand this is not recommended practice; writing a formal specification would be much better, but we never intended for the syntax to survive beyond our proof-of-concept project. If we were to create a specification, is there an example of the type of specification you would like to see? If we use the syntax again, I think we will write that spec. We know the syntax is effective, but we had to parse it as text to use it, and that's not ideal. The entries that result from using the syntax are just long text strings. I think we can do better.

nichtich commented 2 years ago

I am not sure whether writing a robust, formal specification is actually required; it might make sense as a research project in ontology matching. The current working draft, however, should be better documented, either as an independent document (e.g. a git repository of its own) or as part of your implementation, so that the syntax can be linked to and the examples reused more easily. PDF is broken for several reasons, so how about adding a markdown file or directory that describes the syntax, lists examples, and references the implementation? I still don't understand whether the syntax is fully implemented at all, or which script does the processing.

> The entries that result from using the syntax are just long text strings.

All source code is just a long text string; that's no problem. Domain-specific languages are a great tool. Writing the rules in an RDF serialization would result in unreadable masses of additional syntax.