protegeproject / mapping-master

Library that supports the Mapping Master DSL for mapping spreadsheets to OWL ontologies

Allow References to Resolve to Multiple Values #3

Open martinjoconnor opened 8 years ago

martinjoconnor commented 8 years ago

Currently in Mapping Master, a reference resolves to exactly one value. References that support multi-value extraction would be useful in many common situations.

For example, say we have the following two-column spreadsheet, where column A contains a name and column B contains a series of comma-separated aliases:

   A1=Frederick
   B1=The Big F, Freddy, The Fredster

and that we would like to create an OWL individual for each row with the given name and a non-functional datatype property holding the aliases.

Assuming a function called mm:split, something like the following could do this:

  Individual: @A1
    Facts: hasAlias @B1(mm:split(","))

However, since Mapping Master references resolve to only one value, the above is not currently possible.
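
As a rough illustration of the intended behaviour (written in Python, not in the Mapping Master engine itself), a multi-valued reference could resolve to a list, with the facts clause emitting one assertion per value. The names `split_reference` and `render_facts` are hypothetical, and `mm:split` is only the proposed function from above, not something that exists today.

```python
def split_reference(cell_value, delimiter=","):
    """Hypothetical stand-in for the proposed mm:split function."""
    # Trim whitespace around each piece and drop empty pieces.
    return [part.strip() for part in cell_value.split(delimiter) if part.strip()]


def render_facts(individual, prop, values):
    """Emit one (subject, property, value) assertion per resolved value."""
    return [(individual, prop, value) for value in values]


# The example spreadsheet from above.
cells = {"A1": "Frederick", "B1": "The Big F, Freddy, The Fredster"}

aliases = split_reference(cells["B1"])
facts = render_facts(cells["A1"], "hasAlias", aliases)

for fact in facts:
    print(fact)
```

Under these assumptions the single row would produce three `hasAlias` assertions for the individual `Frederick`, one per alias.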

Mapping Master does support some limited workarounds.

For example, the following Mapping Master expression can be used to extract a three-part name from a cell:

  Individual: @A2 
  Types: Person 
  Facts: hasForename @A2(["(\S+)"]), 
         hasInitial @A2(["\S+\s(\S+)"]), 
         hasSurname @A2(["\S+\s\S+\s(\S+)"]) 

In general, however, it is not possible to extract a variable number of values from a cell in a principled way.
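
To make the limitation concrete, here is a small Python sketch of the same three capture expressions. It assumes the cell contains a name such as "Frederick J Smith" (a made-up value) and uses Python's `re.match`, which may not match exactly how Mapping Master applies its capture expressions. Each expression yields at most one value, so the number of extracted values is fixed by the number of expressions written.

```python
import re

# The three capture patterns from the expression above.
PATTERNS = {
    "hasForename": r"(\S+)",
    "hasInitial": r"\S+\s(\S+)",
    "hasSurname": r"\S+\s\S+\s(\S+)",
}


def extract_parts(cell_value):
    """Apply each pattern once; each yields a single value or None."""
    result = {}
    for prop, pattern in PATTERNS.items():
        match = re.match(pattern, cell_value)
        result[prop] = match.group(1) if match else None
    return result


three_part = extract_parts("Frederick J Smith")  # hypothetical cell content
two_part = extract_parts("Fred Smith")           # one part fewer than expected

print(three_part)
print(two_part)
```

With a two-part name the third pattern finds nothing, and a four-part name would lose its last part. That is the sense in which the workaround cannot handle a variable number of values.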

Such support would be nice, since a sequence of delimited values in cells is very common in spreadsheets.

This change would not involve any grammar modifications. However, it would require fairly substantial changes to the rendering engine. Since references can occur in many places in a Mapping Master expression, the engine would have to anticipate multiple values when handling a significant number of clauses.

The engine would also have to handle the cross-product situation, which is easy to introduce.

For example, given the following spreadsheet

  A1=Frederick, Fred
  B1=The Big F, Freddy, The Fredster

and the following expression

  Individual: @A1(mm:split(","))
    Facts: hasAlias @B1(mm:split(","))

the rendering engine would have to generate the cross product of the values extracted from A1 and B1 (here, 2 names × 3 aliases = 6 hasAlias assertions).
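
A minimal Python sketch of that cross product, again using a hypothetical `split_reference` helper in place of the proposed `mm:split`; it illustrates the combinatorics only and says nothing about how the rendering engine would actually be structured.

```python
from itertools import product


def split_reference(cell_value, delimiter=","):
    """Hypothetical stand-in for the proposed mm:split function."""
    return [part.strip() for part in cell_value.split(delimiter) if part.strip()]


# The example spreadsheet from above.
cells = {"A1": "Frederick, Fred", "B1": "The Big F, Freddy, The Fredster"}

names = split_reference(cells["A1"])
aliases = split_reference(cells["B1"])

# Every extracted name is paired with every extracted alias.
assertions = [(name, "hasAlias", alias) for name, alias in product(names, aliases)]

for assertion in assertions:
    print(assertion)
```

Each additional multi-valued reference in an expression multiplies the number of combinations, which is why the engine would need to plan for this case explicitly.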

A possible implementation simplification would be to initially allow multi-valued references only in the value position of a facts clause.