psychoinformatics-de / shacl-vue

Automatic generation of user interfaces from SHACL
https://psychoinformatics-de.github.io/shacl-vue/docs
MIT License

Settle on a design for real-time, in-memory linked data writing/storage #19

Closed: jsheunis closed this issue 1 month ago

jsheunis commented 3 months ago

This issue explores what the design should look like for real-time, in-memory storage (and display) of data entered into the data entry components of the shacl-vue UI.

At its core, the shacl-vue app is a graph-editing user interface. However, it aims to hide the complexity of nodes, subjects, objects, predicates, etc. (as far as makes sense) by presenting itself as a standard form editor to users who just want an easy interface for submitting metadata before walking away.

A common challenge in form editor development is deciding when to write the data entered into form fields to the format that is intended to be exported, downloaded, or passed on to the next step in the pipeline. As a simple example, a set of text fields could each have their own variable to store the entered text, and when the user selects Submit, some code writes the values of these variables into a key-value format such as a JSON object. That is, writing to the intended format happens at the end, not during editing. Since we work with reactive objects in a VueJS app, it would also be possible to bind the data entered into the text fields directly to the keys of the JSON object, so that the JSON object reflects the true state of all form fields at any point in time. This makes live rendering of entered data straightforward: only the state of the JSON object has to be rendered, and it updates as you type.
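The two approaches can be contrasted in a small plain-JavaScript sketch. It is not shacl-vue code. Because Vue 3's reactive() is built on JavaScript Proxy, a bare Proxy stands in for Vue here so the example runs without the framework. The field names and the render callback are hypothetical.

```javascript
// Illustrative sketch only; a bare Proxy stands in for Vue's reactive().

// Approach 1: one variable per field; the JSON object is only built on Submit.
let nameField = "";
let titleField = "";

function submit() {
  return { name: nameField, title: titleField };
}

// Approach 2: fields write straight into the output object, and every
// assignment triggers a re-render of the object's whole current state.
function makeLiveObject(onChange) {
  return new Proxy({}, {
    set(target, key, value) {
      target[key] = value;
      onChange(JSON.stringify(target)); // "render" the current state
      return true; // signal a successful assignment
    },
  });
}

const renders = [];
const live = makeLiveObject((json) => renders.push(json));

// Simulated typing into the "name" and "title" fields.
nameField = "My dataset";
titleField = "A title";
live.name = "My dataset";
live.title = "A title";
```

With the first approach, nothing exists in the target format until submit() runs. With the second, renders holds one snapshot per keystroke-level assignment, which is what makes an update-as-you-type view cheap.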

However, forms will hardly ever be this simple. For example, there might be fields that require a selection from a list of existing objects (stored where, and how?), or a field that requires adding a new object with its own set of complex fields (i.e. forms within forms). This implies parent-child (-grandchild, etc.) relationships between data objects, all of which could be in the process of being edited and which might have to be deleted, replaced, or edited in future. In a VueJS application, the data fields of parent and (grand)child objects will typically be entered into separate VueJS components, each with its own contained state. Components could share data with their children, state could be shared globally between all components, or it could be shared within a limited scope among a subgroup of components. There are multiple ways of approaching this challenge, and an exploration has been started here: https://github.com/psychoinformatics-de/shacl-vue/issues/13.

Another important point to consider is the graph aspect in the context just sketched. The intended output of entering data into a form is a graph database in the browser, using the libraries mentioned here https://github.com/psychoinformatics-de/shacl-vue/issues/6#issuecomment-2162590374. The question is whether (and if so, how) data entered into components should reactively enter into the graph database, or should it rather enter into some interim format and state that is written to the graph database upon form completion. The existing RDFJS-related APIs have some limitations that make live updating of existing nodes difficult (I'm not sure yet whether this is by design or if it could actually be changed), so it could perhaps be the case that the interim format is the only way. The choice between an interim format and a reactive graph database, as well as the global/local state management of VueJS components, also influences how the data can be rendered in real time. One could envision a real-time view of the RDF in Turtle (ttl) format as one enters data into form fields. This ttl could be derived from an interim data state or from the graph database itself. Everything depends on everything.

To be continued...

jsheunis commented 3 months ago

Ok, an update. This is likely not the end result or final design, but this is what I was able to achieve.

The major constraint that influenced the design was:

The question is whether (and if so, how) data entered into components should reactively enter into the graph database, or should it rather enter into some interim format and state that is written to the graph database upon form completion. The existing RDFJS-related APIs have some limitations that make live updating of existing nodes difficult (I'm not sure yet whether this is by design or if it could actually be changed), so it could perhaps be the case that the interim format is the only way.

I decided in favour of an interim format for form-entered data (a standard JSON object), as well as a global data state that can be updated by the individual form components. The global data state is implemented as a composable that is imported into the main application module (App.vue) and then made available to any child components that want to use it via the Provide/Inject mechanism.
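The following is a minimal sketch of such a shared store. It assumes a hypothetical composable named useFormData with updateField/getField helpers. To keep it runnable without Vue, the state is a plain object. In the app, the state would presumably be wrapped in Vue's reactive(), provided from App.vue with provide(), and picked up in child components with inject(). Those calls appear only in comments below.

```javascript
// Hypothetical composable; names and shape are illustrative, not shacl-vue's.
function useFormData() {
  const state = {}; // in the Vue app: const state = reactive({})

  // Any component holding the store can write a field of a given node.
  function updateField(nodeKey, fieldKey, value) {
    if (!state[nodeKey]) state[nodeKey] = {};
    state[nodeKey][fieldKey] = value;
  }

  function getField(nodeKey, fieldKey) {
    return state[nodeKey]?.[fieldKey];
  }

  return { state, updateField, getField };
}

// App.vue, conceptually: const store = useFormData(); provide("formData", store);
const store = useFormData();

// Two child components, conceptually: const store = inject("formData");
store.updateField("Dataset", "name", "My dataset");
store.updateField("Dataset", "title", "A title");
```

Because every component receives the same object, all of them read and write one "database", rather than each keeping its own copy.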

Specific form elements being edited by the user need to be uniquely identified, since their data needs to enter correctly into the "database" (the global data object). I first tried to tie identification to the component instance created when a given form element is viewed by the user. However, Vue's component lifecycle mechanics are quite complicated and not necessarily suited to this kind of usage (e.g. component instances can be reused between views, even when they render different sets of data). I tried <keep-alive>, Vue's built-in component that caches already created component instances instead of destroying them when they go out of view, but cached instances can still be reused for different sets of data. To ensure that a unique instance is created for a given set of data, one can set the :key attribute on a component tag to a unique value (e.g. the node IRI or similar). One drawback of :key in this context is that Vue does not return to an earlier component instance when a key that was used before is provided again: every change of the key creates a new instance.

So in the end I moved away from tying the uniqueness of a form field to the component instance, and instead tied it to specific keys that are unique with respect to the data. In SHACL, a node shape has multiple property shapes, so I made the data object follow exactly the same structure, e.g.:

{
  "https://concepts.datalad.org/s/sddui/unreleased/ScientificDataDistribution": {
    "subject": "",
    "predicate": "",
    "object": "",
    "properties": {
      "https://concepts.datalad.org/s/sddui/unreleased/ScientificDataDistribution--https://concepts.datalad.org/s/thing/unreleased/name": {
        "subject": "",
        "predicate": "",
        "object": ""
      },
      "https://concepts.datalad.org/s/sddui/unreleased/ScientificDataDistribution--https://concepts.datalad.org/s/thing/unreleased/title": {
        "subject": "",
        "predicate": "",
        "object": ""
      },
      ...
    }
  },
  ...
}

Root-level keys are the class IRIs (e.g. https://concepts.datalad.org/s/sddui/unreleased/ScientificDataDistribution), and their values are objects with the keys subject, predicate, object, and properties. The keys in the properties object identify node properties. Each one joins the class IRI and a property IRI taken from the SHACL field sh:path with "--" (e.g. https://concepts.datalad.org/s/sddui/unreleased/ScientificDataDistribution--https://concepts.datalad.org/s/thing/unreleased/title).
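How such a data object could be initialised from the shapes is sketched below. The input format (one entry per node shape with a targetClass and its sh:path IRIs) and the helper names are hypothetical simplifications. A real implementation would read these values from the parsed SHACL graph.

```javascript
// Hypothetical, simplified input: one entry per SHACL node shape, listing the
// target class IRI and the sh:path IRIs of its property shapes.
const shapes = [
  {
    targetClass: "https://concepts.datalad.org/s/sddui/unreleased/ScientificDataDistribution",
    paths: [
      "https://concepts.datalad.org/s/thing/unreleased/name",
      "https://concepts.datalad.org/s/thing/unreleased/title",
    ],
  },
];

// Composite property key: class IRI and sh:path IRI joined by "--", so the
// same path used by two different classes gets two distinct slots.
function propertyKey(classIRI, pathIRI) {
  return `${classIRI}--${pathIRI}`;
}

function emptyTriple() {
  return { subject: "", predicate: "", object: "" };
}

// Build one empty node slot per class, with one empty slot per property.
function initFormData(shapes) {
  const data = {};
  for (const shape of shapes) {
    const node = { ...emptyTriple(), properties: {} };
    for (const path of shape.paths) {
      node.properties[propertyKey(shape.targetClass, path)] = emptyTriple();
    }
    data[shape.targetClass] = node;
  }
  return data;
}

const formData = initFormData(shapes);
```

The result has the same shape as the structure above: a form field is identified by a key derived from the data, independently of which Vue component instance renders it.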

One thing I have not addressed here is the case where the user needs to enter multiple instances of the same node or property. At the moment, the data object has one unique slot per node or node property, but in reality several might be needed. There are multiple ways this could take shape. There will likely be a Save button at the node level, which could trigger copying the single instance of a node from the "being edited" database into a "saved" database. Alternatively, everything could go into the same existing database, with the values of the unique keys being arrays rather than objects as they are currently. The rule could then be, for example, that the last element of the array is always the one being edited. I will play with some of these ideas next.
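The "Save button" option could look roughly like the sketch below. Its names are hypothetical. For brevity, the editing slot is flattened to path IRI → value instead of the subject/predicate/object objects used above. The editing database keeps a single slot per class. Saving copies that slot into a separate saved database and clears it for the next node.

```javascript
// Hypothetical sketch of the "copy on Save" idea.
function saveNode(editing, saved, classIRI) {
  // Deep copy via a JSON round-trip (sufficient for plain strings/objects),
  // so later edits to the editing slot cannot change what was saved.
  const copy = JSON.parse(JSON.stringify(editing[classIRI]));
  if (!saved[classIRI]) saved[classIRI] = [];
  saved[classIRI].push(copy);
  editing[classIRI] = {}; // clear the slot for the next node
}

const DSD = "https://concepts.datalad.org/s/sddui/unreleased/ScientificDataDistribution";
const NAME = "https://concepts.datalad.org/s/thing/unreleased/name";

const editing = { [DSD]: {} };
const saved = {};

editing[DSD][NAME] = "First dataset";
saveNode(editing, saved, DSD);
editing[DSD][NAME] = "Second dataset";
saveNode(editing, saved, DSD);
```

The trade-off against the array option is that the form only ever binds to one slot per class, at the cost of maintaining two stores.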

The next step is to render the data being entered/edited in real-time into some type of RDF data viewer.

jsheunis commented 3 months ago

Something that still needs solving is the data type with which a form field entry is written into the graph. Currently it is just a string, but it should be possible to write it as a URI, a Literal, etc.
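One possible approach is to derive the RDF term type from the property shape. The sketch below builds plain objects shaped like RDFJS Data Model terms. In the app, a DataFactory's namedNode() and literal() would normally create them. Two things are assumptions for illustration: the decision rule (an IRI node kind or sh:class gives a NamedNode, anything else a Literal typed by sh:datatype with an xsd:string fallback) and the property-shape object with nodeKind/class/datatype keys.

```javascript
const SH = "http://www.w3.org/ns/shacl#";
const XSD = "http://www.w3.org/2001/XMLSchema#";

// Plain objects following the RDFJS term shape (termType + value, etc.).
function namedNode(value) {
  return { termType: "NamedNode", value };
}

function literal(value, datatypeIRI = XSD + "string") {
  return { termType: "Literal", value, language: "", datatype: namedNode(datatypeIRI) };
}

// Assumed rule: IRI-valued properties become NamedNodes; everything else
// becomes a Literal typed with sh:datatype, falling back to xsd:string.
function toTerm(propertyShape, rawValue) {
  if (propertyShape.nodeKind === SH + "IRI" || propertyShape.class) {
    return namedNode(rawValue);
  }
  return literal(rawValue, propertyShape.datatype || XSD + "string");
}

const iriTerm = toTerm({ nodeKind: SH + "IRI" }, "https://example.org/thing");
const intTerm = toTerm({ datatype: XSD + "integer" }, "42");
const plainTerm = toTerm({}, "hello");
```

Language-tagged strings (rdf:langString) and other node kinds are not covered by this sketch.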

jsheunis commented 3 months ago
{
  "https://concepts.datalad.org/s/sddui/unreleased/ScientificDataDistribution": {
    "subject": "",
    "predicate": "",
    "object": "",
    "properties": {
      "https://concepts.datalad.org/s/sddui/unreleased/ScientificDataDistribution--https://concepts.datalad.org/s/thing/unreleased/name": {
        "subject": "",
        "predicate": "",
        "object": ""
      },
      "https://concepts.datalad.org/s/sddui/unreleased/ScientificDataDistribution--https://concepts.datalad.org/s/thing/unreleased/title": {
        "subject": "",
        "predicate": "",
        "object": ""
      },
      ...
    }
  },
  ...
}

This design has been simplified by removing redundant keys in https://github.com/psychoinformatics-de/shacl-vue/commit/a1e4e7470a6b1ebaca48bc8e1e7d98c96380ee48. The updated structure resembles the following example:


{
   "https://concepts.datalad.org/s/sddui/unreleased/ScientificDataDistribution": {
       "https://concepts.datalad.org/s/thing/unreleased/name": <value>,
       "https://concepts.datalad.org/s/thing/unreleased/title": <value>,
       ...
   }
   ...
}
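To illustrate what "removing redundant keys" means for the data, the sketch below migrates the earlier verbose structure into the simplified one. It is hypothetical and not the code from the linked commit. It assumes that the value entered for a property was held in that property's "object" field. The "<class IRI>--" prefix is redundant once properties sit directly under their class, so it is stripped to recover the sh:path IRI.

```javascript
// Hypothetical migration from the verbose structure to the simplified one.
function simplify(verbose) {
  const simple = {};
  for (const [classIRI, node] of Object.entries(verbose)) {
    simple[classIRI] = {};
    const prefix = `${classIRI}--`;
    for (const [key, triple] of Object.entries(node.properties || {})) {
      // Strip the redundant "<class IRI>--" prefix to get the sh:path IRI.
      const pathIRI = key.startsWith(prefix) ? key.slice(prefix.length) : key;
      simple[classIRI][pathIRI] = triple.object; // assumed location of the value
    }
  }
  return simple;
}

const DSD = "https://concepts.datalad.org/s/sddui/unreleased/ScientificDataDistribution";
const NAME = "https://concepts.datalad.org/s/thing/unreleased/name";

const verbose = {
  [DSD]: {
    subject: "",
    predicate: "",
    object: "",
    properties: {
      [`${DSD}--${NAME}`]: { subject: "", predicate: "", object: "My dataset" },
    },
  },
};

const simple = simplify(verbose);
```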

One thing that is still missing is support for arrays, both for nodes (e.g. ScientificDataDistribution) and for predicate-object pairs. An updated version of the structure might look something like:

{
  "https://concepts.datalad.org/s/sddui/unreleased/ScientificDataDistribution": [
    {
      "https://concepts.datalad.org/s/thing/unreleased/name": [
        <value1>,
        <value2>
      ],
      "https://concepts.datalad.org/s/thing/unreleased/title": [
        <value3>,
        <value4>
      ],
      ...
    },
    {
      ...
    }
  ],
  ...
}

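Helpers for this array-based structure could be as small as the sketch below. The names addNode/addValue are hypothetical. Each class IRI maps to an array of node instances, and each property IRI maps to an array of values, so repeated nodes and repeated values both fit.

```javascript
// Hypothetical helpers for the array-based structure.
function addNode(data, classIRI) {
  if (!data[classIRI]) data[classIRI] = [];
  data[classIRI].push({});
  return data[classIRI].length - 1; // index of the newly added node
}

function addValue(data, classIRI, nodeIndex, pathIRI, value) {
  const node = data[classIRI][nodeIndex];
  if (!node[pathIRI]) node[pathIRI] = [];
  node[pathIRI].push(value);
}

const DSD = "https://concepts.datalad.org/s/sddui/unreleased/ScientificDataDistribution";
const NAME = "https://concepts.datalad.org/s/thing/unreleased/name";
const TITLE = "https://concepts.datalad.org/s/thing/unreleased/title";

const data = {};
const first = addNode(data, DSD);
addValue(data, DSD, first, NAME, "name 1");
addValue(data, DSD, first, NAME, "name 2");
const second = addNode(data, DSD);
addValue(data, DSD, second, TITLE, "title 1");
```

Returning the node index lets a form component keep track of which instance it edits. The "last element is the one being edited" rule would correspond to always using data[classIRI].length - 1.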
jsheunis commented 1 month ago

Update:

This is exactly what was settled on, to accommodate arrays:

{
  "https://concepts.datalad.org/s/sddui/unreleased/ScientificDataDistribution": [
    {
      "https://concepts.datalad.org/s/thing/unreleased/name": [
        <value1>,
        <value2>
      ],
      "https://concepts.datalad.org/s/thing/unreleased/title": [
        <value3>,
        <value4>
      ],
      ...
    },
    {
      ...
    }
  ],
  ...
}
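To show how this interim structure could be turned into graph data upon form completion, here is a sketch that serializes it to N-Triples lines. It is an illustration under stated assumptions, not how shacl-vue writes to its graph store. The structure holds no subject IRIs, so one blank node per instance is assumed. Each instance also gets an rdf:type triple for its class, and every value is written as a plain string literal.

```javascript
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

// Escape a string for use inside an N-Triples literal.
function escapeLiteral(s) {
  return s
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

function toNTriples(data) {
  const lines = [];
  let counter = 0;
  for (const [classIRI, nodes] of Object.entries(data)) {
    for (const node of nodes) {
      const subject = `_:n${counter++}`; // assumed: one blank node per instance
      lines.push(`${subject} <${RDF_TYPE}> <${classIRI}> .`);
      for (const [pathIRI, values] of Object.entries(node)) {
        for (const value of values) {
          // Assumed: every value is an untyped string literal.
          lines.push(`${subject} <${pathIRI}> "${escapeLiteral(String(value))}" .`);
        }
      }
    }
  }
  return lines.join("\n");
}

const example = {
  "https://concepts.datalad.org/s/sddui/unreleased/ScientificDataDistribution": [
    { "https://concepts.datalad.org/s/thing/unreleased/name": ["My dataset"] },
  ],
};

const nt = toNTriples(example);
```

The same function could be re-run on every change to drive a live RDF preview, which is one of the rendering options raised at the start of this issue.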

Related issue: https://github.com/psychoinformatics-de/shacl-vue/issues/36