Handling property object removal when subjects are reused

tpluscode commented 2 years ago

This issue is to improve the overall behaviour of the form when a complex value (blank node or IRI) is removed from the graph. Here's a working example:

<view>
  ex:source <namedSource>, _:blankSource ;
  view:dimension [
    view:from [
       view:source <namedSource> ;
    ] ;
  ] ;
  view:dimension [
    view:from [
       view:source _:blankSource ;
    ] ;
  ] ;
.

<namedSource>
  view:cube <http://foo.bar/cube1> ;
.

_:blankSource
  view:cube <http://foo.bar/cube2> ;
.

Presently, when there are no constraints on the view:source property shape, either value can be removed using the form UI.

When removing either, their usages (view:source triples) are kept intact.
When removing _:blankSource, its subgraph (view:cube triple) is removed.
When removing <namedSource>, its subgraph remains dangling, unconnected from the <view>'s subgraph.

I found this behaviour inconsistent and propose some changes:

By default, prevent a used node from being removed

In the example above, I would prevent the removal of either <namedSource> or _:blankSource as long as they are used as a subject in the graph.

The problem here is to separate the "subject usage" (<view> view:source ?source) from the "object usage" (?dimension view:from/view:source ?source). Removing the latter must always be allowed.
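A minimal sketch of that check, assuming triples are plain Python tuples and that the anonymous view:from nodes get the made-up labels _:from1 and _:from2. The function names are hypothetical and not part of shaperone; the idea is only that the form value is blocked while other triples still point at it.

```python
# Triples are (subject, predicate, object) tuples based on the example above.
# _:from1 and _:from2 are invented labels for the anonymous view:from nodes.
GRAPH = {
    ("<view>", "ex:source", "<namedSource>"),
    ("<view>", "ex:source", "_:blankSource"),
    ("_:from1", "view:source", "<namedSource>"),
    ("_:from2", "view:source", "_:blankSource"),
    ("<namedSource>", "view:cube", "<http://foo.bar/cube1>"),
    ("_:blankSource", "view:cube", "<http://foo.bar/cube2>"),
}


def other_usages(graph, focus, path, value):
    """Triples that point at `value`, other than the form value being removed."""
    return {
        (s, p, o) for (s, p, o) in graph
        if o == value and (s, p) != (focus, path)
    }


def can_remove(graph, focus, path, value):
    """Allow removal only when nothing else in the graph references `value`."""
    return not other_usages(graph, focus, path, value)


# The view's own value is blocked while a dimension still references it.
blocked = not can_remove(GRAPH, "<view>", "ex:source", "_:blankSource")
```

The reference triples themselves (the view:from/view:source usages) would bypass this check, since removing them should always be possible.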

Option 1: keep subgraph

I think this would be the default.

One option is to keep the subgraph of the removed node.

<view>
- ex:source <namedSource>, _:blankSource ;
+ ex:source <namedSource> ;
.

<namedSource>
  view:cube <http://foo.bar/cube1> ;
.

+# Nothing happened here
_:blankSource
  view:cube <http://foo.bar/cube2> ;
.

Option 1: remove subgraph

Property shape annotation

[
  sh:path sh:source ;
+ sh1:onRemove sh1:removeSubgraph ;
]

For consistency, I think I would prefer the entire subgraph to be removed for both blank nodes and named nodes.

<view>
- ex:source <namedSource>, _:blankSource ;
+ ex:source _:blankSource ;
.

+# Remove entire representation of <namedSource>
-<namedSource>
- view:cube <http://foo.bar/cube1> ;
-.

_:blankSource
  view:cube <http://foo.bar/cube2> ;
.
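A rough sketch of what removing the subgraph could look like, again with triples as plain tuples. Two things here are assumptions rather than anything stated above: the function names, and the choice to follow only blank-node objects when collecting the subgraph (so that other named resources are not deleted transitively).

```python
def subgraph(graph, node):
    """Triples with `node` as subject, followed recursively through blank-node objects."""
    found, stack, seen = set(), [node], set()
    while stack:
        current = stack.pop()
        if current in seen:  # guards against cycles
            continue
        seen.add(current)
        for s, p, o in graph:
            if s == current:
                found.add((s, p, o))
                if o.startswith("_:"):  # assumption: descend into blank nodes only
                    stack.append(o)
    return found


def remove_with_subgraph(graph, focus, path, value):
    """Remove the form value and everything that describes it."""
    return graph - {(focus, path, value)} - subgraph(graph, value)


graph = {
    ("<view>", "ex:source", "<namedSource>"),
    ("<view>", "ex:source", "_:blankSource"),
    ("<namedSource>", "view:cube", "<http://foo.bar/cube1>"),
    ("_:blankSource", "view:cube", "<http://foo.bar/cube2>"),
}
result = remove_with_subgraph(graph, "<view>", "ex:source", "<namedSource>")
```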

Option 2: remove usages

This would override the default behaviour, allowing the removal of form values which are used as objects

Property shape annotation

[
  sh:path sh:source ;
+ sh1:onRemove sh1:removeUsages ;
]

<view>
-  ex:source _:blankSource ;
  view:dimension [
    view:from [
+      # usage removed
-      view:source _:blankSource ;
    ] ;
  ] ;
.

+# Source kept in the graph
_:blankSource
  view:cube <http://foo.bar/cube2> ;
.
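The "remove usages" behaviour can be sketched in the same style. This is only an illustration under the assumption that triples are tuples; remove_usages is a hypothetical name and _:from2 is an invented label for the anonymous view:from node.

```python
def remove_usages(graph, value):
    """Drop every triple that has `value` as its object; keep triples about `value`."""
    return {(s, p, o) for (s, p, o) in graph if o != value}


graph = {
    ("<view>", "ex:source", "_:blankSource"),
    ("_:from2", "view:source", "_:blankSource"),
    ("_:blankSource", "view:cube", "<http://foo.bar/cube2>"),
}
result = remove_usages(graph, "_:blankSource")
```

Only the view:cube triple survives, which matches the diff above: the source stays in the graph but nothing refers to it any more.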

cristianvasquez commented 2 years ago

My take is to mark things to be kept. Would that be option 3?

I don't know which predicate to use, but perhaps I would not put the intention in the predicate, as sh1:removeUsages does. I would go for something that groups, for example sh1:group?

Example

Supposing forms are always a tree, I would mark all URIs that define partitions of interest.

For example, having:

<alice>  foaf:knows <bob> ;
        <livesIn> <house> ;
        <name> "Alice" ;
        <likes> <icecream> .

<icecream> <flavor> "chocolate" .

<bob> <livesIn> <house> ;
    <name> "Bob" .      

<house> 
    <address> "Wonderland" .

One can say that <alice>, <bob> and <house> mark entities of interest (to keep).

Looking at this graph as a tree that starts from <alice>, one can generate partitions (or documents) formed while walking through the tree.

Document 1:

<alice>  foaf:knows <bob> ;
        <livesIn> <house> ;
        <name> "Alice" ;
        <likes> <icecream> .

<icecream> <flavor> "chocolate" .

Document 2:

<bob> <livesIn> <house> ;
    <name> "Bob" .

Document 3:

<house> 
    <address> "Wonderland" .

One can modify any quad anytime, but when <alice> is deleted, all triples in document 1 could be deleted. The same goes for <bob> and <house>, with documents 2 and 3 respectively.

If this is a graph in memory, I would personally use named graphs to differentiate between quads in those documents: 3 named graphs, <alice>, <bob> and <house>. With named graphs it becomes trivial to keep track of and delete a group of quads.
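As a sketch of that partitioning, assuming triples are plain tuples and using a dictionary keyed by the marked entity as a stand-in for named graphs. The partition function is hypothetical; it walks from each marked entity and stops whenever it reaches another marked one.

```python
GRAPH = {
    ("<alice>", "foaf:knows", "<bob>"),
    ("<alice>", "<livesIn>", "<house>"),
    ("<alice>", "<name>", '"Alice"'),
    ("<alice>", "<likes>", "<icecream>"),
    ("<icecream>", "<flavor>", '"chocolate"'),
    ("<bob>", "<livesIn>", "<house>"),
    ("<bob>", "<name>", '"Bob"'),
    ("<house>", "<address>", '"Wonderland"'),
}
MARKED = {"<alice>", "<bob>", "<house>"}


def partition(graph, marked):
    """Group triples into one document per marked entity."""
    documents = {}
    for entity in marked:
        doc, stack, seen = set(), [entity], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            for s, p, o in graph:
                if s == node:
                    doc.add((s, p, o))
                    if o not in marked:  # another marked entity starts its own document
                        stack.append(o)
        documents[entity] = doc
    return documents


documents = partition(GRAPH, MARKED)
# Deleting <alice> means dropping her whole document in one go.
without_alice = GRAPH - documents["<alice>"]
```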

Another interesting approach is the one that @bergos used in rdf-cube-view-query: a tree structure to 'remember' triples that should be deleted together: https://github.com/zazuko/rdf-cube-view-query/blob/master/lib/Node.js. I think it handles a similar problem.

tpluscode commented 2 years ago

In practice most forms are indeed trees with a single root, but I do not want to make that assumption. Multiple roots would be rare and I personally have not come across such a use case. Graphs, however, do happen: there is a single root node, but cycles or edges between tree branches are possible.

I would not like to mix named graphs here. The context is always a single document. The "data graph" in SHACL lingo.

Now that I think about this, maybe the missing piece is to annotate/mark a Property Shape so that objects are "owned" by the Focus Node. In the original example, the view itself owns a source. The view:from/view:source usage does not, it's only a reference.

:ViewShape 
  sh:property [
    sh:path view:source ; 
    sh:node :SourceShape ;
    sh:class view:Source ;
+   # assert ownership as subtype
+   a sh1:ContainmentProperty ;
  ] ;

  sh:property [
    sh:path view:dimension ;
    sh:node :DimensionShape ;
  ] ;
.

:DimensionShape
  sh:property [
    sh:path view:from ;
    sh:node [
      sh:property [
+       # no "containment" here, meaning that when removed, only the object usage is removed
        sh:path view:source ;
        sh:class view:Source ;
      ] ;
    ] ;
  ] ;
.
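One way the form could act on such an annotation, sketched with plain tuples. Everything named here is an assumption for illustration: the OWNED lookup table, the remove_value function, the _:fromShape and _:from1 labels for the anonymous nodes, and the decision that removing an owned value also deletes the triples describing it.

```python
# (shape, path) pairs whose values are owned by the focus node (hypothetical table
# that would be derived from the sh1:ContainmentProperty annotation).
OWNED = {(":ViewShape", "view:source")}


def remove_value(graph, shape, focus, path, value):
    """Remove a form value; owned values also lose the triples that describe them."""
    removed = {(focus, path, value)}
    if (shape, path) in OWNED:
        removed |= {t for t in graph if t[0] == value}
    return graph - removed


graph = {
    ("<view>", "view:source", "<namedSource>"),
    ("_:from1", "view:source", "<namedSource>"),
    ("<namedSource>", "view:cube", "<http://foo.bar/cube1>"),
}

# Removing the reference: only that one triple goes away.
after_reference = remove_value(graph, "_:fromShape", "_:from1", "view:source", "<namedSource>")

# Removing the owned value: the source's own triples go with it.
after_owned = remove_value(graph, ":ViewShape", "<view>", "view:source", "<namedSource>")
```

Keying the table by shape and path matters because the same predicate, view:source, is owning in one shape and a plain reference in the other.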

cristianvasquez commented 2 years ago

Oh, I talked about named graphs as just an implementation choice for the deletion of things for a graph in memory. Not related to SHACL :) . Perhaps I was mixing topics there.

Objects "owned" by the Focus Node look like a good option. It would enable other things as well; for example, "you cannot delete this node because it's owned by this other one."

In your previous example, you don't use the word owned but contained. Why that choice of words?

tpluscode commented 2 years ago

Named graphs are a good idea in general. I totally use them similarly, to easily manage "wholeness". But in a form you typically work with a single document IMO.

> In your previous example, you don't use the word owned but contained. Why that choice of words?

No particular reason. I do lean towards "contain" I think :)

cristianvasquez commented 2 years ago

ContainmentProperty suggests the property contains things, which confuses me a little :)

I don't have a clue what would be a good name, but when you talk about ownership for deletion, it reminds me of the parallel with SQL databases, where you can do cascade deletion.

As you know, SQL databases enforce this with foreign keys.

A foreign key (FK) is a column or combination of columns that is used to establish and enforce a link between the data in two tables to control the data that can be stored in the foreign key table
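For readers less familiar with cascade deletion, here is a small illustration using Python's built-in sqlite3 module. The table and column names are made up for this example and only loosely mirror the view and source discussed in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE cube_view (id TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE cube_source (
        id TEXT PRIMARY KEY,
        view_id TEXT NOT NULL REFERENCES cube_view(id) ON DELETE CASCADE
    )
    """
)
conn.execute("INSERT INTO cube_view VALUES ('view')")
conn.execute("INSERT INTO cube_source VALUES ('namedSource', 'view')")

# Deleting the owner row also deletes the rows that reference it.
conn.execute("DELETE FROM cube_view WHERE id = 'view'")
remaining = conn.execute("SELECT COUNT(*) FROM cube_source").fetchone()[0]
```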

Is there something like that in SHACL?

tpluscode commented 2 years ago

Ok, I'm pretty convinced. Ownership may be a better fit after all.
