spdx / spdx-3-model

The model for the information captured in SPDX version 3 standard.
https://spdx.dev/use/specifications/
Other
69 stars 44 forks source link

Another type mismatch in ExpandedLicensing #637

Closed zvr closed 8 months ago

zvr commented 8 months ago

Now that #613 has been merged (to fix #612), there is another error:

ERROR: In class /ExpandedLicensing/OrLaterOperator, property subjectLicense has type License but the range of /ExpandedLicensing/subjectLicense is ExtendableLicense

Explanation, to help: we had WithAddition and OrLater operators, and the subjectLicense property.

Timestamp WithAddition OrLater subjectLicense
initially ExtendableLicense License License
now (after #613) ExtendableLicense License ExtendableLicense
we need ExtendableLicense ExtendableLicense ExtendableLicense
goneall commented 8 months ago

@zvr - this was actually intentional. The OrLater operator only accepts License as an argument whereas WithAddition operator can accept either a license or an OrLater class.

Since License is a subclass of ExtendableLicense I think having the type of License is consistent with the range ExtendableLicense.

There are a few ways we could fix the error:

I'm leaning towards the last option - create 2 distinct properties.

Before making a PR, I would like to get agreement from @zvr and @sbarnum that this is best route for RC2. Let me know what you think.

zvr commented 8 months ago

Ah, you are absolutely correct, @goneall -- so the last line on my table above is not what we want.

Unfortunately, the fix can't be very simple: each property is defined in their file (e.g. subjectLicense.md in this case) and this includes its type (the range, in RDF-speak).

The fact that we add a "type" when we reference a property in a class is something for our own convenience, to save us a click to go to the property page and see it.

When the model is translated in RDF, this information is completely discarded. That's why I've added the check, to verify that the two places where we have this duplicate information match.

Going back to ways to proceed that you listed (and I'll add one):

  1. spec-parser checking superclasses: makes no sense, the information is discarded anyway
  2. changing to please spec-parser
  3. two properties, each of correct type/range
  4. we simply ignore the error message produced by the spec-parser

In the end, since the "type" information in classes definitions is ignored, alternatives 1, 2, and 4 will produce the same thing. If we want to have the correct restriction represented in RDF, we have to have two different properties.

I am OK either way. We know that not all semantic checks of the validity of license expressions will be represented in the grammar/RDF, so I'm ok having an ontology that allows expressing MIT++ (or-later of or-later). On the other hand, the fix is pretty easy (a new property, only used once).

(PS. Please don't use OrSubjectLicense as you propose -- let's differentiate between Or and OrLater)

zvr commented 8 months ago

May I propose subjectLicense and subjectExtendableLicense ?

(why do I get good ideas after I submit comments?)

sbarnum commented 8 months ago

I think we have both a tactical issue and a strategic issue here.

The tactical issue is what do we do about the two differing uses of this property today pushing toward RC2 without time to fully solve the strategic problem. I should note that the strategic problem is not simply how to model these differing uses of this property in this case and have the spec parser not complain. The strategic problem is fixing the approach that the spec parser is taking according to Alexios. This is no complaint against Alexios and others who have contributed blood, sweat and tears in work on the spec parser. I value and appreciate all of their hard work. This is just one more reason why I continue to strongly assert that our spec should BE the formal RDFS/OWL/SHACL ontology and not a text version that must be interpreted and converted to the formal ontology. Writing a tool to interpret a textual form and correctly generate a formal ontology is challenging. That being said, the strategic issue at hand is that it is not valid for the spec parser to attempt to force the range declared on the property and the type asserted for the property in any particular use ( on a class) as the same thing, especially by discarding the property type asserted on the class use. The range declared on the property and the type asserted for the property in any particular use are definitely NOT the same thing. The type asserted for the property in any particular use is not simply a convenience for people reading the spec. It is a distinct and very important assertion in the spec/ontology. It is 100% valid and appropriate (as demonstrated in this current issue) for them not to be the same thing as long as the type asserted for the property in any particular use falls within the scope (is a subclass) of the class asserted in the range declared on the property. This is actually fairly common. This is one of the reasons why the range assertion on the property and the use of SHACL shapes to validate correct use on a specific class are structured and handled differently. The range asserted in the property specification should be used to generate the range assertion on the property definition in the formal ontology. The type asserted for the property in any particular use ( on a class) should be used to generate an appropriate SHACL property shape on that class in the formal ontology and should use the type asserted for use on that class. They are not explicitly connected. SHACL validation will consider the semantics and implicitly handle the subclassing issue.

I know that we do not have time to fix this now given our race to RC2. That being said, I will strongly propose that this issue MUST be fixed as soon as possible after RC2. It is a technical tool implementation decision that is blocking us from correctly specifying the ontology we need. This current issue clearly demonstrates that in my opinion. The tactical decision in front of us seems to be which option of work around is least incorrect/non-optimal.

Given the current limitations is sounds like the only option to get us a correct ontology is to define separate properties for each of the two uses. This is non-optimal (causes an extra property) but at least lets us define correct data. Fixing the underlying strategic issue in the spec parser should hopefully prevent us from having to do something like this again in the future.

zvr commented 8 months ago

I think there has been a misunderstanding, @sbarnum.

It's not that we are restricted by the tool (spec-parser) to do what we want to do; it's simply that the tool (in its current state) implements what we were doing, and that's what I described above. We always had the the type of a property in a class be identical to the range of the property.

If we want to have it differently, the check can easily be removed.

But I disagree with your comment that this is "fairly common" -- at least for our case. I understand it's "100% valid", but the fact that we didn't need it for 3 years (not to mention the 10 years of SPDXv2) speaks for us being able to construct our model without using this. In this specific case, I'm not even convinced it's the right approach, as I think the solution we ended up with is actually cleaner and better.

We also should keep in mind the actual breadth of what is affected. This is ExpandedLicensing which is great to have, but realistically will be used by a minority of tools (Gary's and mine, probably ;-)). Yes, we did elaborate modeling in Core, Software and Licensing, because we had the experience of 10+ years of SPDX. Newer namespaces are much simpler. I think almost every property in them is an xsd value or an enumeration.

To the more general point, you know that I'm a big believer that "RDF should be the single source of truth". Unfortunately, we cannot write the specification in RDF, since in that case fewer than 5 people will be contributing (and I don't think ISO publishes RDF documents). Therefore, we decided we had to use some textual form and what we have was the best we could come up with.