w3c / dx-prof

The Profiles Vocabulary
https://w3c.github.io/dx-prof/prof/

Are PROF roles misplaced in resourceDescription? #13

Open kcoyle opened 5 years ago

kcoyle commented 5 years ago

If the PROF resourceDescription is analogous to the DCAT distribution, it is a description of a physical resource, essentially a file. The PROF role is the relationship between a profile and a resource. It is not the nature of the resource, and the resource can have different roles in association with different profiles. I can only think to illustrate this with a (crude) diagram:

[diagram: roles]

Also note that with inheritance of resourceDescriptions, that inheritance implicitly includes the role, which may not be appropriate.

Among the possible solutions are:

  1. make role a property that links the profile to the resource
  2. ?? (I can only think of that one; surely there are others)
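
A rough sketch of what option 1 could look like, assuming one property per role. The `ex:` properties and the `XX-AP.pdf` URL are hypothetical and not part of PROF; the `prof:` namespace is assumed to be the published one.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix ex:   <https://example.com/ns#> .

# Hypothetical role properties (assumption, not defined by PROF).
ex:hasGuidance   a rdf:Property .
ex:hasVocabulary a rdf:Property .

# The same file plays two roles for one profile ...
<https://example.com/prof1>
    a prof:Profile ;
    ex:hasGuidance   <https://example.com/XX-AP.pdf> ;
    ex:hasVocabulary <https://example.com/XX-AP.pdf> .

# ... and only one role for another profile.
<https://example.com/prof2>
    a prof:Profile ;
    ex:hasVocabulary <https://example.com/XX-AP.pdf> .
```

In this shape the role lives on the link, so the file itself carries no role and nothing about it changes when a profile stops using it in a given role.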

rob-metalinkage commented 5 years ago

No, they are not misplaced: this is a qualified relationship.

Specific relationships are an alternative pattern, but not an appropriate one, because:

a) we need other qualifications besides role (what the artefact is (its profile), how it is encoded, what parent it may have been inherited from);
b) we don't want to manage updates to an ontology: either let users define roles or provide a register mechanism;
c) the killer: an artefact may be used for definition and validation and form building and documentation...

(We can still define a small number of obvious roles relating to our Use Cases. The issue then is whether this should be in the ontology or a separate register artefact.)

rob-metalinkage commented 5 years ago

PS - please close this because this is covered already in w3c/dx-prof#12 and w3c/dxwg#536

kcoyle commented 5 years ago

I don't think it relates to those issues so I'm not going to close this. We'll work out how we want to manage the various modeling issues and then re-organize.

What I'm questioning is whether a "qualified relationship/association" is what's desired here. I think this gets back to what @andrea-perego was asking at the meeting today, which is what does the resourceDescription represent? As defined the resourceDescription is a combination of role and dataset description. The other attributes are fine and don't look especially "qualified" to me - just regular description as one does in a catalog. As long as it is describing the nature of the "thing", not a temporary or variable condition, it works. It's just role that bothers me. However, I am assuming that the statement "isInheritedFrom" is always true for the artifact that anchors the resourceDescription in something real. If not, ... well, we get back to "what does the resource descriptor describe"? Is it a thing or a temporary condition?

I'm not only fine with your "c)" I think it will be a fairly common use case. And I think that the roles may change over time. Which is why I want to separate the role and the resource, so those changes do not mean having to define a new resource. If my XX-AP.pdf is initially used for definition and vocabulary and documentation, then I create, as in my example, a SHACL file that includes my vocabulary, I create two new resource descriptions: one for definition and documentation and one for vocabulary. Is this what you intend?

I do think we need to work with examples, and some of those should show how changes take place. I also think that we should consider examples of sharing resources across profiles, even ones not managed by the same folks.

rob-metalinkage commented 5 years ago

A dcat:Distribution is essentially a qualified relationship between a dcat:Dataset and the actual distribution URI....

So it's nothing more nor less than that, and even if we didn't want to be able to catalog profiles naturally using DCAT, we probably should not invent a new pattern.

rob-metalinkage commented 5 years ago

Also, your argument about changing roles seems to reinforce the idea that the artefact has a resourceDescriptor with possibly multiple roles.

It's somewhat easier to make statements about change to the role classification of an object (the Resource Descriptor) than about a role relationship: you'd need to reify the relationship before attaching change history to it, which means minting a new object identifier, whereas we could just attach change notes as a property to a ResourceDescriptor. (I guess you could push change notes to the profile itself, but then we run into the need for rules for propagating change notes in a hierarchy if we are flattening it...)
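
To make the comparison concrete, here is a minimal sketch of both patterns. It assumes `skos:changeNote` as the change-note property and a hypothetical `ex:hasGuidance` role property; the `:stmt1` identifier, the note text, and the `XX-AP.pdf` URL are invented for illustration.

```turtle
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix prof:  <http://www.w3.org/ns/dx/prof/> .
@prefix roles: <http://www.w3.org/ns/dx/prof/role/> .
@prefix ex:    <https://example.com/ns#> .
@prefix :      <https://example.com/prof1/> .

# (a) Role held on a descriptor: the note attaches to a node that already exists.
:guidance-pdf
    a prof:ResourceDescriptor ;
    prof:hasRole roles:guidance ;
    prof:hasArtifact <https://example.com/XX-AP.pdf> ;
    skos:changeNote "Vocabulary role removed after the SHACL file was published." .

# (b) Role as a direct property: the triple has to be reified first,
#     which means minting a new identifier (:stmt1) to hang the note on.
<https://example.com/prof1> ex:hasGuidance <https://example.com/XX-AP.pdf> .

:stmt1
    a rdf:Statement ;
    rdf:subject   <https://example.com/prof1> ;
    rdf:predicate ex:hasGuidance ;
    rdf:object    <https://example.com/XX-AP.pdf> ;
    skos:changeNote "Vocabulary role removed after the SHACL file was published." .
```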

kcoyle commented 5 years ago

Rob, examples are needed. Let's work on some.

kcoyle commented 5 years ago

Here are my first examples, just a stub for the profile. My question is: which of these is defined by the model, or does the model allow either? If both, is there a preferred solution? (A has one role per resource and repeats the artifact; B has all roles linked to a single resource and links the artifact only once.)

A:

<https://example.com/prof1>
    a prof:Profile ;
    prof:hasResource :dcat-ap-guidance-pdf1 ;
    prof:hasResource :dcat-ap-guidance-pdf2 .

:dcat-ap-guidance-pdf1
    a prof:ResourceDescriptor;
    rdfs:label "DCAT-AP Guidance Document (PDF)" ;
    prof:hasRole roles:guidance ;
    dct:format <https://w3id.org/mediatypes/application/pdf> ;
    prof:hasArtifact
        <https://joinup.ec.europa.eu/rdf_entity/http_e_f_fdata_ceuropa_ceu_fw21_f17e18570_b1d77_b4171_b9df5_bb53cb4f017d4> .

:dcat-ap-guidance-pdf2
    a prof:ResourceDescriptor;
    rdfs:label "DCAT-AP Guidance Document (PDF)" ;
    prof:hasRole roles:vocabulary ;
    dct:format <https://w3id.org/mediatypes/application/pdf> ;
    prof:hasArtifact
        <https://joinup.ec.europa.eu/rdf_entity/http_e_f_fdata_ceuropa_ceu_fw21_f17e18570_b1d77_b4171_b9df5_bb53cb4f017d4> .

B:

<https://example.com/prof1>
    a prof:Profile ;
    prof:hasResource :dcat-ap-guidance-pdf1 .

:dcat-ap-guidance-pdf1
    a prof:ResourceDescriptor;
    rdfs:label "DCAT-AP Guidance Document (PDF)" ;
    prof:hasRole roles:guidance ;
    prof:hasRole roles:vocabulary ;
    dct:format <https://w3id.org/mediatypes/application/pdf> ;
    prof:hasArtifact
        <https://joinup.ec.europa.eu/rdf_entity/http_e_f_fdata_ceuropa_ceu_fw21_f17e18570_b1d77_b4171_b9df5_bb53cb4f017d4> .

kcoyle commented 5 years ago

Here's another example that shows a different organization of the role/relationship information and the information about the artifact. Information that is essential to the artifact and unchanging (e.g. its format) should be associated directly with the artifact, not with the resource descriptor. (I shortened the IRI of the artifact just to fit it neatly in the comment.)

:dcat-ap-guidance-pdf1
    a prof:ResourceDescriptor ;
    dct:description "The Guidance Document for this profile." ;
    prof:hasRole roles:guidance ;
    prof:hasArtifact <https://joinup.ec.europa.eu/...17d4> .

<https://joinup.ec.europa.eu/...17d4> dct:format <https://w3id.org/mediatypes/application/pdf> ;
    dct:title "DCAT-AP Guidance Document (PDF)" ;
    dct:datePublished "1996" .

(h/t @no-reply)

Multiple roles would then logically be multiple resource descriptions, but the artifact description would remain the same:

:dcat-ap-guidance-pdf1
    a prof:ResourceDescriptor ;
    dct:description "The Guidance Document for this profile." ;
    prof:hasRole roles:guidance ;
    prof:hasArtifact <https://joinup.ec.europa.eu/...17d4> .

:dcat-ap-vocabulary
    a prof:ResourceDescriptor ;
    dct:description "This document lists the vocabulary terms." ;
    prof:hasRole roles:vocabulary ;
    prof:hasArtifact <https://joinup.ec.europa.eu/...17d4> .

<https://joinup.ec.europa.eu/...17d4> dct:format <https://w3id.org/mediatypes/application/pdf> ;
    dct:title "DCAT-AP Guidance Document (PDF)" ;
    dct:datePublished "1996" .

The conclusion here is that the roles are in the right place, but the artifact must exist independent of its role in relation to any one profile. This means that artifact graphs can be shared and the semantics of the artifact are not dependent on the profile in which they are referenced.

rob-metalinkage commented 5 years ago

Agreed. This is really an implementation issue: artefacts either need persistent URLs or the catalogues (expressed using PROF) need to be kept up to date and have appropriate caching metadata. This is web 101, and I don't think we really need to even state it in guidance.

kcoyle commented 5 years ago

Actually, what @no-reply and I discussed is not related to implementation, it is a modelling issue. But now I see that a carriage return got missed so the example I gave wasn't clear. Here it is again:

:dcat-ap-guidance-pdf1
    a prof:ResourceDescriptor ;
    dct:description "The Guidance Document for this profile." ;
    prof:hasRole roles:guidance ;
    prof:hasArtifact <https://joinup.ec.europa.eu/...17d4> .

<https://joinup.ec.europa.eu/...17d4>
    dct:format <https://w3id.org/mediatypes/application/pdf> ;
    dct:title "DCAT-AP Guidance Document (PDF)" ;
    dct:datePublished "1996" .

The main difference is that dct:format is a property on the artifact, not on the resource descriptor, and that we anticipate a graph of information points linked to the artifact and no information about the artifact itself in the resource description graph. What the original example, which I copied from examples in github, says is that the graph that is a prof:ResourceDescriptor has a dct:format. (dct:format defines the format of the subject of the triple, which in this case is the resource descriptor.) That is not what I think is meant - it is the artifact that has a format. The format of the artifact is not a function of being linked to the profile - it is a persistent character of the artifact. A change in format would be a change in its identity - it would be a different artifact.

This could perhaps be solved by not including dct:format in the vocabulary, instead avoiding any description of the artifact. The assumption would be that if the artifact needs to be described, that would take place elsewhere in the "universal graph". We could hope that people provide the information about their profile-related data, but that may not be enough. It might be necessary for the PROF vocabulary to go further and specify a minimal core of information relating to artifacts if there are use cases that depend on this information. I suspect that the content negotiation proposal is such a use case.

rob-metalinkage commented 5 years ago

Note that the profiles vocabulary follows the DCAT pattern here to make it easy to catalog profiles and artefacts.
DCAT has all these properties on the Distribution class, not on the accessURL.

AFAICT "elsewhere" is not really an option: the competency questions for profile description have to allow a client to discover what resources are available in what roles, information models and formats for a profile.

I can see that if a resource is reused in multiple roles in different profiles it may be more elegant to declare the format once, and it may be closer to the correct use of dct:format than DCAT supports, but I suspect it's likely to be a moot point: most resources will relate to a specific profile, so we might not gain much.

We could specify a formal entailment to allow either the descriptor or the artefact to carry the property.

kcoyle commented 5 years ago

Note: this issue also came in on the Shex comment, although it was a bit subtle since it was mainly an aside in a larger sentence:

"the media type is actually a property of the related artifact, not the relation."

kcoyle commented 5 years ago

prof:ResourceDescription is not the same as dcat:Distribution, so copying that model isn't appropriate. dcat:Distribution is "A specific representation of a dataset." The graph describes a single "thing". prof:ResourceDescription is a relationship between two entities due to the use of the role, which is relative. The role is also not an attribute of the artifact, which confirms its semantics as a relationship between a profile and a "thing" that has its own existence. Also, there is no "one-to-one" defined between prof:ResourceDescription and the artifact, AFAIK, and we have already established that an artifact can be defined with more than one role, which is unlike dcat:Distribution.

I'd have to think more about dcat:Distribution in this regard because that usage of dct:format may also be questionable.

As for "elsewhere" - I don't think you can control this. It's an open web and you may be using resources controlled by others. That will definitely be the case in the library world. The "open world assumption" applies here and nothing so far in the vocabulary would prevent use of any artifact found on the web. An implementation may do so, but the deliverable here is for a vocabulary, not an implementation. An implementation would be a good use case for a profile (with constraints) of PROF.

smrgeoinfo commented 5 years ago

couldn't the type of prof:hasArtefact be dcat:Distribution (something like this).

kcoyle commented 5 years ago

@smrgeoinfo Thanks. Yes, I was thinking along those lines, possibly inspired by your diagram. My thinking is that because role and distribution are not one-to-one, and refer to different logical "things", the role and the distribution must be separated. In the design I proposed the graph linking directly to the artifact could be analogous to a dcat:Distribution, which is a description of a single, physical (including digital) resource. The resource itself is linked from the distribution.

The hitch there is the use of dct:format. dcat:Distribution also uses dct:format in a way that is questionable. The subject of dct:format is a "thing" that is of that format, e.g.

myFile.csv dct:format "text/csv" 

If we use dct:format in this way in a distribution, then we also have:

myFile.csv dcat:accessURL myFile.csv

Is this desirable? I suspect not.

What seems logical to me is that all triples that describe the artifact must be in a graph with the URI of the artifact as the subject. And all triples that describe the role would be in a graph whose URI identifies the role. Relationships between entities should be properties that link those entities.

I would love to learn of examples of implementations of RDF that have balanced this need for description of things and identification of the things. I wonder if we can find an analog in the Web Annotation Ontology, and therefore if @azaroth42 might not have some ideas? The dcat:Distribution might be a kind of annotation?

kcoyle commented 5 years ago

Note: I found a succinct comment from @RubenVerborgh in relation to another W3C effort that perfectly captures the problem with PROF's use of dct:format:

That's not the definition of dct:format. It's meaning is not "is available as" but rather "The file format, physical medium, or dimensions of the resource."

And I have confirmed that "the resource" = "subject URI of the triple in which dct:format appears." Unfortunately, the usage in DCAT is the same as the usage proposed here for PROF, and that's the rub.

smrgeoinfo commented 5 years ago

I interpret the subject URI of the dct:format triple to be a representation (the distribution), considered as a resource distinct from the subject resource of the dcat:Dataset. So in

:option670yh5rt5
  a dcat:Distribution ;
  dct:format "text/csv" ;
  dcat:accessURL <http://example.org/dataset-002.csv> .

the format, and the accessURL are about the particular representation that is the subject of the dcat:Distribution. I don't see a problem.

rob-metalinkage commented 5 years ago

@kcoyle I don't see why a document containing a set of constraints defining a profile (e.g. SHACL or ShEx) is not equivalent to a Distribution. Discussions of treating vocabularies as "datasets" and the europa examples cited in use cases provide evidence for that usage.

I do agree that the issue of the subject needs further unpacking and checking, but real-world objects don't have information properties; representations of them do. You could possibly remove the whole concept of a dcat:Distribution and just make statements about the accessURL, provided a) accessURLs are stable (not very likely) and b) there is a 1:1 mapping of distributions and accessURLs (I don't think this is enforced anywhere).

A couple of possibilities:

1) make DCAT cataloguing of profiles more important in the profiles ontology description and perhaps even make the DCAT alignment normative (I quite fancy this, because I think cataloguing profiles is a necessary implementation model)
2) more/better text about this representation pattern: try to find a non-DCAT explanation
3) rename the description and make it purely a reification of the role relationship as per your examples, and try to find and illustrate another way to handle cataloguing using DCAT
4) leave as is

Options 2 and 3 require a fair bit more research to come up with justifications for the DCAT patterns and (for option 3) a compelling reason to depart from them. Option 3 also requires major changes that I don't think the feedback so far warrants.

kcoyle commented 5 years ago

@rob-metalinkage What this comes down to, I believe, is whether PROF is a vocabulary for DCAT profiles or whether it is a general solution for any profiles. Using DCAT Distribution would presumably only be relevant to the former. Some of the assumptions that I perceive that drive the DCAT model (such as you mention above, having non-stable access URLs, or wishing to catalog profiles using a Distribution as a proxy or surrogate) are not universal. A general-purpose language wouldn't prevent the use of a Distribution-like catalog entry for a resource if folks wish to employ that. Requiring it, though, would be a limitation in my opinion. It would also make it difficult to integrate it with the profiles guidance work, which is not DCAT-specific.

In addition, moving on to including a "way to handle cataloguing" is considerably beyond the scope that we have agreed on up to now for PROF. If that is needed for a PROF implementation of DCAT APs, then it should be defined there. In essence, PROF the vocabulary will probably be used within a profile of PROF (getting meta!), since RDF itself provides no actual constraints, only inferences. And I can point to Dublin Core as proof that a general purpose vocabulary with minimal semantic commitment is very useful.

kcoyle commented 5 years ago

@smrgeoinfo That is not how DC defines dct:format. dct:format is the format of the subject of the triple, not what that subject "represents". DCAT could define a property that means: "the format of the thing cataloged at this URI". That still does not directly link the format statement to the accessURL statement, so it means that you can logically only have one format and one accessURL per Distribution in order to avoid confusion, but that may already be the rule.

rob-metalinkage commented 5 years ago

@kcoyle - w.r.t. "whether PROF is a vocabulary for DCAT profiles": the requirement to catalog profiles is orthogonal to the ability to express profiles of DCAT itself.

Thanks for pointing out the specific comment from the ShEx group. I think this needs to be taken to the plenary and addressed in DCAT. I am still comfortable following the DCAT lead here even though profiles are more general: since we have Use Cases of cataloguing profiles, we still need an alignment to DCAT.

If DCAT does define a new property, maybe Profiles can import DCAT normatively to reuse it and make the alignment normative, or define its own property and put subPropertyOf axioms into the DCAT alignment.

Thinking about this - if we have a thing identified as R ...

and we have a graph of information about R

and we have a representation of that graph (e.g. TTL)

then:

a) the format of the graph representation is known if you have it, and unless you know the format you can't inspect it to find the declaration of the format, so a dct:format relating to that representation doesn't add any value to the representation
b) the graph of information about R doesn't have a format; only representations of it do

so the range of dct:format really has to be an object that is a proxy for the real-world artefact, so that a statement about it is a statement about the artefact.

So your point about role is very relevant.

In a Linked Data world, if a ProfileResource id is dereferenced using content negotiation, it could return the relevant artefact directly, so dct:format is entirely appropriate for such a behaviour. This applies to DCAT too. If that implementation pattern is not allowable for some reason, we need to address it in guidance.

I am wondering if we can just axiomatise that the relationship dct:format on a ProfileResource (aka ResourceDescriptor) entails statements about the artefact.
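
One way such an entailment might be written is as an OWL 2 property chain. This is only a sketch: the axiom below is an assumption, not something defined by PROF or DCMI (asserting axioms about a DCMI property is itself a design decision), and the `:guidance-pdf` and `XX-AP.pdf` identifiers are hypothetical.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix :     <https://example.com/prof1/> .

# Assumed axiom: if a descriptor points to an artifact and the descriptor
# carries dct:format F, then the artifact also has dct:format F.
dct:format owl:propertyChainAxiom
    ( [ owl:inverseOf prof:hasArtifact ] dct:format ) .

# Asserted data: the format is stated on the descriptor only.
:guidance-pdf
    a prof:ResourceDescriptor ;
    dct:format <https://w3id.org/mediatypes/application/pdf> ;
    prof:hasArtifact <https://example.com/XX-AP.pdf> .

# Triple that a reasoner supporting property chains would be expected to add:
# <https://example.com/XX-AP.pdf> dct:format <https://w3id.org/mediatypes/application/pdf> .
```

This covers only the descriptor-to-artifact direction. Letting the descriptor inherit a format declared on the artifact would need a second, similar chain, and a descriptor with several artifacts in different formats would pass its format to all of them.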

Maybe we raise this in plenary and create an action to get a response from the DCAT sub-group? Or just raise it as a comment on DCAT.

kcoyle commented 5 years ago

@rob-metalinkage Could you make a proposal at this point based on what we've discussed here? Then we can put it to a vote.

You could do this in steps - first proposing a separation of roles and resources/artifacts, and if that gets positive responses then going more into how it could be done. On the other hand, if you have a clear idea of how to do this you can offer a single-step redesign. Whatever works for you.

rob-metalinkage commented 5 years ago

Have re-read this issue and looked at Dublin Core again.

All the DC definitions state, e.g. for format, "The file format, physical medium, or dimensions of the resource."

None of them make sense as properties of the "identifier of the resource": the resource is the thing being described, the subject of all triples is the identifier of the resource, and the semantics of all properties relate to the resource itself.

So what is the subject of these DC properties? It's an identifier that may be different from the resource itself and that allows a canonical form of metadata about the resource to be accessed. This is IMHO consistent with the identifier of a ProfileResource or dcat:Distribution being independent of some access URL: it's the means to have a local graph of knowledge about an external object (with or without an access URI) and, most importantly, an address from which this metadata can be retrieved for any type of resource.

Logically it is the artifact that has a format (or a list of formats, if its identifier supports content negotiation), but the assumption that the artefact has a stable, known URI may be too strong. There would be an issue of information redundancy if the same artifact played lots of different roles in different profiles, but that seems a bit of a corner case.

So the proposal is to run the issue past the DCAT team for comment and see whether the response is sufficient to respond to the original comment.

kcoyle commented 5 years ago

Rob, I don't think this relates to DCAT, only to the PROF vocabulary. DCAT does not have roles, and the question here is about combining the role and the artifact in a single resource.

The artefact does not have to have a stable URI, just a URI, but if the URI is not known then you can't link to the artifact. So I don't understand that objection.

I think the main question is: what is the harm in separating the role and the artifact? Are there advantages to doing so? Disadvantages? What are the implications of each choice?

As with all comments, we need to resolve this. We can't ask the group to make a decision without presenting them with choices that are clear. I was hoping you could write a short "votable" explanation of the issue(s).

rob-metalinkage commented 5 years ago

Whether DCAT has roles is not the issue (though if Distributions are not information-equivalent, this does implicitly suggest a need to explain the role).

The role and the artifact location are separated: it would be quite possible (and useful) to declare the title of a document that provides guidance even if a URI location is not currently known. This is less useful for machine-readable artifacts, obviously.

So the explanation of the issue is that an artifact may be referenced by a Profile using a ProfileResource relation, and statements about its role made independently of other information about the artifact. The profiles vocabulary allows information about the artifact (such as its format and available profiles) to be declared as properties of the ProfileResource instead of the artefact itself, in the same way DCAT allows properties of a Distribution to be declared independently of its access URL.

Options are:

1) do nothing, making cataloguing of profiles using DCAT simple and making Linked Data implementation simpler, as we don't have to resolve artefact URIs to find these properties in general
2) declare a new class of things "Artefact" that has these properties and force them to be present and declare the ProfileResource
3) define a set of entailment rules, so that these properties may be declared on an artifact or a ProfileResource

Having a proxy for the actual artifact in a namespace you control (i.e. a Distribution or a ProfileResource) makes Linked Data easier to realise IMHO: we don't have to access artefacts to get properties to see if we wanted to access them in the first place, and we don't force the publisher of all artifacts to support content negotiation and canonical metadata views. We can talk about existing artefacts.

Option 3 might be a compromise here: implementers can decide if they want to force servers and clients to include selected metadata about each artifact in graphs.

kcoyle commented 5 years ago

Rob, I think we are talking past each other, but am not sure I can fix that. Can you go back to the examples I gave above and answer the questions I asked there? They have to do with artifacts with multiple roles. That's followed by another example. That's a very practical question which might help. Again, I suggest working with examples.

Again, puzzling: "we don't have to resolve artefact URIs to find these properties in general." There is no need to resolve the URIs in the model I give any more than in the current one. It's just another node in a graph. It's the fairly minor difference between

[diagram: descriptor1]

and

[diagram: descriptor]

It's a question of associating the information about the artifact (which is a file on the web) with the artifact. If there is no artifact, those properties would not exist. I also think that this illustrates the difference between the idea of the "record" (upper picture) and that of graphs (lower). Records act as proxies while graphs describe things and relationships. In the second diagram, the resource descriptor describes the content of the resource and its role (also title, etc.), while the artifact graph describes the physical artifact. The content and the artifact are different things and each has properties that describe it.

Also: "declare a new class of things "Artefact" " Classes are not required in RDF and are not required in the examples I gave. You can have a graph that is not defined by a class. Remember, in RDF properties define classes, not classes define properties. Although the DCAT diagrams show boxes with classes and properties, a large number of the properties are in reality not associated with the class (they have no domain declaration to that class). The only thing that will associate those properties to that "thing" is a validation language. In RDF classes are semantic rather than structural.

rob-metalinkage commented 5 years ago

yep you can just build a graph...

But if you want to axiomatise the shape of that graph, for example to require that the format of each artifact is declared, you need a subject of the axiom: an RDFS Class, a SHACL NodeShape with a targetClass, etc.
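
As an illustration of the class-targeted variant, a SHACL shape along these lines would require every artifact to declare a format. The `ex:Artifact` class and the shape name are hypothetical; no such class exists in PROF.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ex:  <https://example.com/ns#> .

# Applies to every node typed as the (hypothetical) ex:Artifact class.
ex:ArtifactShape
    a sh:NodeShape ;
    sh:targetClass ex:Artifact ;
    sh:property [
        sh:path dct:format ;
        sh:minCount 1 ;     # each artifact must declare at least one format
    ] .
```

The shape only fires on nodes that carry an `rdf:type` of `ex:Artifact`, asserted directly or via a subclass, which is why a class is needed for this form of targeting.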

You also need to force servers delivering Profiles to include the full graph shape, which is perhaps not too bad as it doesn't make sense to deliver just the Profile without the ProfileResource.

Again we come to issues around implementation patterns that seem to be mirrored in the DCAT design, and I would like a solution that is not difficult to align to the use case of cataloguing profiles with DCAT.

So my proposal is twofold:

"1) refer the issue of metadata about artifacts and implied graph shapes to the DCAT sub-group for comment about the nature of Distribution w.r.t. the distribution artifacts and implications for cataloguing Profiles as a sub-type of dcat:Resource.

2) create a clean issue about this and raise it in the next PWD to invite wider feedback."

kcoyle commented 5 years ago

Rob, can you respond to the question above, Please? Thanks.

tombaker commented 5 years ago

I understand the second of Karen's two diagrams above but not the first. I think we addressed this point on the list. Is the first diagram an accurate representation of the model in the current PWD?

"but if you want to axiomatise the shape of that graph - for example require that the format of each artifact is declared, you need a subject of the axiom. RDFS Class, SHACL NodeShape with a targetClass, etc."

@rob-metalinkage Are you saying that a graph node must have an explicit or inferred rdf:type statement to be the focus of a shape? In ShEx it would not. (I'm not sure about SHACL.) But if artifacts are important, couldn't a class be coined?
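
For reference on the SHACL side: besides `sh:targetClass`, SHACL Core also defines property-based targets (`sh:targetSubjectsOf`, `sh:targetObjectsOf`) and `sh:targetNode`, so a shape can select artifacts without any type statement. A minimal sketch, with a hypothetical shape name:

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix ex:   <https://example.com/ns#> .

# Targets every node that appears as the object of prof:hasArtifact,
# whether or not it has an rdf:type.
ex:LinkedArtifactShape
    a sh:NodeShape ;
    sh:targetObjectsOf prof:hasArtifact ;
    sh:property [
        sh:path dct:format ;
        sh:minCount 1 ;
    ] .
```

Whether to coin a class anyway then becomes a modelling choice rather than a validation requirement.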

This thread is quite long so my comments are based just on the most recent posts; apologies if I'm misreading the issue.

rob-metalinkage commented 5 years ago

@tombaker yes, that's what I am saying: we could introduce a prof:Artifact class. Constraints that apply to a set of things are on the class. SHACL, which I'm more familiar with, allows you to declare a shape on an individual, but that's not relevant as we are talking about axiomatising constraints on the cardinality of a property for the set of instances.

The issue here is more that DCAT has chosen a pattern where properties of a proxy for an artifact (i.e. a dcat:Distribution) are declared to relate to that artifact, and since cataloguing of profiles is an important use case we have followed that pattern. So the question is: now that we have something to lose by not following the DCAT approach, do we have anything significant to gain by adding an Artifact class? (Given that, as @kcoyle says earlier, "The assumption would be that if the artifact needs to be described, that would take place elsewhere in the "universal graph"": by not including an Artifact class we don't lose anything, but by having the explicit statement that the same properties of a ProfileResource that DCAT uses for a Distribution have the same meaning, we allow cataloguing using PROF without further entailment.)

This is why I am now leaning towards making it normative that Profile is a subclass of dcat:Resource (now that this abstract class exists in DCAT), and ProfileResource a subclass of dcat:Distribution. (This is currently in a separate alignment ontology, which would need to be updated if we move away from the DCAT pattern.)
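
Expressed as axioms, that alignment would look roughly like the following. This is a sketch of the proposal as described in this comment, not a copy of the alignment ontology; it uses prof:ResourceDescriptor for what is called ProfileResource here.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix prof: <http://www.w3.org/ns/dx/prof/> .

# Every profile is a catalogable resource.
prof:Profile rdfs:subClassOf dcat:Resource .

# Every resource descriptor is treated as a distribution.
prof:ResourceDescriptor rdfs:subClassOf dcat:Distribution .
```

With these two statements an RDFS-aware DCAT client could treat a descriptor's dct:format the way it treats one on a dcat:Distribution, which is the "cataloguing without further entailment" point made above.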

@kcoyle the answer to the question is yes: they are both valid and neither is preferred. Having separate ProfileResources would allow you to attach comments and annotations specific to the role; otherwise they are the same. Maybe more interesting is whether role can be missing at all.

kcoyle commented 5 years ago

@rob-metalinkage Thanks for answering my question. So now I have another example, kind of the opposite of the previous one. In this case I have two artifacts that have the same role. An easy example would be a SHACL document and a ShEx document, both of which have a role of "validation".

A:

<https://example.com/prof1>
    a prof:Profile ;
    prof:hasResource :dcat-ap-guidance-pdf1 ;
    prof:hasResource :dcat-ap-guidance-pdf2 .

:dcat-ap-guidance-pdf1
    a prof:ResourceDescriptor;
    rdfs:label "ShEx Json" ;
    prof:hasRole roles:validation ;
    dct:format <https://www.iana.org/assignments/media-types/application/json> ;
    prof:hasArtifact <https://example.Shex/myValidation.js> .

:dcat-ap-guidance-pdf2
    a prof:ResourceDescriptor;
    rdfs:label "SHACL" ;
    prof:hasRole roles:validation ;
    dct:format <https://www.iana.org/assignments/media-types/text/turtle> ;
    prof:hasArtifact
        <http://example.com/mySHACL.ttl> .

In this case I don't think that there is a way to have a single resource descriptor for the role because the two artifacts are different formats. While repeating the resource descriptor is not a big deal, this does point out that the resource descriptor graph as it is now has a dependency on the file format of the artifact. Then again, one could leave off dct:format and let people use the file name extensions to discover the roles. Then you could have:

:dcat-ap-guidance-pdf1
    a prof:ResourceDescriptor;
    rdfs:label "ShEx Json" ;
    prof:hasRole roles:validation ;
    prof:hasArtifact <https://example.Shex/myValidation.js> ;
    prof:hasArtifact <http://example.com/mySHACL.ttl> .

In fact, that's what we've been saying about the use of dct:format - it's on the wrong graph, so removing it would be more correct. Anyone could add information about the artifacts because they are URIs and can be the subjects of any triple.

As for leaving off roles, there is nothing in the vocabulary itself that would prevent that. It would not be possible if the roles were defined as properties linking the profile and the resource, although it would not be difficult (and was suggested in one comment) to have a generic role for those odd cases where one doesn't want to be more specific. But as defined, there could be a resource descriptor with an artifact but no role, or a role but no artifact, or neither. If any of those would be considered invalid, a validation implementation file would be needed for profiles.
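
To illustrate what such a validation file could look like, here is a minimal SHACL sketch that would flag a resource descriptor lacking a role or an artifact. The shape name, the ex: namespace, and the choice to require at least one of each are assumptions for illustration, not something PROF defines.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
# Assumed PROF namespace
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
# Hypothetical namespace for the shape itself
@prefix ex:   <https://example.com/shapes#> .

ex:ResourceDescriptorShape
    a sh:NodeShape ;
    sh:targetClass prof:ResourceDescriptor ;
    # Assumed rule: every descriptor names at least one role
    sh:property [
        sh:path prof:hasRole ;
        sh:minCount 1 ;
        sh:nodeKind sh:IRI
    ] ;
    # Assumed rule: every descriptor points to at least one artifact
    sh:property [
        sh:path prof:hasArtifact ;
        sh:minCount 1 ;
        sh:nodeKind sh:IRI
    ] .
```

Dropping either property constraint would allow the "role but no artifact" or "artifact but no role" cases, so the shape is also a place to record whichever modeling decision is made.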

The main point is that if you want consistency then it would be good to make some modeling decisions that go beyond the RDF, since RDF does not constrain usage. That could be a short primer, I suppose.

rob-metalinkage commented 5 years ago

Again, I think this is all implementation...

If the resources are functionally equivalent and have the same role, then I don't see why you can't just list both formats - it really depends on whether you want to share metadata for the different forms or have different annotations. In the case of a resource that supports multiple formats and content negotiation (the preferred, but not enforced, approach), you just list multiple formats. Otherwise you could have separate descriptors as you suggest, or have multiple values of hasArtifact and extend the graph by defining the format of each artefact.
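
A sketch of that last option, reusing the artifact URIs from the example above: one descriptor carries the role and both artifacts, and each artifact carries its own dct:format. The descriptor name and the roles: and default prefix IRIs are assumptions for illustration.

```turtle
# Assumed PROF namespace
@prefix prof:  <http://www.w3.org/ns/dx/prof/> .
@prefix dct:   <http://purl.org/dc/terms/> .
# Assumed namespace for the roles: prefix used in this thread
@prefix roles: <http://www.w3.org/ns/dx/prof/role/> .
# Hypothetical namespace for the descriptor
@prefix :      <https://example.com/descriptors#> .

# One descriptor, one role, two artifacts
:validation-resources
    a prof:ResourceDescriptor ;
    prof:hasRole roles:validation ;
    prof:hasArtifact <https://example.Shex/myValidation.js> ,
                     <http://example.com/mySHACL.ttl> .

# The format is stated on each artifact, not on the descriptor
<https://example.Shex/myValidation.js>
    dct:format <https://www.iana.org/assignments/media-types/application/json> .

<http://example.com/mySHACL.ttl>
    dct:format <https://www.iana.org/assignments/media-types/text/turtle> .
```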

This all relates to the general problem of data - the downloadURL is expected to be more ephemeral than the conceptual resource, and we are assuming a need to define metadata about disposition of profile resources independently of current points of access. Maybe the killer reasoning here is that an artefact may be available from multiple access points, and depending on who you are you may have access to different ones, or they may provide value-added services.

kcoyle commented 5 years ago

@rob-metalinkage I'm sorry but your answer does not make sense. If you wish to list formats without associating them with files then you cannot use dct:format as that violates the meaning of that property. If this is your approach then you must define a different property. Maybe something like "formatsAvailable". dct:format must not be used in the way you suggest, regardless of how you might be able to work around it in an implementation.

There is possibly another solution, which is to use rdf:about with the object being the URI of the artifact. I haven't fully thought through that but it would reify the URI of the resource to the URI of the artifact.

makxdekkers commented 5 years ago

Indeed, if you include a list of formats and a list of artifacts, it would be impossible to know which file has which format.
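
To make the ambiguity concrete, here is a hypothetical descriptor that lists two formats and two artifacts side by side. The descriptor name and the prefix IRIs are assumptions; the artifact and media type URIs are the ones used earlier in the thread.

```turtle
# Assumed PROF namespace
@prefix prof:  <http://www.w3.org/ns/dx/prof/> .
@prefix dct:   <http://purl.org/dc/terms/> .
# Assumed namespace for the roles: prefix used in this thread
@prefix roles: <http://www.w3.org/ns/dx/prof/role/> .
# Hypothetical namespace for the descriptor
@prefix :      <https://example.com/descriptors#> .

:validation-resources
    a prof:ResourceDescriptor ;
    prof:hasRole roles:validation ;
    # Two formats and two artifacts on the same subject.
    # RDF statements are unordered, so nothing here pairs
    # a given format with a given artifact.
    dct:format <https://www.iana.org/assignments/media-types/application/json> ,
               <https://www.iana.org/assignments/media-types/text/turtle> ;
    prof:hasArtifact <https://example.Shex/myValidation.js> ,
                     <http://example.com/mySHACL.ttl> .
```

This yields four independent triples about :validation-resources, and a consumer cannot tell from the graph alone which artifact is JSON and which is Turtle.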

tombaker commented 5 years ago

@makxdekkers @kcoyle are IMO correct. Unless I'm missing something, this is like a discussion we had in the DC community twenty years ago.

nicholascar commented 5 years ago

I'm removing the "feedback" tag as this is a purely DXWG issue.

kcoyle commented 5 years ago

@nicholascar Steven Richard is not in DXWG, and it may be his comments that got this labeled as feedback. I'm not sure it greatly matters, because all commenters need to be listened to in terms of conclusions. I think the "formal status" of folks who engage in github issues in W3C hasn't really been clarified, nor exactly how we should deal with them. If we get a chance we could ask Philippe.