solid / specification

Solid Technical Reports
https://solidproject.org/TR/

List recommended RDF serializations #465

Open rubensworks opened 1 year ago

rubensworks commented 1 year ago

This issue builds upon the serialization format goals and strategy discussion #454, and aims to determine a list of RDF serializations that are considered "recommended" across all Solid specs, to avoid serialization conflicts across specs (as seen in #463).

The goal of this issue is to discuss whether or not such a list makes sense, and which RDF serializations should be contained in this list. In contrast to #454, this is the place for discussing preferences on which RDF serializations we want (and don't want) to recommend.

rubensworks commented 1 year ago

My personal view on this is that the list should at least include Turtle and JSON-LD, since these are already often used within the Solid ecosystem.

I would even go a step further, and suggest that all W3C-recommended RDF serializations should be included in this list. Concretely:

This broader list is important, for example for static file servers. Static file servers may not be able to do content negotiation, so they should be able to provide content in a single format (e.g. RDFa or JSON-LD snippets in HTML, for file servers that only serve HTML, but also want to include RDF).

tomhgmns commented 1 year ago

I agree.

As a side note and just for your information, in #463, the issue is actually the opposite. Having a requirement like this would impose breaking support for traditional OIDC because traditional OIDC does not require the use of an Accept header even though it requires JSON. Because of this issue, the current version of Solid OIDC requires a client_id document to be served as JSON-LD (and not Turtle etc).

rubensworks commented 1 year ago

Having a requirement like this would impose breaking support for traditional OIDC because traditional OIDC does not require the use of an Accept header even though it requires JSON. Because of this issue, the current version of Solid OIDC requires a client_id document to be served as JSON-LD (and not Turtle etc).

As long as this list is imposed on clients to support parsing, while servers may expose their data in one of the recommended serializations, then I think this would still work.

So concretely, servers MUST support at least one recommended RDF serialization, and clients MUST support parsing of all recommended RDF serializations.

elf-pavlik commented 1 year ago

So concretely, servers MUST support at least one recommended RDF serialization, and clients MUST support parsing of all recommended RDF serializations.

I think we can't push the burden on clients, especially PWA/SPA clients that need to keep their bundles small. I would really need to see someone demonstrate an ES library that can use dynamic imports to load parsers as required.
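The idea of loading parsers only as required can be sketched independently of any particular ES library. The following is a Python analogue, not the ES demonstration asked for above: the registry, the function names, and the module name `hypothetical_turtle_parser` are all illustrative assumptions, and the standard `json` module is only a stand-in for a real JSON-LD processor.

```python
import importlib
from functools import lru_cache

# Assumption: maps a media type to (module name, function name).
# "json" is a stand-in only; real JSON-LD processing needs a JSON-LD library.
# "hypothetical_turtle_parser" does not exist; it marks where a Turtle
# parser module would be registered.
PARSER_MODULES = {
    "application/ld+json": ("json", "loads"),
    "text/turtle": ("hypothetical_turtle_parser", "parse"),
}


@lru_cache(maxsize=None)
def load_parser(media_type):
    """Import the parser module for a media type on first use only."""
    module_name, function_name = PARSER_MODULES[media_type]
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


def parse_body(content_type, body):
    """Pick a parser from the response Content-Type and apply it to the body."""
    # Drop parameters such as "; charset=utf-8" before the lookup.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in PARSER_MODULES:
        raise ValueError(f"no parser registered for {media_type!r}")
    return load_parser(media_type)(body)
```

A client built this way only pays the loading cost for the serializations it actually encounters, which is the trade-off being debated here.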

This broader list is important, for example for static file servers. Static file servers may not be able to do content negotiation, so they should be able to provide content in a single format (e.g. RDFa or JSON-LD snippets in HTML, for file servers that only serve HTML, but also want to include RDF).

Broader Linked Data already allows all of this. I see a solid focus on read-write LD with authn/authz, if the server can support all the needed features they most likely can support content negotiation as well.

rubensworks commented 1 year ago

I think we can't push the burden on clients, especially PWA/SPA clients that need to keep their bundles small. I would really need to see someone demonstrate an ES library that can use dynamic imports to load parsers as required.

In the short-term, this may indeed increase bundle size (even though parsers can be very small), or require dynamic imports.

But in the long-term, I see (Solid-specific?) browsers shipping with these parsers by default. So that would then be similar to browsers supporting many different image formats.

Broader Linked Data already allows all of this. I see a solid focus on read-write LD with authn/authz, if the server can support all the needed features they most likely can support content negotiation as well.

Not all actors in the Solid ecosystem should necessarily be able to perform full read/write IMO. For example, the WebID profile document could remain read-only, which means that it can be hosted on static file servers, which do not support content negotiation.

woutermont commented 1 year ago

@rubensworks, as you already mentioned, the long discussion in https://github.com/w3c/WebID/issues/3 can already give us a lot of insight into people's preferences and goals regarding serialisations. I will try to summarise some of the key points.

Finally, let me add a single recommendation of my own, concerning the Accept headers also discussed in https://github.com/solid/specification/issues/463. Currently there are a number of specs that mandate a specific serialisation to be served by default (i.e. on requests without Accept header), without having a good reason to demand so. This results in specs with different defaults being unusable together. To improve spec orthogonality, I would therefore propose that specs should not limit the default serialisation unless really needed (e.g. Solid-OIDC mandating JSON-LD with context to be OIDC compliant). I believe this is what @acoburn was originally advocating for to get the WebID and Solid-OIDC specs to work together.


I linked to the original posts everywhere I could, but should someone feel I misrepresented their opinion, I’ll gladly edit this post.

woutermont commented 1 year ago

Summarizing the gathered recommendations, specs should:

  1. guarantee interoperability between all players (i.e. mandate at least one serialization for clients/servers);
  2. remain agnostic about RDF serializations (i.e. not mandate a specific serialization for clients/servers);
  3. not restrict other specs that build on them (i.e. not mandate a specific serialization for clients/servers);
  4. impose as little as possible on the clients (i.e. mandate at most one specific serialization for clients);
  5. allow servers that are incapable of content negotiation (i.e. not mandate a specific serialization for servers);
  6. take into account the increasing importance of HTML-based serialisations;
  7. not limit the default serialization unless really needed.

If we then try to put this in a spec for Solid specs, we could for example start with recommendation [7], and translate it to the following, which also adheres to recommendations [2], [3] and [5]:

A Solid spec should contain a MUST mandating servers to serve ANY RDF-serialisation on default requests (i.e. without ConNeg).

This alone obviously places a heavy burden on the clients and/or makes interoperability a complex issue. To alleviate this, we could add the following, which goes slightly against [2], but additionally adheres to [1], [4] and [6].

A Solid spec should contain a MUST mandating servers to serve (X)HTML+RDFa on requests with text/html or application/xhtml+xml as Accept header.

Additionally, we could add arbitrary recommendations of the following form. Based on [2], that should either be for ALL or for NONE of the W3C-recommended RDF serializations.

A Solid spec should contain a SHOULD mandating servers to serve X on requests with Y as Accept header.

Note that we could transform these to MUSTs, but only by letting recommendation [5] fall.

Taking them together, I think these three rules could be a good foundation for Solid specs to build on, taking into account all recommendations that I found in the referenced issues.

elf-pavlik commented 1 year ago

Considering this, it is interesting to notice @TallTed’s https://github.com/w3c/WebID/issues/3#issuecomment-1042117077 pointing out the increasing importance of HTML-based serializations, as well as the following https://github.com/w3c/WebID/issues/3#issuecomment-1042990246 of @kidehen.

I see in the first linked comment:

(It's also worth noting that there is a growing tide of dual-purpose human-and-machine targeted HTML-based profile and related documents, with Turtle and/or JSON-LD data islands in <script ... /> entities, which MUST be preserved as HTML, and which RDF data islands SHOULD be supported and used for emerging and evolving protocols like WebID-TLS.)

So it talks about RDF embedded in HTML using <script> tags, not RDFa.


A Solid spec should contain a MUST mandating servers to serve (X)HTML+RDFa on requests with text/html or application/xhtml+xml as Accept header.

I think issues with text/html used as RDFSource have been discussed in various issues. IMO it brings more problems than it solves.

woutermont commented 1 year ago

@elf-pavlik, for myself HTML is not a major concern. I just mentioned it since multiple people seem to find it important (including either RDFa or structured data islands).

As I already mentioned, without that concern all other recommendations can be captured by:

A Solid spec should contain

  • a MUST mandating servers to serve ANY RDF-serialisation on default requests (i.e. without ConNeg);
  • a MUST mandating servers to serve all W3C-recommended serializations on requests with the relevant Accept header.
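The two rules above can be sketched as a small server-side selection function. This is a minimal sketch under stated assumptions: the `SUPPORTED` list is an illustrative subset of media types, the server's default is assumed to be the first entry, and all function names are hypothetical. Media-range precedence is simplified compared to the full HTTP rules (ranges are ordered by q-value only, and a `q=0` exclusion is not enforced against a wildcard).

```python
from __future__ import annotations

# Assumption: an illustrative subset of RDF media types this server can produce.
# The first entry doubles as the server's own default for requests without ConNeg.
SUPPORTED = [
    "text/turtle",
    "application/ld+json",
    "application/rdf+xml",
    "application/n-triples",
]
DEFAULT = SUPPORTED[0]


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media_range, q))
    # sorted() is stable, so ranges with equal q keep their header order.
    return sorted(ranges, key=lambda item: item[1], reverse=True)


def matches(media_range: str, media_type: str) -> bool:
    """Check a media type against a range such as text/turtle, text/* or */*."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def select_serialization(accept: str | None) -> str | None:
    """Return the media type to serve, or None if nothing is acceptable."""
    if accept is None or not accept.strip():
        return DEFAULT  # first rule: any serialization on default requests
    for media_range, q in parse_accept(accept):
        if q <= 0:
            continue
        for media_type in SUPPORTED:
            if matches(media_range, media_type):
                return media_type  # second rule: honour the Accept header
    return None  # a real server would answer 406 Not Acceptable here
```

For example, `select_serialization(None)` falls back to the server's default, while a request that only accepts `text/html` yields `None` under this subset.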

PS: Could you link to the issues you refer to? Thanks!

elf-pavlik commented 1 year ago

I see bits and pieces all over:

It could be nice to create one document from all of them that could serve as a reference.

csarven commented 1 year ago

Pavlik, for the umpteenth time, stop mischaracterising things.

RDFa is a concrete RDF syntax as per W3C. You don't have to accept that reality or even like it. We, the people, do use RDFa in Solid. RDFa checks many considerations that other formats do not and cannot.

You don't want to use or like RDFa? No problem. Stop turning every discussion that touches HTML or RDFa into why no one else should use it in Solid. Consider understanding what diversity or an ecosystem entails.

People have invested a lot of time and given you a plethora of explanations and links for you to study over the years. Consider developing or authoring something to build up some experience.


Required RDF serialization of WebID resource w3c/WebID#3 (comment) (which you mentioned yourself)

Is specifically about WebID Profile Documents, and not a general point about RDFa, and Tim is referring to using HTTP PATCH to update using a specific application and its choices. PATCH is literally not the only way to update a document. Have a look at PUT.

Aligning representations of document and container resources with REST via single and compound state #198 (comment)

I've responded to you there (and elsewhere) but it is disappointing that you are not even processing what's being said.

Content of Turtle and RDFa documents should be wholly and entirely preserved #342

You are sharing this but I don't think you understand the discussion.

PUT RDFa, then GET turtle #195
Require server ability to convert RDFa -> JSON-LD and RDFa -> Turtle? #243

It was already clarified in these issues that this conversion (or any other format in fact) is not particularly important for the spec in that servers can accept any concrete RDF syntax (or even any equivalent representation in fact) but that they have to provide Turtle or JSON-LD when requested.

Specify POST #108 (comment)

Again, you are reaching/cherry picking. See the context of the discussion.

csarven commented 1 year ago

The goal of this issue is to discuss whether or not such a list makes sense

No need. Each technical report will require the formats it needs.

Solid fundamentally recommends Linked Data. RDF as the language. Specific formats are asked for in each technical report with focus on interoperability. Different classes of products are welcome to use anything else in addition to that because technical reports alone do not address every use case in the ecosystem.

The system needs to be evolvable and so it is best not to draw hard lines with lists as such which may not even matter in the end, and all things considered.

rubensworks commented 1 year ago

@woutermont A Solid spec should contain a MUST mandating servers to serve (X)HTML+RDFa on requests with text/html or application/xhtml+xml as Accept header.

This may be a bit too strict. I guess a MAY is sufficient here. The MUST should probably be placed on the client in its ability to parse HTML (which most clients already can, since we're running mostly in the browser).

@elf-pavlik I think issues with text/html used as RDFSource have been discussed in various issues. IMO it brings more problems than it solves.

It introduces problems indeed. But because of schema.org, including RDF in HTML is one of the most popular ways of representing RDF data, so I think we have to accept that reality, otherwise we're going to lose a lot of people.

@csarven Specific formats are asked for in each technical report with focus on interoperability.

This approach is currently causing issues, as raised by @tomhgmns in #463.

@csarven The system needs to be evolvable and so it is best not to draw hard lines with lists as such which may not even matter in the end, and all things considered.

Fully agree that this should be evolvable. However, to ensure interoperability between server and client, developers must have some guarantees on what they must or should implement, hence the suggestion of a list. To keep things evolvable (e.g., when a new serialization is proposed, or another one falls out of favour), this list could be evolvable as well.

woutermont commented 1 year ago

@woutermont A Solid spec should contain a MUST mandating servers to serve (X)HTML+RDFa on requests with text/html or application/xhtml+xml as Accept header.

This may be a bit too strict. I guess a MAY is sufficient here. The MUST should probably be placed on the client in its ability to parse HTML (which most clients already can, since we're running mostly in the browser).

@rubensworks, then there is no guaranteed serialisation, however, and clients would have to bundle all parsers (or import the right one) to be interoperable.

woutermont commented 1 year ago

@csarven Specific formats are asked for in each technical report with focus on interoperability.

This approach is currently causing issues, as raised by @tomhgmns in #463.

The core of #463 is not just specific formats. It is specific formats being required as default, i.e. to be served on requests without ConNeg. Having all requirements specify an Accept header (as in the first rule I proposed above), would already suffice to solve those incompatibilities.

rubensworks commented 1 year ago

@woutermont and clients would have to bundle all parsers (or import the right one) to be interoperable.

Indeed, which is acceptable IMO.

To quote my earlier comment:

In the short-term, this may indeed increase bundle size (even though parsers can be very small), or require dynamic imports.

But in the long-term, I see (Solid-specific?) browsers shipping with these parsers by default. So that would then be similar to browsers supporting many different image formats.

jacoscaz commented 1 year ago

@woutermont and clients would have to bundle all parsers (or import the right one) to be interoperable.

Indeed, which is acceptable IMO.

Hard disagree there as some formats are tremendously more difficult to work with than others (JSON-LD and RDFa vs. Turtle and RDF islands). But, all the time and energy spent on debating serialization formats also demonstrates that mandating any one given format lands us nowhere. IMHO, quoting @woutermont's quoting of my own writing:

[The spec] has a MUST using one or more RDF serialisation formats, [i.e.] [RDF data] must be available in one or more RDF serialisation formats, [and] has a SHOULD on a couple of suggested formats (best practices), making [T]urtle one of them for backward [compatibility].

To which I might add, perhaps another of the suggested formats should be RDF data islands in <script> tags using Turtle.
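To make the data-island idea concrete, here is a minimal sketch of how a client could pull RDF islands out of an HTML document using only the Python standard library. The class and function names are hypothetical, and the set of accepted `type` values is an assumption; the extracted text would still need to be handed to a Turtle or JSON-LD parser.

```python
from html.parser import HTMLParser

# Assumption: the script types treated as RDF data islands in this sketch.
RDF_ISLAND_TYPES = {"text/turtle", "application/ld+json"}


class DataIslandExtractor(HTMLParser):
    """Collect the raw content of <script> elements that carry RDF."""

    def __init__(self):
        super().__init__()
        self.islands = []  # list of (media_type, text) tuples
        self._current_type = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            media_type = (dict(attrs).get("type") or "").strip().lower()
            if media_type in RDF_ISLAND_TYPES:
                self._current_type = media_type
                self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._current_type is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._current_type is not None:
            text = "".join(self._buffer).strip()
            self.islands.append((self._current_type, text))
            self._current_type = None


def extract_data_islands(html):
    """Return all RDF data islands found in an HTML string."""
    parser = DataIslandExtractor()
    parser.feed(html)
    parser.close()
    return parser.islands
```

Scripts without an RDF media type (for instance ordinary JavaScript) are ignored, so the human-visible content and the behaviour of the page are left untouched.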

EDIT: to clarify, this comment mostly reflects my experience discussing the WebID spec. However, I do think such experience also applies here and that it'd be better to have the two specs as aligned as possible, within reason.

rubensworks commented 1 year ago

Hard disagree there as some formats are tremendously more difficult to work with than others (JSON-LD and RDFa vs. Turtle and RDF islands).

Indeed, parsing can sometimes be tricky. But I think we're losing track of the fact that parsers will be (and already are) available as reusable libraries. This means that Solid apps won't have to re-implement parsing support for every RDF serialization. Instead, they can just import an existing lib. AFAIK, all major programming languages have implementations of all recommended RDF serializations.

This is similar to handling different image formats on Web pages. Image format parsers already exist, and can easily be reused. If the same reasoning were to be applied to image formats, then there would be no room for innovation in this regard, and highly compressed image formats such as WebP could never be adopted.

jacoscaz commented 1 year ago

But I think we're losing track of the fact that parsers will be (and already are) available as reusable libraries.

They are, but my concern is more about general performance than about the availability of parsing libraries. Both WebID and Solid should make as few assumptions as possible when it comes to the environment they are used in. Bundle size and CPU / memory utilization in desktop applications served over high-bandwidth networks can often tolerate 10x performance penalties with very little consequence for the end user, just as they can be trivially updated by publishing a new bundle. The same cannot be said of many other environments (low power boards, low-bandwidth networks, ...). Maintenance is another issue, with the combination of all parsers leading to an explosion in the number of dependencies that poses a serious issue from a security standpoint (esp. when dealing with clearing processes in corporate environments).

In any case, as stated I am in favor of suggesting (as in SHOULD) a couple of preferred formats. That list could also be updated over time, as per your suggestion @rubensworks . But, those formats should be selected based on how easy they are to use across the spectrum of environments that WebID and Solid might reasonably be used in. Hence my choice of Turtle and Turtle data islands.

I'll stop beating on this particular drum as I've made this point here and elsewhere cited above; I don't want to pollute the conversation.

woutermont commented 1 year ago

Thanks for your elaboration, @jacoscaz!

@ruben, could you maybe explain in more detail WHY you think it too strict to mandate servers to serve a concrete serialisation (e.g. (X)HTML+RDFa)?

@woutermont A Solid spec should contain a MUST mandating servers to serve (X)HTML+RDFa on requests with text/html or application/xhtml+xml as Accept header.

This may be a bit too strict. I guess a MAY is sufficient here. The MUST should probably be placed on the client in its ability to parse HTML (which most clients already can, since we're running mostly in the browser).

rubensworks commented 1 year ago

@ruben, could you maybe explain in more detail WHY you think it too strict to mandate servers to serve a concrete serialisation (e.g. (X)HTML+RDFa)?

I think it's important that Solid enables (parts of) pods to be hosted on static file servers, which usually don't have the ability to perform content negotiation, and they can thereby only serve a single format. Many websites today use static file servers, and I don't think this will be going away soon (my own website is static as well, so I definitely see its value).

For instance, if servers are allowed to only serve a single RDF serialization that they can choose, it's possible to host (read-only) Solid pods on platforms such as GitHub Pages, which is (IMO) a use case we would want to enable.

So if a server stores RDFa in HTML, and can only serve that, this should be fine. Also if a server stored Turtle files, and can only serve that, this should be fine as well. Imposing more on servers would make many hosting options fall out of the boat.

bblfish commented 1 year ago

I think it's important that Solid enables (parts of) pods to be hosted on static file servers, which usually don't have the ability to perform content negotiation,

Solid is a project to extend the web with access control, content negotiation, etc. in order to create very much needed decentralised social networks.

So looking at what servers that have not embarked on that project are doing to guide us, when that means making it more difficult to get solid going because we then create huge technical requirements, is putting the cart before the horse.

Turtle is a key format there: it is simple to understand and parse.

woutermont commented 1 year ago

I agree with @bblfish insofar as we should not let existing practices be the ultimate guide for our decisions. However, I definitely see value in trying our best to let them remain compatible for as long as possible.

@rubensworks, I get what your point is, as it has been raised/echoed by a number of other people as well. I do think, however, that my proposal (i.e. a MUST for serving (X)HTML+RDFa) provides sufficient room for static file servers without content negotiation, while still guaranteeing a sturdy foundation of interoperability. A primary group of statically served RDF resources will be (X)HTML+RDFa, served as text/html or application/xhtml+xml, the type of which often cannot be changed because of limits to the hosting (e.g. in CMS's). Other statically served RDF resources (e.g. Turtle files, JSON-LD files) can either be served as such on requests accepting */*, or can be trivially transformed into (X)HTML+RDFa before putting it on the static file server.

Am I missing something that makes you still prefer a MAY, with the consequences for interoperability?

EDIT: Simply serving existing files would indeed not be possible anymore, which could be a good compromise, as I argue further.

namedgraph commented 1 year ago

Are you familiar with the WWW architecture principle of orthogonal specifications? RDF syntaxes, among other things, are clearly orthogonal to the Solid's core specification. So why are you trying so hard to include all of it under Solid?

rubensworks commented 1 year ago

@woutermont Am I missing something that makes you still prefer a MAY, with the consequences for interoperability?

Because a MUST on HTML conflicts with the possibility to statically serve other formats, such as Turtle.

@namedgraph Are you familiar with the WWW architecture principle of orthogonal specifications? RDF syntaxes, among other things, are clearly orthogonal to the Solid's core specification. So why are you trying so hard to include all of it under Solid?

I follow this. However, to be able to give some guidance and guarantees to developers of Solid software, maintaining a list of recommended serializations would be very helpful.

woutermont commented 1 year ago

@namedgraph

Are you familiar with the WWW architecture principle of orthogonal specifications? RDF syntaxes, among other things, are clearly orthogonal to the Solid's core specification. So why are you trying so hard to include all of it under Solid?

Yes, I am, and I have taken that into consideration. Thanks for linking to it for those who don't.

As I suggested, amongst others based on your own comments in https://github.com/w3c/WebID/issues/3, we should not prefer one serialisation over another just for preferences' sake: the third proposed rule should probably be included for ALL or NONE of the serialisations, since it is indeed not up to Solid to decide what syntax is to be preferred.

However, it IS Solid's concern that servers and clients in its ecosystem should have a minimum of guaranteed interoperability, i.e. at LEAST one serialisation should be mandated. It is ALSO a concern of Solid that a number of practices (in casu statically served files) remain compatible, and thus at MOST one serialisation should be mandated. It is because of these two reasons, that I think a MUST regarding (X)HTML+RDFa could be considered (EVEN if I would personally rather mandate ALL serialisations).

woutermont commented 1 year ago

@rubensworks

@woutermont Am I missing something that makes you still prefer a MAY, with the consequences for interoperability?

Because a MUST on HTML conflicts with the possibility to statically serve other formats, such as Turtle.

That is true. I think that it is a fair compromise between interoperability and compatibility. RDFa is quite readable (even if a bit verbose), and transforming Turtle to it is trivial, so using RDFa as a syntax when you want to store something statically does not seem like too heavy a burden.

jacoscaz commented 1 year ago

@woutermont even though I still favor having zero mandated (as in MUST) serialization formats, what you’re proposing would be my second favorite option for WebID, too, and even more so if the format were to include both RDFa and Turtle data islands. I still think data islands alone would be better but I do realize that there is a vast amount of RDFa material already out there.

woutermont commented 1 year ago

@jacoscaz thanks for the support

But then it seems that you, as well as @rubensworks and probably @namedgraph would all prefer to mandate nothing to the server and put the burden completely with the client? If that is the case, we might want to evolve in that direction, unless someone fervently wants to defend the clientside here (@elf-pavlik @bblfish @jonassmedegaard @csarven @timbl ?).

If we take that other route, reconsidering the recommendations in my original comment, we could adhere to [2], [3], [5] and [7], maybe to [1] and [6], but not to [4], with the following.

A Solid spec should contain

  • (a MUST mandating clients to understand ALL W3C-recommended RDF-serialisations)
  • a MUST mandating servers to serve ANY W3C-recommended RDF-serialisation on default requests (i.e. without an Accept header).
  • a [MAY allowing / SHOULD mandating] servers to serve all W3C-recommended RDF-serializations on requests with the relevant Accept header.

acoburn commented 1 year ago

Indeed, parsing can sometimes be tricky. But I think we're losing track of the fact that parsers will be (and already are) available as reusable libraries.

This is true for some languages and contexts: JavaScript, Java and Python are good examples. This is not universally true, though, and the devil is in the details.

For example, say you are writing an iOS application in Swift. There is no RDF library in Swift or Objective-C; there is no viable RDF library in C/C++. One could proxy out to a Python library, but this gets quite complicated. For Android, which doesn't support newer Java 11 features, Jena doesn't work at all; RDF4J partially works, but that timeframe is likely limited, as the main development of RDF4J has already moved to Java 11.

This gets even more complicated with constrained, embedded devices, which have, in some cases, very limited CPU/Memory resources. Supporting all possible RDF serializations in such a client is, in many cases, simply not possible.

Secondly, Linked Data applications will likely interact with other, non-Solid services. Those services (e.g. Linked Data Fragments, Verifiable Credentials, Web Of Things) may have their own requirements on serializations -- and those requirements exist for a reason. Requiring that all clients can interact with all Linked Data services using all possible serializations is an even higher bar to set, especially if a particular client may only need to interact with a known subset of these services.

IOW, defining a single canonical list of serialized forms is both a very high bar to set for clients and will also be incomplete once you look at the wider Linked Data ecosystem. I would encourage keeping specifications orthogonal, which to me means not mandating a specific list of serialized forms for all of Solid.

woutermont commented 1 year ago

@acoburn, thanks for pointing out the contextual nature of parser support.

However, if we cannot place the burden on the clients, we must place it on the servers, or else abandon the idea of full interoperability. While for a client a canonical list is hard, it shouldn't be for servers, so having those deliver RDF in all W3C-recommended serialisations seems perfectly acceptable to me.

@rubensworks, in that light I would like to reconsider the importance of servers without content negotiation. Could you elaborate more on why a SOLID recommendation of syntaxes should affect it? I presume it is not your aim to have static file servers be fully compliant Solid servers? If not, then where's the connection with this issue? Without being constrained by Solid specs, those servers can still serve Turtle, JSON-LD and/or (X)HTML+RDFa. The only relevant constraint I see is then in what format you should host your WebID Profile Document. Or again, am I missing something?

elf-pavlik commented 1 year ago

@rubensworks: I think it's important that Solid enables (parts of) pods to be hosted on static file servers, which usually don't have the ability to perform content negotiation,

@bblfish: Solid is a project to extend the web with access control, content negotiation, etc. in order to create very much needed decentralised social networks.

@rubensworks: For instance, if servers are allowed to only serve a single RDF serialization that they can choose, it's possible to host (read-only) Solid pods on platforms such as GitHub Pages, which is (IMO) a use case we would want to enable.

I don't see clearly where you draw the distinction between broader Linked Data and Solid. I also see Solid adding access control and read-write over HTTP. GitHub Pages seems to me only fit for general Linked Data.

The question for me might go more towards How Solid fits into the broader Linked Data ecosystem, for example, how Solid applications can work with data which is not published on solid storage.

When it comes to WebID, I believe there is no assumption that it is hosted on solid storage.

@acoburn: Secondly, Linked Data applications will likely interact with other, non-Solid services. Those services (e.g. Linked Data Fragments, Verifiable Credentials, Web Of Things) may have their own requirements on serializations -- and those requirements exist for a reason.

jacoscaz commented 1 year ago

Let's focus on the things we can't achieve yet, and settle on a small set of syntaxes for now.

Whereas I agree with the sentiment, I would argue that a small but active ecosystem is a much easier beast to tame than a big and established one. Unification efforts at the current scale are still within the capabilities of a relatively small group of devs.

IMHO, and the H there is a very pronounced one, if something needs to be mandated, then mandating JSON-LD instead of a human-friendly format to complement Turtle is not ideal.

elf-pavlik commented 1 year ago

@woutermont even though I still favor having zero mandated (as in MUST) serialization formats, what you’re proposing would be my second favorite option for WebID, too, and even more so if the format were to include both RDFa and Turtle data islands. I still think data islands alone would be better but I do realize that there is a vast amount of RDFa material already out there.

I think a dedicated issue to discuss text/html might be helpful, with links to a plentitude of prior conversations. I see many advantages to data islands. Given some basic requirements, like just one <script> tag with the whole graph, and possibly narrowing it here to either JSON-LD or Turtle. Most data-driven clients could still manage read-write by modifying that single <script> and, in contrast to RDFa, without having to worry about impacting the human visible content.

Again, unless we want to dedicate this whole issue to continuing the text/html theme, it might be worth creating a new issue.

woutermont commented 1 year ago

@RubenVerborgh, I'm not against starting small and re-evaluating based on use cases, but I had the impression that we were here to gather intentions and goals to formulate a long-term strategy, as kickstarted by yourself in #454 🤔

In particular, I ended up in that issue because of the concrete and urgent issue for which use cases are raised in #463, and which has also been raised by @acoburn in https://github.com/w3c/WebID/issues/3, before (just as now) it got stranded in "the bigger picture".

bblfish commented 1 year ago

Here's my suggestion:

All Solid protocol related formats MUST use Turtle (which can be thought of as a subset of Trig or N3, which may be needed at some point later).

Of course a Solid server can accept and publish every other format in existence, including binary RDF, RDFa, binary formats like Parquet, Avro, HTML, CSV, XML, JSON, any of the last marked up with GRDDL-like tech to view any format as RDF, ... and also of course visual and audio media types such as JPGs, Ogg, video streams, etc.

elf-pavlik commented 1 year ago

When it comes to serializations, I think we could look more into using dataset formats, at least on GET. In https://github.com/solid/specification/issues/291#issuecomment-1264115106 I suggested that this could help to keep auxiliary resources nicely separated (eg. client and server-managed statements) while allowing combining them in a single response which would include multiple named graphs.

I realize that doing writes with dataset formats is more tricky but taking advantage of them for reading shouldn't be too hard. Since JSON-LD already supports datasets, I would imagine Turtle/Trig and JSON-LD on GET as a nice step in that direction.

bblfish commented 1 year ago

I had the impression that we were here to gather intentions and goals to formulate a long-term strategy

We are 🙂 And my input for the long-term strategy as it pertains to this issue of creating a serialization list: Turtle and JSON-LD (as already spec-mandated), until we have actual use cases we cannot easily address otherwise.

I agree with that pragmatic approach.

I think we should favor Turtle on the whole. JSON-LD makes sense for the OpenID Connect use case as there is an interoperability requirement with another ecosystem. JSON-LD really is designed for that interoperability scenario, and it does it very well.

But I am a bit concerned that we probably don't yet have many scalable JSON-LD parsers. In a recent PR for IO support to banana-rdf I had to hack around with the Titanium parser that Jena uses and I found the following problems:

  1. Titanium uses its own RDF classes, requiring every triple to be translated to Jena.
  2. It runs its own threads to fetch a context and cache those.
  3. Using java.io means that it is also blocking, a problem it has in common with most other RDF parsers in Java (until project Loom arrives to the JDK - perhaps that will magically fix all the existing ones.)

If I had time and money I would love to fix that. But it's probably a 2 month project. Actually if I had money and I could find someone who wanted to get going in this space I'd tutor them :-)

elf-pavlik commented 1 year ago

☝️ I created a dedicated issue for RDF in text/html #468