w3c / EasierRDF

Making RDF easy enough for most developers

Scope & intention? #46

Closed ktk closed 5 years ago

ktk commented 5 years ago

I am relatively confused by the workshop next year, by the mission statement and by the issues in this repository. When I first read about the workshop my first reaction was that PG vendors are starting to get scared because they finally realized that RDF is a standard while their stuff is mostly proprietary (from a data model perspective & query languages).

Later the thread popped up in the mailing list, followed by this repo. This did not really help: on the entry page you mention stuff like

Backward compatibility is highly desirable, but less important than ease of use.

The goal is to make RDF -- or some RDF-based successor -- ...

and

...easy enough for average developers (middle 33%), ...

Followed by issues that rant about everything from SPARQL queries that do not perform (#39) to why blank nodes are evil (#19) and how we can somehow link property graphs with RDF (or whatever issue #45 is about).

As someone who has worked with RDF for a while (discovered it 10 years ago), does not come from research (Engineer and proud of it), never wrote any (useful) paper about RDF, didn't have to do a PhD around it but - probably unlike many others in the RDF domain - actually pays my and others' salaries by consulting real world customers in RDF and RDF related projects, I am utterly confused.

Yes, we have a lot of problems to solve and in fact we spend a lot of time doing so in our company and publish those results as open source software. See for example a big part of RDFJS and many other things in our Github org.

But honestly most of that stuff is related to lack of tooling and not fundamental problems in or with RDF itself. And after waiting for tooling to happen I'm convinced that the only way forward is more collaboration in actually creating, maintaining and evolving (changed wording, see 1) useful stuff and not getting distracted by PG groups trying to become a standard as well so they can say even more BS about RDF to their (potential) customers.

I would love to address some limits I ran into in the past years. But again, those are 90% tooling related and maybe 10% stuff I miss in SPARQL. I do not need most of the discussions I saw in these issues so far for solving these. And I surely do not need an RDF successor for that.

My colleague told me I should be more positive so let me give some proposals about, in my opinion, useful things and discussions we can have:

Contribute to your favorite RDF stack

It's really depressing to see how few contributions we get in our JavaScript libraries. The RDFJS stack is written by a handful of people and even though once in a while a new person pops up, it's by far not an active community.

The same is true for other stacks, even for Java. TDB is still my first choice when I play around with RDF stores but without Andy, this project would have been dead a long time ago. I guess it's not much different for Python/Ruby/etc implementations of RDF: it's mostly a show run by one to ten people.

Talk about your real world use cases

There are really cool real world RDF projects out there. I know from our clients that we cannot talk about all of them in public but some of them should be easy stories to tell. We should have more conferences about real world use cases and at least not more about presenting RDF paper xyz. I plan to organize such a one-day event in Switzerland 2019, ping me if you are interested.

Work on SPARQL

As mentioned above, the place where I see most need right now is discussing the future of SPARQL. I guess we could have SPARQL 1.2 defined relatively fast with a bunch of stuff which is missing in the current spec but would be very useful in the real world. Some of these features are implemented by various stores but not standardized, so they are store-proprietary. For bigger stuff like the PATH concept implemented in Stardog it would make sense to think about SPARQL 2.0. There we would be allowed to break things IMHO.

Again, a lot of stuff would either be syntactic sugar or modest extensions of the spec.

Provide blue-prints for working with RDF

There are books about everything related to RDF but there is little to no documentation and up-to-date examples about how to use it. We try to change that a bit at Zazuko and I see others doing the same but we need more of this.

Just f-ing do it

I really love reading papers about cool ideas around RDF but I guess we would all be waaaaaaaay further if at least a part of the time invested in papers goes into actual development. And I mean contributions to existing code that is actually useful, not the next great thing that is dead in the water from day 1.

TL;DR

RDF is not the problem, (part of) the community is.

kidehen commented 5 years ago

The fact that a clearly outlined use case cannot be produced is surprising to me. Can't we speak about data rather than nodes and vertices?

If you want to make a decision based on entity relationship types (relations) represented as a collection of sentences, you have to scope weights to the sentences rather than trying to modify the nature of an RDF predicate [1].

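{
@prefix : <#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Reified statement ":i :like :RDF", weighted 1.00
[
    a rdf:Statement ;
    rdf:subject :i ;
    rdf:predicate :like ;
    rdf:object :RDF
]
    :weight "1.00"^^xsd:decimal .

# Reified statement ":i :like :PropertyGraphConfusion", weighted 0.00
[
    a rdf:Statement ;
    rdf:subject :i ;
    rdf:predicate :like ;
    rdf:object :PropertyGraphConfusion
]
    :weight "0.00"^^xsd:decimal .
}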

  1. Reification Example using RDF-Turtle Doc

  2. Linked Data URI for Property Graphs Issue and RDF -- which is basically always about not liking the verbosity of statement reification in an era where performance is no longer the RDF-killer, i.e., a fast DBMS will handle the verbosity, as is already demonstrated across many live Virtuoso RDBMS instances (Uniprot, LOD Cloud Cache etc.).
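
The reification pattern above can also be read as a plain data structure. The following is a minimal sketch in Python using only the standard library, not any RDF library's API. The `Triple` type, the `reify` and `weight_of` helpers, the `_:s1`/`_:s2` node labels and the `http://example.org/#` namespace (standing in for `<#>`) are all assumptions made for illustration.

```python
from decimal import Decimal
from typing import NamedTuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/#"  # hypothetical namespace standing in for <#>


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: object


def reify(node, triple, weight):
    """Describe `triple` as an rdf:Statement hanging off `node`, plus a weight."""
    return [
        Triple(node, RDF + "type", RDF + "Statement"),
        Triple(node, RDF + "subject", triple.subject),
        Triple(node, RDF + "predicate", triple.predicate),
        Triple(node, RDF + "object", triple.obj),
        Triple(node, EX + "weight", weight),
    ]


def weight_of(graph, subject, predicate, obj):
    """Return the weight scoped to the matching reified statement, or None."""
    statements = [t.subject for t in graph
                  if t.predicate == RDF + "type" and t.obj == RDF + "Statement"]
    for node in statements:
        # Collect every property of this statement node into a dict.
        props = {t.predicate: t.obj for t in graph if t.subject == node}
        described = (props.get(RDF + "subject"),
                     props.get(RDF + "predicate"),
                     props.get(RDF + "object"))
        if described == (subject, predicate, obj):
            return props.get(EX + "weight")
    return None


graph = []
graph += reify("_:s1", Triple(EX + "i", EX + "like", EX + "RDF"), Decimal("1.00"))
graph += reify("_:s2", Triple(EX + "i", EX + "like", EX + "PropertyGraphConfusion"),
               Decimal("0.00"))

print(weight_of(graph, EX + "i", EX + "like", EX + "RDF"))
```

The weight is attached to the statement node rather than to the `:like` predicate, which is the point made above: the predicate itself stays unchanged.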

dbooth-boston commented 5 years ago

@ktk, sorry about the confusion. I have updated the README to clarify, but I will also try to address some of your questions here.

I am relatively confused by the workshop next year,

The effort to standardize a graph interchange framework and the effort to make RDF easier arose independently, but they are highly complementary because greater adoption of RDF will produce more resources for improving the RDF ecosystem, and making the RDF ecosystem easier will lead to greater adoption. That is why the graph interchange standardization effort is mentioned so prominently.

Followed by issues that rant about everything from SPARQL queries that do not perform (#39) to why blank nodes are evil (#19) and how we can somehow link property graphs with RDF (or whatever issue #45 is about).

Yes, the issues list is a grab bag. Part of the challenge is to figure out if there are over-arching themes that could address multiple issues at once.

Yes, we have a lot of problems to solve and in fact we spend a lot of time doing so in our company and publish those results as open source software.

Excellent! Thank you!

But honestly most of that stuff is related to lack of tooling and not fundamental problems in or with RDF itself.

Tools are fair game also! This effort is about the entire RDF ecosystem -- including tools, SPARQL, OWL, standards, educational materials, etc. -- not merely RDF per se. The term "RDF" is overloaded: sometimes it refers to the whole RDF ecosystem, and sometimes it refers to the RDF standard itself. Using the same term for both was probably a mistake -- on my part -- since it seems to be creating some confusion. If it seems to be a persistent problem I will see if I can reword things to address it.

And after waiting for tooling to happen I'm convinced that the only way forward is more collaboration in actually maintaining and evolving . . . useful stuff and not getting distracted by PG groups trying to become a standard as well so they can say even more BS about RDF to their (potential) customers.

+1 to contributing more useful stuff! But I disagree with your assessment of property graph groups. I think RDF can and should learn from their successes, and I think RDF would only benefit from absorbing property graphs, by increasing the total pool of RDF developers.

I would love to address some limits I ran into in the past years. But again, those are 90% tooling related and maybe 10% stuff I miss in SPARQL.

Please share what they are! See tools issues and please create a new issue if appropriate.

I do not need most of the discussions I saw in these issues so far for solving these. And I surely do not need an RDF successor for that.

Okay, but: (a) you are already accustomed to RDF, for 10 years; and (b) you clearly are not a developer in the middle 33% of ability. :) We already know that more highly skilled developers like you can use RDF successfully -- that has been well proven over the past 20 years. This effort is about making it easy enough even for developers in the middle 33% of ability, who are new to RDF, to be consistently successful.

The contributions that you and other successful RDF developers make are vital, and got us this far, but we now need to figure out how to address the middle 33%. Some of the barriers definitely are tooling related, but not all. For example, see the posts from Chris Yocum and Steven Harms. Note that education -- learning about RDF, how to use it and where to find the right tools -- was a major component of the difficulties that they faced.

Contribute to your favorite RDF stack. It's really depressing to see how few contributions we get in our JavaScript libraries. . . .

+1 Any thoughts about how to encourage more contributions?

Talk about your real world use cases. . . . I plan to organize such a one-day event in Switzerland 2019

Excellent! In the USA there is also an upcoming U.S. Semantic Technologies Symposium.

Work on SPARQL. As mentioned above, the place where I see most need right now is discussing the future of SPARQL. I guess we could have SPARQL 1.2 defined relatively fast with a bunch of stuff which is missing in the current spec but would be very useful in the real world. Some of these features are implemented by various stores but not standardized, so they are store-proprietary.

That is useful input, and I have recorded it as #47. However, please bear in mind that those deficiencies are much more relevant to experienced SPARQL users than to newbie middle 33%-ers, who are far more likely to give up on RDF before they even get to SPARQL.

Provide blue-prints for working with RDF. There are books about everything related to RDF but there is little to no documentation and up-to-date examples about how to use it.

Agreed. This is #7. I think this is a major barrier that a common entry point (#6) could help to address.

We try to change that a bit at Zazuko and I see others doing the same but we need more of this.

Thank you!

Just f-ing do it. I really love reading papers about cool ideas around RDF but I guess we would all be waaaaaaaay further if at least a part of the time invested in papers goes into actual development. And I mean contributions to existing code that is actually useful, not the next great thing that is dead in the water from day 1.

I partially agree, but I think we need to recognize the academic and professional motivations that cause this to happen, and that will continue. I think most paper authors do implement the techniques they describe, but their code is too often written in isolation and then abandoned, instead of being incorporated into existing, active code bases. Given that they will continue to write papers, perhaps we can at least encourage them to get their code incorporated into active code bases.

RDF is not the problem, (part of) the community is.

I don't think disparaging (part of) the community is helpful.

This effort to lower the entry bar to RDF may not resonate much with you -- an accomplished RDF developer with 10 years of experience -- but if RDF is ever going to break out of its "niche" status this problem must be addressed.

Anyway, I hope I have helped to clarify the scope and intention of this effort. In short:

  1. The goal is to make RDF -- or some RDF-based successor -- easy enough for average developers (middle 33%), who are new to RDF, to be consistently successful.
  2. Solutions may involve anything in the RDF ecosystem: standards, tools, guidance, etc. All options are on the table.
  3. Backward compatibility is highly desirable, but less important than ease of use.

kidehen commented 5 years ago

@dbooth-boston,

The goal is to make RDF -- or some RDF-based successor -- easy enough for average developers (middle 33%), who are new to RDF, to be consistently successful.

Is RDF hard to use or poorly understood? In my experience, on a totally practical basis, I find it is always totally misunderstood. Most of the time, unawareness of existing productivity tools is the main problem.

Solutions may involve anything in the RDF ecosystem: standards, tools, guidance, etc. All options are on the table.

We have a LODCloud, but we don't have a segment of that cloud that aids serendipitous discovery of:

  1. Productivity Tools
  2. Developer Tools
  3. Training Collateral

Backward compatibility is highly desirable, but less important than ease of use.

If backward incompatibility is a possibility it shouldn't be called RDF at all. Why would you want to make incompatible variants of RDF?

Property Graphs have their own hype-axis, and said axis could survive conjuring up a new name. It doesn't need to ride RDF for that on an "embrace and extend" basis.

laurentlefort commented 5 years ago

@dbooth-boston ,

Same as @ktk here, my preference is to have the EasierRDF initiative and the Web Standardization for Graph Data workshop having different life spans for the following reasons:

Like @kidehen I'm wary of succumbing to (fluctuating) hypes, and I note that we now have two concurrent ones: the one around graph data, and the one around AI, which is prompting increasing interest in the Knowledge Representation foundations of the semantic web.

The work you are doing here is calling the community to help fix stuff they use on a daily basis, and is very important. In my view, if you don't set the EasierRDF boundaries right (e.g. on backward compatibility issues), you will lose the support you are trying to get from the people you want to help.

W3C workshops have a different role to play: building bridges, bringing new faces to the fore and starting new things (as opposed to maintaining them).

While I am at it, I may as well provide some feedback on the March workshop CFP https://www.w3.org/Data/events/data-ws-2019/cfp.html

My feeling is that by putting all the hot topics in a single grab bag, W3C is killing the opportunity to build greater momentum. My feeling is that three different workshops could have been organised:

So, I would not mind having a stronger curation and moderation effort. This would require defining clearer boundaries of where this stops and where things done elsewhere start (and possibly letting @draggett set up other spaces to let these other ideas flourish).

My (personal) priority would be to fix the things which have prevented tools developed for the Linked Data part of the community and tools developed for the OWL part of the community from being easy to use together. I think this is probably one of the factors which has stopped us from having a LAMP tools bundle.

dbooth-boston commented 5 years ago

@laurentlefort writes:

my preference is to have the EasierRDF initiative and the Web Standardization for Graph Data workshop having different life spans

They definitely do have different lifespans. They are independent efforts that were separately conceived, and as you note they have different goals. Sorry this was unclear. However, they have been mentioned together because both efforts can be beneficial to each other:

I have tried to improve the README to clarify the difference between these initiatives. Do you think it still needs further clarification? If so, how?

afs commented 5 years ago

@dbooth-boston wrote:

we can at least encourage them [paper authors] to get their code incorporated into active code bases.

We can learn from the history of the general database community here. The pipeline from paper to product code (open source or commercial) has grown over time. It is no longer the case of a good idea jumping straight to general usage or to standards and the role of standards is more at the outcome end of the pipeline.

Initial work, papers, technical ideas here in EasierRDF, are "X solves use case Y". The next step is to show the idea works in the wild with analysis of both positive and negative consequences in the context of the whole ecosystem.

"Another flaw in the human character is that everybody wants to build and nobody wants to do maintenance." -- Kurt Vonnegut

bergos commented 5 years ago

@dbooth-boston wrote:

But I disagree with your assessment of property graph groups. I think RDF can and should learn from their successes, and I think RDF would only benefit from absorbing property graphs, by increasing the total pool of RDF developers.

Learning would mean analyzing first. Who said people are using the software because it's a property graph implementation? I haven't seen a technical argument for PG in the discussion. But I can tell you from my experience that only the PG vendors have problems designing ontologies in the RDF way; everyone else can manage it somehow... But I will give you some other arguments why non-RDF graph implementations could have been chosen:

I'm sure there are many more, and the first step should be finding the main reasons, not just randomly picking a feature and porting it to RDF.

I'm also not so happy about the React comparison I see all the time. Even if you include technologies typically combined with React, like GraphQL, to have a full stack, that's comparing apples with oranges. Some things you should have in mind about the React stack:

We can learn from the React stack that good documentation is a big benefit for adoption. Also the quality of the tooling makes devs happy. But if you want it comparably simple, then you would have to choose a dictator for your standards. In any case, RDF's support for distributed data from different sources would always add some more complexity.

Coming back to the "easy" in the name of this repo. RDF will always be a little bit more complex than the solutions running in a "closed environment". If you want distributed data, you also need IRIs, which increases the complexity. We should focus on how we can make better documentation to understand the concepts behind RDF. Also libraries could hide some of the complexity from the average developer. I don't get how randomly picking features from (temporarily) more successful database products and forming standards around them should solve the problem.
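
As one illustration of a library hiding IRI complexity, here is a minimal sketch in Python of a namespace helper that lets application code use short names while full IRIs are produced underneath. The `Namespace` class and `expand` function are hypothetical helpers written for this example and do not reproduce any particular library's API; only the FOAF namespace IRI is real.

```python
class Namespace:
    """Turn attribute access into a full IRI: FOAF.name -> base + 'name'."""

    def __init__(self, base):
        self.base = base

    def __getattr__(self, local_name):
        # Only called for attributes that do not exist on the instance.
        if local_name.startswith("_"):
            raise AttributeError(local_name)
        return self.base + local_name


def expand(curie, prefixes):
    """Expand a prefixed name such as 'foaf:knows' using a prefix table."""
    prefix, _, local_name = curie.partition(":")
    return prefixes[prefix] + local_name


FOAF = Namespace("http://xmlns.com/foaf/0.1/")
PREFIXES = {"foaf": FOAF.base}

print(FOAF.name)
print(expand("foaf:knows", PREFIXES))
```

The developer writes `FOAF.name` or `foaf:knows`, and the globally unique identifier needed for distributed data is still what ends up in the triples.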

dbooth-boston commented 5 years ago

@bergos I completely agree with your points, but I guess I take a slightly different view of property graphs. To my mind, labeled properties are just a special case of n-ary relations. (Well, there are other ways they could be implemented, such as reification or named graphs, but those approaches seem much messier.) So if we have proper support for n-ary relations, then we get property graphs for free. That seems beneficial to me, regardless of how important labeled properties are technically.
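
To make the "special case of n-ary relations" point concrete, here is a minimal sketch in Python under stated assumptions: triples are plain tuples, and the `nary` and `labeled_edge` helpers, the `source`/`target` role names and the `:alice`/`:bob` data are all hypothetical, chosen only for illustration.

```python
from itertools import count

_ids = count(1)  # generator of fresh blank-node labels


def nary(relation_type, **roles):
    """Expand one n-ary relation into plain triples around a fresh node."""
    node = f"_:r{next(_ids)}"
    triples = [(node, "rdf:type", relation_type)]
    triples += [(node, role, value) for role, value in roles.items()]
    return node, triples


def labeled_edge(source, label, target, **properties):
    """A property-graph edge is an n-ary relation with two fixed roles
    (source and target) plus any number of extra key/value roles."""
    return nary(label, source=source, target=target, **properties)


node, triples = labeled_edge(":alice", ":knows", ":bob", since=2010)
for triple in triples:
    print(triple)
```

Nothing in `labeled_edge` is specific to property graphs beyond fixing two role names, which is the sense in which general n-ary support would cover labeled properties as well.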

dbooth-boston commented 5 years ago

I'm closing this because I think the scope and intent questions have been answered, but please feel free to re-open it if you think more clarification is needed.

afs commented 5 years ago

... newbie middle 33%-ers, who are far more likely to give up on RDF before they even get to SPARQL.

This claim of "far more likely" needs substantiating. Indeed, my experience is that this is not the case, because the background assumption that the 33% start with RDF and then use SPARQL is not correct.

There are users who start with SPARQL, understand the basic data model of RDF, and then move on to the wider RDF ecosystem. A strength of RDF is that the data consumer can treat it simply as a data structure and get value from it. See DBpedia and Wikidata data usage.
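
A minimal sketch in Python of what "treating it simply as a data structure" can look like: a graph is a list of three-element tuples, and a query is a pattern with wildcards. The `match` helper and the sample data are assumptions for illustration, and the matching is far simpler than real SPARQL evaluation.

```python
# Hypothetical sample data: each triple is (subject, predicate, object).
triples = [
    (":Bern", "rdf:type", ":City"),
    (":Bern", ":country", ":Switzerland"),
    (":Zurich", "rdf:type", ":City"),
    (":Zurich", ":country", ":Switzerland"),
    (":Switzerland", "rdf:type", ":Country"),
]


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly the triple pattern "?city rdf:type :City".
cities = [s for s, _, _ in match(triples, p="rdf:type", o=":City")]
print(cities)
```

A consumer can get useful answers from the data this way without first learning the wider ecosystem.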

HughGlaser commented 5 years ago

Useful comment, @afs

And it depends on what you see as RDF when you do meet it first. I remember meeting RDF/XML as the first thing, and it was pretty much "Huh?". But I soon saw SPARQL and it was just such a wonderful discovery.

Had I first met Turtle (or whatever), along with SPARQL, or SPARQL first, it would have been far less painful. They just live together comfortably as data structures and query language, with clear pattern matching.

I should remark that I am not even in the 33% we are talking about, or the 33% you are in. As a "bear of very little brain", I had/have always found SQL indigestible (even though I had taught it!), and I also find the PG queries pretty unpleasant. SPARQL (and RDF/Turtle...) is just so simple, easy and beautiful; well, at least for the simple things I do :-)

dbooth-boston commented 5 years ago

@afs , good point, and you are right that I overstated the SPARQL barrier. The context was talking about "a bunch of stuff which is missing in the current [SPARQL] spec". I should have said that newbie middle 33%-ers are far more likely to give up on RDF before they run into those SPARQL limitations. But as @HughGlaser points out, I guess it also depends a lot on the route the person takes to enter the RDF world. Furthermore, those SPARQL limitations should still be addressed!

afs commented 5 years ago

Please stop referring to people we wish to reach out to as "average", "middle" or "newbie". They are skilled people who have plenty of other things to do, and we are unskilled in their areas of expertise.

akuckartz commented 5 years ago

@afs Which terms would be more appropriate?

dbooth-boston commented 5 years ago

@afs, good point that "newbie" has a pejorative connotation, and should be avoided. We could say "newcomer" instead.

However, I do not see how we could sensibly avoid referring to "average" developers, or developers of "middle 33%" of ability, because reaching them is explicitly the point. As stated in the introduction, the goal of this discussion and exploration is to make the RDF ecosystem "easy enough for average developers (middle 33% of ability) . . . who are new to RDF, to be consistently successful". I would be very much opposed to using terms that obscure this goal.