w3c / EasierRDF

Making RDF easy enough for most developers

Tools are scattered #2

Open dbooth-boston opened 5 years ago

dbooth-boston commented 5 years ago

How to find them? Which to use? Every team wastes time going through a similar research and selection process.

"Beginners drown in the options." https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0229.html

IDEA: LAMP for RDF

Create a bundled release of RDF tools, analogous to a standard LAMP stack, or Red Hat or Ubuntu; so that if someone wants to use RDF all they have to do is install that bundle and they're ready to go.

Richard Cyganiak called this the "LARP" stack: http://richard.cyganiak.de/blog/2005/09/rdf-and-web-applications/

"such a platform already exists :) We call it LinkedDataHub": https://linkeddatahub.com/docs/about https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0052.html

IDEA: semantic web generic client in a box

"Simply start a server pointing at a SPARQL endpoint(s). Optionally configure it with some example queries, metadata about what property to derive labels (or introspect this from the endpoint itself), and get:

azaroth42 commented 5 years ago

You lost the target audience with "Simply ... SPARQL endpoint". In other words, the first sentence.

maximveksler commented 5 years ago

Might consider producing a family of https://yeoman.io generators.

maximveksler commented 5 years ago

btw, regarding "best" - the awesome list concepts growing on GitHub over the years seems useful. I'm contributing to https://github.com/semantalytics/awesome-semantic-web for that reason.

dbooth-boston commented 1 year ago

I've taken a crack at starting a list of software that might become a bundled LAMP-like release of software that a new RDF user would need to start building "typical" RDF applications: https://github.com/w3c/EasierRDF/blob/master/RDF-LAMP.md

It's very meager so far. Anyone have suggestions for additions or changes?

TallTed commented 1 year ago

Your single listed data store (Blazegraph) is no longer (for some years now!) supported nor maintained, as it has been subsumed into Amazon Neptune.

At the risk of blowing my (employer's) own horn, I would suggest Virtuoso Open Source be on the list, as it is the engine behind the majority of nodes in the LOD Cloud.

I would also nominate the OpenLink Structured Data Sniffer which helps reveal and/or confirm the structured data, in islands or markup or otherwise, found in any given web page. Several other of our browser extensions might also be well included.

I might also suggest retitling RDF-LAMP to Ramp or RDF-amp, because, to start with, Linux is not (always) part of your stack, given you've mandated that this stack run on Windows, macOS, and Linux. The other letters in that acronym (Apache, MySQL, and PHP/Perl/Python) also deserve demotion, as these are not the key ingredients of any RDF-based solution; an "amp[lifier]", however....

dbooth-boston commented 1 year ago

Your single listed data store (Blazegraph) is no longer (for some years now!) supported nor maintained, as it has been subsumed into Amazon Neptune.

Ugh, thanks for catching that. I hadn't realized that they stopped supporting the open source version. That disqualifies it. :(

At the risk of blowing my (employer's) own horn, I would suggest Virtuoso Open Source be on the list, as it is the engine behind the majority of nodes in the LOD Cloud.

Sure, I'll list that at least for the moment. Thanks!

I would also nominate the OpenLink Structured Data Sniffer which helps reveal and/or confirm the structured data, in islands or markup or otherwise, found in any given web page.

Would you consider that similar to the RDF Browser Firefox plug-in that Andreas Harth mentioned today on the semantic-web list?

I might also suggest retitling RDF-LAMP to Ramp or RDF-amp . . . .

I agree that it should be renamed, but I'd like to wait a bit to see how things evolve and see what other name ideas emerge, to avoid renaming it multiple times.

kurzum commented 1 year ago

Fun fact: Virtuoso "Universal" Server came out some years after Apache. It is called Universal, because it is actually AMP, i.e. it has web server, relational database & triple store as well as Virtuoso PL, which is quite similar to PHP, syntax-wise.

The Virtuoso documentation is written for a professional audience; the docs for nginx, MariaDB, MongoDB, and others are easy reads in comparison, maybe also because those tools do only one thing rather than many. Also, you can apt-get install it.

TallTed commented 1 year ago

I would also nominate the OpenLink Structured Data Sniffer which helps reveal and/or confirm the structured data, in islands or markup or otherwise, found in any given web page.

Would you consider that similar to the RDF Browser Firefox plug-in that Andreas Harth mentioned today on the semantic-web list?

There's some similarity at a casual first glance, but I think there are significant differences.

First off, the RDF Browser plug-in is Firefox-only, while OSDS supports all major browsers including Chrome, Safari, Edge, Opera, and Firefox.

Second, I think relatedly, RDF Browser last pushed a new version in April 2021, and though there have been a couple of develop merges since, it appears to be rather stagnant. OSDS code maintenance hasn't been well disciplined, as a number of updated builds have shipped based on the develop branch without reaching the main branch, but that should be cleaned up shortly.

Third, and I think most significantly, the RDF Browser changes the HTTP Accept: header to include several extra media types, which may change the payload and/or media type of the server's response. In contrast, OSDS parses the "normal" payload returned to the browser, handling structured data found in POSH as well as islands (including entire documents) of Turtle, JSON, Microdata, RDFa, Atom, RSS, CSV, and others.

There are other differences, but I hope that's enough to entice you to install and explore a bit, and to add OSDS to the list.
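To illustrate the third difference above, here is a minimal Python sketch of the two approaches: changing the `Accept` header so the server may return RDF directly, versus parsing structured-data islands out of the ordinary HTML payload. It is not the code of either extension. The helper names (`accept_with_rdf`, `JsonLdIslands`, `extract_jsonld`) and the media types chosen are assumptions for illustration, and only JSON-LD islands are handled here.

```python
from html.parser import HTMLParser
import json


def accept_with_rdf(base_accept, rdf_types=("text/turtle", "application/ld+json")):
    """Approach 1 (hypothetical helper): put RDF media types ahead of the
    browser's normal Accept value, which may change what the server returns.
    The media types listed are illustrative assumptions."""
    return ", ".join(list(rdf_types) + [base_accept])


class JsonLdIslands(HTMLParser):
    """Approach 2: leave the request alone and collect the contents of
    <script type="application/ld+json"> elements from the normal HTML."""

    def __init__(self):
        super().__init__()
        self._in_island = False
        self._buffer = []
        self.islands = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_island = True
            self._buffer = []

    def handle_data(self, data):
        # Script content is delivered as raw text while inside the element.
        if self._in_island:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_island:
            # A malformed island would raise here; a real tool would recover.
            self.islands.append(json.loads("".join(self._buffer)))
            self._in_island = False


def extract_jsonld(html):
    """Return the parsed JSON-LD islands found in an HTML string."""
    parser = JsonLdIslands()
    parser.feed(html)
    parser.close()
    return parser.islands
```

For example, `extract_jsonld(page_html)` returns a list of parsed JSON-LD objects without altering the request, whereas `accept_with_rdf("text/html")` yields a header value that asks the server for a different representation.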

jmkeil commented 1 year ago

Your single listed data store (Blazegraph) is no longer (for some years now!) supported nor maintained, as it has been subsumed into Amazon Neptune.

Ugh, thanks for catching that. I hadn't realized that they stopped supporting the open source version. That disqualifies it. :(

At the risk of blowing my (employer's) own horn, I would suggest Virtuoso Open Source be on the list, as it is the engine behind the majority of nodes in the LOD Cloud.

Sure, I'll list that at least for the moment. Thanks!

To replace Blazegraph in Wikibase, Wikimedia developers recently did an evaluation of open source alternatives for Blazegraph.

fekaputra commented 1 year ago

I would propose adding OpenRefine + grefine-rdf-extension to the list for ontology population. Additionally, while it's not mapped directly to one of the functionalities listed on the page, I would nominate YASGUI to be part of this RDF-LAMP.

dbooth-boston commented 1 year ago

To replace Blazegraph in Wikibase, Wikimedia developers recently did an evaluation of open source alternatives for Blazegraph.

Awesome report, and very helpful! Their criteria are heavily weighted toward high performance, but also include several criteria that are more relevant to this effort:

Based on those criteria, I've added RDF4J and Jena as RDF database candidates also.

dbooth-boston commented 1 year ago

Good idea @fekaputra , I've added YASGUI and OpenRefine to the list. However, I did not find license information for the grefine-rdf-extension itself. I only found license information for software that the extension uses, such as Sesame, Any23, Lucene and Xerces. Can you please add appropriate FOSS license info for the extension itself? I created an issue for it.

And thanks @TallTed , I've added OSDS too.

afs commented 1 year ago

Add Oxigraph https://github.com/oxigraph/oxigraph as an RDF triple store.

(I have no connection with the project).

dbooth-boston commented 1 year ago

Added, thanks

jmkeil commented 1 year ago

Now there are four… I'm skeptical this serves the intended purpose.

From RDF-LAMP.md:

  • should represent the easiest and most popular community choice in its category.

I'm further skeptical that "most popular community choice" is a helpful criteria. There are long-established tools whose instability, poor usability, or limited standards compliance the community has gotten used to, but which will discourage beginners. Selection criteria should instead focus on stability, usability, and standards compliance, which is my understanding of "easy".

dbooth-boston commented 1 year ago

Yes, one will have to be selected, but it seemed a little premature to do that yet. I'd like to get more suggestions on the table first, and refine our selection criteria.

I'm further skeptical that "most popular community choice" is a helpful criteria. . . .

Very good point. I've just reworded that criterion in response to your comment, and moved standards compliance to its own bullet. What do you think now?

dbooth-boston commented 1 year ago

Issue #96 suggests Docker as an easy way to provide a sample SPARQL database. Would Docker be a good way to provide an RDF-LAMP stack in general, or would it be too limiting? Opinions? Pros/cons?

TallTed commented 1 year ago

Is "a good way to provide an RDF-LAMP stack in general" to be another (vocalized) popularity contest?

(Docker has its boosters, as do AWS, Azure, and other cloud providers. Each has its own pros and cons. I submit that whether the people watching this issue are fully versed in any, never mind some or all, alternatives is debatable at best, and that most of us will have some degree of religiosity and/or prejudice in our declared preferences and/or recommendations. The impact of that religiosity can only be minimized [it can't be avoided entirely] by building a table of features and other requirements, and comparing implementations thereby — which will require at least one person here to have deep familiarity with each system. This would itself be a reinvention of many supposedly objective comparison tables out there on the web.)

Or is popularity now to be taken/treated as an inherent negative, as suggested by 'skeptical that "most popular community choice" is a helpful criteria'? (I don't think popularity should identify a winner on its own, but it can be useful when one of many metrics of comparison.)

dbooth-boston commented 1 year ago

Is "a good way to provide an RDF-LAMP stack in general" to be another (vocalized) popularity contest?

Maybe, but I'm also interested in technical rationale. Basically, I was first assuming that a bundled LAMP-like RDF stack would be released as a download that people would install natively, on their Linux, macOS or Windows environments. But another option might be to bundle it up as a Docker image. I'm wondering whether people think that would be a better idea.

kasei commented 1 year ago

I’ve always found Docker nice because beyond the relative portability, the associated Dockerfile tells you exactly how the software was installed and configured which is useful even if you’re not going to use the container.

TallTed commented 1 year ago

A Docker-based starter kit seems a good direction.

There is some performance hit taken, as with any emulation-space, when running a Linux Docker image on a macOS or Windows host (and a lesser hit, I'd imagine, on a Linux host), but before that hit becomes noteworthy, folks are likely to be ready to migrate to a native installation of the various components.

jmkeil commented 1 year ago

@TallTed wrote:

Or is popularity now to be taken/treated as an inherent negative, as suggested by 'skeptical that "most popular community choice" is a helpful criteria'? (I don't think popularity should identify a winner on its own, but it can be useful when one of many metrics of comparison.)

Of course not. It is just not a clear indication, in either direction. Popular tools might be very stable, as they have been tested and improved in thousands of projects, or they might carry heavy technical debt accumulated over the years. Unpopular tools might contain many problems still unknown due to little use, or they might have applied from the outset the lessons learned since the release of the popular ones.

TallTed commented 1 year ago

@jmkeil — Saying "'most popular community choice' is [not a] clear indication" is very different from saying "'most popular community choice' is [not a] helpful criteria".

The first says that popularity should not be taken to indicate a winner on its own (with which I agree), while the second says that popularity should not be considered at all (with which I do not agree).

Overall age of a project, time since the last significant update, numbers of contributors, numbers of deployers, numbers of end users... All of these are worth consideration as part of the picture; none of them should be considered sufficient on their own.
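The "one of many metrics" idea can be sketched as a weighted comparison table. The following Python is only an illustration: the tool names, metric values (normalized to 0..1), and weights are hypothetical placeholders, not measurements of any real software, and `rank` is a helper invented for this sketch.

```python
# Hypothetical candidates with made-up, normalized metric values (0..1).
CANDIDATES = {
    "tool_a": {"popularity": 0.9, "stability": 0.4,
               "usability": 0.5, "standards_compliance": 0.6},
    "tool_b": {"popularity": 0.5, "stability": 0.8,
               "usability": 0.8, "standards_compliance": 0.9},
}

# Illustrative weighting in which popularity counts, but only as one metric.
EQUAL_WEIGHTS = {"popularity": 0.25, "stability": 0.25,
                 "usability": 0.25, "standards_compliance": 0.25}


def rank(candidates, weights):
    """Return candidate names sorted by descending weighted score."""
    def score(name):
        metrics = candidates[name]
        return sum(weight * metrics[metric] for metric, weight in weights.items())
    return sorted(candidates, key=score, reverse=True)
```

With these placeholder numbers, `rank(CANDIDATES, {"popularity": 1.0})` puts `tool_a` first, while `rank(CANDIDATES, EQUAL_WEIGHTS)` puts `tool_b` first, which is the sense in which popularity can inform a choice without deciding it alone.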

dbooth-boston commented 3 months ago

Related discussion about tools: https://lists.w3.org/Archives/Public/semantic-web/2024Apr/0009.html