santhoshtr / wikipedia-ipfs

An exploration to host Wikipedia in IPFS
MIT License

Semantic Web pointer #4

Open ArneBinder opened 4 years ago

ArneBinder commented 4 years ago

With respect to the SPARQL and RDF ideas mentioned in the readme: DBpedia provides some interesting resources. In particular, they published a DBpedia version in NIF-RDF: https://wiki.dbpedia.org/downloads-2016-10 (and see here for an example RDF version of one article).

This format is very interesting because it holds detailed structural information about the article, such as section, paragraph and link annotations, all of which are accessible via SPARQL queries. It would allow fine-grained queries over multiple articles that return only the important text snippets, e.g. selected sections (Wikipedia articles can be very long) or all paragraphs that mention certain other concepts (i.e. articles).
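To make the "all paragraphs that mention a certain concept" idea concrete, here is a minimal plain-Python sketch. It is not SPARQL and not the actual NIF schema: the `Annotation` class, its field names, and the sample text are all assumptions made up for illustration. It only shows what character-offset annotations make possible once paragraphs and links are stored as spans over the article text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    # Hypothetical, simplified stand-in for an offset-based annotation.
    kind: str          # "paragraph" or "link"
    begin: int         # start offset into the article text (inclusive)
    end: int           # end offset into the article text (exclusive)
    target: str = ""   # for links: the linked article, otherwise empty


def find_span(text, needle, start=0):
    """Return (begin, end) offsets of the first `needle` at or after `start`."""
    begin = text.index(needle, start)
    return begin, begin + len(needle)


def paragraphs_mentioning(text, annotations, concept):
    """Return the text of every paragraph that contains a link to `concept`."""
    links = [a for a in annotations if a.kind == "link" and a.target == concept]
    hits = []
    for para in annotations:
        if para.kind != "paragraph":
            continue
        # A paragraph "mentions" the concept if a matching link lies inside it.
        if any(para.begin <= l.begin and l.end <= para.end for l in links):
            hits.append(text[para.begin:para.end])
    return hits


# Made-up two-paragraph "article" with three link annotations.
P1 = "IPFS stores files by hash."
P2 = "Wikipedia could be hosted on IPFS."
TEXT = P1 + "\n" + P2

ANNOTATIONS = [
    Annotation("paragraph", *find_span(TEXT, P1)),
    Annotation("paragraph", *find_span(TEXT, P2)),
    Annotation("link", *find_span(TEXT, "hash"), target="Hash_function"),
    Annotation("link", *find_span(TEXT, "Wikipedia"), target="Wikipedia"),
    Annotation("link", *find_span(TEXT, "IPFS", len(P1)), target="IPFS"),
]

print(paragraphs_mentioning(TEXT, ANNOTATIONS, "IPFS"))
```

With real NIF data the same selection would be expressed as a SPARQL query over the published triples; the sketch only mirrors the containment logic (a link span lying inside a paragraph span).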

I'm not sure if it would be useful to take that format as the underlying article representation, but maybe one can borrow some notation or parts of the schema to construct something useful for text-based IPLD. In-file selectors (links to a certain text span in a file) would be especially useful to have, e.g. just to model HTML links or even arbitrary annotations that might be managed with something similar to hypothes.is.
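As a rough illustration of what such an in-file selector could look like, the sketch below parses a `#char=begin,end` fragment (loosely modelled on RFC 5147-style character ranges, which NIF URIs also use) and resolves it against a document's text. The `Selector` type, the function names, and the `ipfs://example-article` identifier are hypothetical; nothing here is an existing IPLD or IPFS API.

```python
import re
from typing import NamedTuple


class Selector(NamedTuple):
    # Hypothetical selector: a document identifier plus a character range.
    document: str
    begin: int   # inclusive start offset
    end: int     # exclusive end offset


_FRAGMENT = re.compile(r"^char=(\d+),(\d+)$")


def parse_selector(uri):
    """Split 'doc#char=begin,end' into a Selector, rejecting malformed input."""
    document, _, fragment = uri.partition("#")
    match = _FRAGMENT.match(fragment)
    if match is None:
        raise ValueError(f"unsupported fragment: {fragment!r}")
    begin, end = int(match.group(1)), int(match.group(2))
    if begin > end:
        raise ValueError("begin offset must not exceed end offset")
    return Selector(document, begin, end)


def resolve(selector, documents):
    """Return the text span the selector points at."""
    text = documents[selector.document]
    if selector.end > len(text):
        raise IndexError("selector reaches past the end of the document")
    return text[selector.begin:selector.end]


# Assumed in-memory store mapping a made-up identifier to article text.
DOCUMENTS = {"ipfs://example-article": "Wikipedia could be hosted on IPFS."}

selector = parse_selector("ipfs://example-article#char=0,9")
print(resolve(selector, DOCUMENTS))  # prints "Wikipedia"
```

One caveat with plain offsets: they refer to one fixed version of the text, so a selector like this is only stable if the document it points at is immutable, which content addressing would give for free.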

Disclaimer: I'm an NLP researcher from the information extraction domain and would highly appreciate that kind of detailed information. Nevertheless, I think these resources would be very useful for a wider community.

metasj commented 4 years ago

What do sequential revisions look like in that model?