polifonia-project / sonar2021_demo

This repository documents the Polifonia demo that will be presented at SONAR2021
https://polifonia-project.github.io/sonar2021_demo/

ETL: data from jsonld to Sonar App #38

Closed ccolonna closed 2 years ago

ccolonna commented 2 years ago

While we wait for the Polifonia KG, we have some real data extracted from datasets, with handcrafted annotations for the application. We can use this for more realistic testing.

The file is here. Some basic transformation can be done to make it adhere to the app domain.

This is related to, and will solve: https://github.com/polifonia-project/sonar2021_demo/issues/28

phivk commented 2 years ago

Hi @JaseMK @ccolonna, quick update from my side...

Querying serialised KG files

I had a look at the serialised data files and tried to figure out a straightforward way to query them.

Based on @ccolonna's tips I had a look at Comunica and the query_cli_file interface to pull data out from the .ttl file with a sparql query via the command line.

I also tried to set up a SPARQL endpoint based on the local .ttl file and ran a Comunica Web Client, but didn't manage to query the endpoint yet. @ccolonna have you ever used Comunica via its Web Client to query a local file? Any tips would be appreciated!

Diagram of the current demo data structure

I tried to visualise the current data model used in the demo app (it doesn't contain all annotations): [image: sonar demo data structure]. Source (please request access if you'd like to edit): https://docs.google.com/drawings/d/1BJKoFjRkRLj_-KDLUVWMk4ws05evVWv1O4dRFwkRnlw/edit?usp=sharing

What this shows to me is that annotations currently carry a lot of complexity. I wonder if we can simplify, for example by separating annotations (timecoded links from songs to places or lyrics) from relationships (links from one song to another song via a place or lyric). I am not sure whether it is most helpful to 'precompute' these relationships or to calculate them dynamically client side in the app. @JaseMK what are your thoughts?

JaseMK commented 2 years ago

My thoughts are that the current model, where relationships within annotations point to other songs, is not correct. A relationship should link two annotations, not an annotation and another song, since it is the two annotations that are (potentially) similar, not an annotation and a song. For example, if AnnotationA is place:Liverpool and AnnotationB is place:Manchester, then they are similar and can be linked. The song details can then be derived from the second annotation. Linking an annotation directly to another song skips a step and also fails to describe why the relationship is there at all. By linking the two annotations, the reason they are linked should be self-explanatory and easily derived. If the annotation is linked directly to a song, it may not be clear which annotation within the target song was used for the relationship (e.g. there may be two 'Liverpool' annotations).

However, this is simply remedied by replacing the "songID" within the relationship with "annotationID" - the end result would be a little more logic in the client app to retrieve the songs but it's a trivial change in reality.
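To illustrate the extra client logic, here is a minimal sketch of resolving related songs through annotations. All field names (`id`, `songID`, `annotationID`, `relationships`) and the sample data are assumptions for illustration, not the demo's actual schema.

```javascript
// Hypothetical sample data: shapes and field names are illustrative assumptions.
const songs = [
  { id: "song-1", title: "Song One" },
  { id: "song-2", title: "Song Two" },
];

const annotations = [
  {
    id: "ann-a",
    songID: "song-1",
    type: "place",
    value: "Liverpool",
    // The relationship points at another annotation, not at a song.
    relationships: [{ annotationID: "ann-b" }],
  },
  { id: "ann-b", songID: "song-2", type: "place", value: "Manchester", relationships: [] },
];

// Build lookup tables once so each resolution is a constant-time map access.
const songsById = new Map(songs.map((song) => [song.id, song]));
const annotationsById = new Map(annotations.map((ann) => [ann.id, ann]));

// For a source annotation, return each linked annotation together with its song.
// Unknown annotation ids are skipped rather than throwing.
function resolveRelatedSongs(annotationId) {
  const source = annotationsById.get(annotationId);
  if (!source) return [];
  return source.relationships
    .map((rel) => annotationsById.get(rel.annotationID))
    .filter((target) => target !== undefined)
    .map((target) => ({ annotation: target, song: songsById.get(target.songID) }));
}
```

Because the result keeps the target annotation next to the song, the client can still show why two songs are connected (here, the two place annotations).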

The question over whether these relationships should be pre-computed or calculated client-side is a tricky one. My view is that whatever we do, it should be scalable to all types of annotations and relationships. Whilst we might be able to scan through some JSON in memory for other annotations that match some similarity criteria for (eg) 'placename', I'm not sure we can also do this similarity analysis client-side for lyrics or harmonics which are also required for the upcoming demo.

I can't see a feasible solution for this demo where we calculate relationships in real-time, unless I'm missing something. In the future, we would presumably delegate this process to an external service and the app itself would not need to even know whether the relationships were pre-computed or not, but for now I think pre-computing seems to be our only chance at getting this running as intended.
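As a rough sketch of what a pre-computation step could look like, the function below compares every pair of annotations offline and emits annotation-to-annotation relationships. The field names, the output shape, and the `samePlaceName` criterion are assumptions for illustration; real similarity for lyrics or harmonics would need a far more involved function plugged in the same way.

```javascript
// Compare every pair of annotations once and record a relationship when the
// supplied similarity function says they match. Intended to run offline,
// before the data is shipped to the app.
function precomputeRelationships(annotationList, isSimilar) {
  const relationships = [];
  for (let i = 0; i < annotationList.length; i++) {
    for (let j = i + 1; j < annotationList.length; j++) {
      const a = annotationList[i];
      const b = annotationList[j];
      // Only link annotations of the same type that belong to different songs.
      if (a.type === b.type && a.songID !== b.songID && isSimilar(a, b)) {
        relationships.push({ type: a.type, from: a.id, to: b.id });
      }
    }
  }
  return relationships;
}

// Hypothetical criterion: place names match when equal, ignoring case and
// surrounding whitespace.
const samePlaceName = (a, b) =>
  a.value.trim().toLowerCase() === b.value.trim().toLowerCase();

// Hypothetical sample input.
const sampleAnnotations = [
  { id: "a1", songID: "song-1", type: "place", value: "Liverpool" },
  { id: "a2", songID: "song-2", type: "place", value: "liverpool " },
  { id: "a3", songID: "song-2", type: "lyric", value: "Liverpool" },
  { id: "a4", songID: "song-1", type: "place", value: "Manchester" },
];

const precomputed = precomputeRelationships(sampleAnnotations, samePlaceName);
console.log(JSON.stringify(precomputed));
```

The pairwise scan is quadratic in the number of annotations, which is acceptable as a one-off build step but is one more reason not to run it in the client.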

ccolonna commented 2 years ago

Hi,

@phivk About querying data: I created a small but flexible framework to extract and transform data from RDF sources, whether remote or local, a file, a SPARQL endpoint, etc. It runs Comunica under the hood. It will be useful when we receive the Polifonia KG from Fiorela and Delfina.

Here you can see an example.

If you want to dig deeper, here you can find the object that launches requests (client) and the mapper that parses SPARQL bindings and returns JS objects.

About the CLI, I tried this and it seems to work:

comunica-sparql https://raw.githubusercontent.com/polifonia-project/sonar2021_demo/develop/src/assets/data/data_v2.jsonld -q 'SELECT * WHERE {?s ?p ?o} LIMIT 10'

The new data extracted from Delfina's file is here: data_v3. @Jason I tried to load it in the Sonar app, but it complains that ctx.currentSong is null. Specifically, I changed these lines:

import ApplicationData from '../assets/data/data_v1.json';
// import ApplicationData from '../assets/data/data_v3.json';

About the data model, my only reflection is that we thought of the app as a social network where someone can create a note about a song and reference another song through that note. I think this guided the current domain model: more like comments than analyses of the songs.

We could instead think of an annotation as a SongAnalysis, e.g. LyricsAnalysis, SpatialAnalysis, etc. We can have a SpatialAnalysis for one song and another SpatialAnalysis for another song, and a connection can be made between the two analyses according to some criteria.

What you describe is perfectly fine for me, so let's go with it.

Just a suggestion: usually the model is at the core of the system (e.g. image). Clients are by nature transient and more affected by change. I mean, if we have Songs and Annotations about songs, we can imagine multiple apps or clients consuming that data. If you compute stuff client side, you prevent other potential clients from accessing those entities. Another app, e.g. the mobile Android version of this one, would have to duplicate the code to compute the relationships.

ccolonna commented 2 years ago

This PR should close this issue. Of course, we can continue to discuss here :)