polifonia-project / sonar2021_demo

This repository documents the Polifonia demo to be presented at SONAR2021
https://polifonia-project.github.io/sonar2021_demo/

SONAR: Dataset investigation #4

Closed FiorelaCiroku closed 2 years ago

FiorelaCiroku commented 3 years ago

Task description

This task consists of identifying candidate sources (audio, text, structured data) that can be combined with the SONAR experiment. The relevance and suitability of the datasets must be validated with an expert: Angelo Pompilo may be our reference expert, but others are more than welcome.

Progress

To be completed

jonnybluesman commented 3 years ago

UPDATE Andrea and I have identified three music collections providing chord annotations and covering different genres. In particular, we found the Isophonics dataset (mentioned in our last presentation) for pop music (mostly from The Beatles and Queen), the JAAH collection for jazz music, and the Winterreise dataset for classical music composed by Schubert.

Each of these collections also offers additional information and metadata, although there is no overlap between them, given the diversity of the music material. No audio is provided, so we are also working on retrieving it from Spotify (play-only). The annotations also come in different formats, so we are parsing and converting everything to a single standardised format. In sum, there is a lot to do before we can start analysing the potential similarities within and across these collections.
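
As an illustration of the conversion step, here is a minimal sketch of parsing one annotation file into a common record type. It assumes a plain-text layout with one `start end label` triple per line, as commonly used for chord annotations; the `ChordSegment` and `parse_lab` names are hypothetical and the sample values are placeholders, not data from the collections.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChordSegment:
    """Hypothetical common record for one annotated chord."""
    start: float  # onset in seconds
    end: float    # offset in seconds
    label: str    # chord label, kept verbatim from the source file


def parse_lab(text: str) -> List[ChordSegment]:
    """Parse whitespace-separated 'start end label' lines (assumed layout)."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        start, end, label = line.split(maxsplit=2)
        segments.append(ChordSegment(float(start), float(end), label))
    return segments


# Placeholder annotation text, for illustration only.
example = "0.000 2.612 N\n2.612 11.459 E\n11.459 12.921 A\n"
segments = parse_lab(example)
```

Collections that use a different layout would need their own parser, but each one could emit the same `ChordSegment` records so that later analysis only deals with one format.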

FiorelaCiroku commented 3 years ago

Progress

You can find the specific metadata and annotations of the music collections (those that we have managed to re-organise, polish and standardise so far) in the datasets repository. Remember to look for isophonics, jaah, and schubert-winterreise.

To save you from going through the annotations and the metadata file by file, we are creating a spreadsheet summarising all the info in tabular form here.

It's a work in progress, but it already collects a good deal of information. Plus, we will be uploading to Sharepoint some extra files containing the metadata and the annotations that we managed to find and retrieve from several sources (Genius, Spotify, Wikidata, etc.).

jonnybluesman commented 3 years ago

UPDATE

The data collection team (Andrea and I) has some beautiful news.

Specific fields / data types retrieved

We managed to put together a list containing all the specific information that we re-combined and cross-related from/between different online web resources. Just consider a couple of things: (i) the grey background denotes the information that we still cannot retrieve (mostly from WhoSampled); and (ii) a certain track, say "Yesterday", may not provide all the fields listed there, but a subset of them.
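
To make point (ii) concrete, here is a small sketch of how per-source records could be combined into one track record in which unavailable fields simply stay empty. The field names, the `combine` function and the sample values are all hypothetical placeholders, not the actual schema in the list.

```python
from typing import Dict, Optional

# Hypothetical field names, for illustration only.
FIELDS = ["title", "release_date", "lyrics_url", "samples"]


def combine(sources: Dict[str, Dict[str, str]]) -> Dict[str, Optional[str]]:
    """Merge per-source dicts; the first source providing a field wins."""
    record: Dict[str, Optional[str]] = {field: None for field in FIELDS}
    for data in sources.values():
        for field, value in data.items():
            if field in record and record[field] is None:
                record[field] = value
    return record


# Placeholder values; "samples" stays None because no source provides it.
yesterday = combine({
    "spotify": {"title": "Yesterday", "release_date": "1965"},
    "genius": {"title": "Yesterday", "lyrics_url": "https://example.org/yesterday"},
})
```

Keeping missing fields as `None` rather than dropping them means every track exposes the same keys, which makes it easy to see which information is still to be retrieved.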

A taste of the data

On a separate branch of the sonar repo (here), we have started uploading all the data. It's 90% complete, as we are only missing the data from SongFacts for the JAAH collection (which seems to be quite sparse).

PS: the music annotations of these tracks, containing chords, etc. are on a separate repo for the moment.

@FiorelaCiroku soon you will receive a pull request, to merge the datasets branch into the main branch. @enridaga the repo should be a good starting point for preparing SPARQL Anything as you were explaining to us last week ; )

enridaga commented 3 years ago

Generated two CSV files with basic metadata and lyrics from Genius and SongFacts; they are in the same datasets branch, in the output folder. Instructions on how to reproduce them are included in queries/README.md.

enridaga commented 3 years ago

YouTube link to be added to the Genius dataset

ccolonna commented 2 years ago

Closing this for inactivity. The datasets populating the SONAR app are in place and a KG has been produced.