polifonia-project / KG_data_transformation

Scripts and services to extract data from raw files (JSON, CSV, TSV) and create the Polifonia KG

Bug with duplicate values #30

Closed ccolonna closed 2 years ago

ccolonna commented 2 years ago

There are duplicates in the raw data, and also duplicates generated by the dataset transformations.

Harmonic transformation

"harmonicSimIRI": "https://w3id.org/polifonia/resource/HarmonicSimilarity/harm_sim_isophonics_173_isophonics_243_00002"
 "harmonicSimIRI": "https://w3id.org/polifonia/resource/HarmonicSimilarity/harm_sim_isophonics_243_isophonics_173_00002"

The same instance appears twice, once for each ordering of the two recordings. A URI generation policy that is not affected by symmetry would probably eliminate this, and it would be an easy fix. Otherwise we would need to change the query so that it ignores duplicates. Better not to go down that path.
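As a rough illustration of a symmetry-independent URI policy, the sketch below puts the two recording ids in a canonical (sorted) order before building the IRI. The function name `harmonic_sim_iri` and its parameters are hypothetical and not taken from the repository. The trailing `00002` is treated as an opaque suffix that is assumed to be the same for both directions, as it is in the example above.

```python
# Base IRI copied from the example above.
BASE = "https://w3id.org/polifonia/resource/HarmonicSimilarity/"


def harmonic_sim_iri(recording_a: str, recording_b: str, suffix: str) -> str:
    """Build the same IRI regardless of the order of the two recordings.

    Hypothetical helper: the ids are sorted lexicographically so that
    (a, b) and (b, a) always map to one canonical IRI.
    """
    first, second = sorted((recording_a, recording_b))
    return f"{BASE}harm_sim_{first}_{second}_{suffix}"


forward = harmonic_sim_iri("isophonics_173", "isophonics_243", "00002")
backward = harmonic_sim_iri("isophonics_243", "isophonics_173", "00002")
print(forward == backward)
```

The sort is lexicographic, so the order is not numeric (for example `isophonics_288` sorts before `isophonics_45`), but any stable canonical order is enough to remove the symmetric duplicate.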

Lyric Lines Transformation:


        {
            "lyrSimId": "lyr_sim_isophonics_45_isophonics_288_178_179",
            "compSimScore": null,
            "humanSimScore": null,
            "lineA": {
                "lineLabel": "Ha da da, ha da da ahh",
                "recordingId": "isophonics_45",
                "lineNumber": "178",
                "recordingName": "Don't Stop Me Now",
                "artistName": "Queen"
            },
            "lineB": {
                "lineLabel": "Ah da, ah da, ah da, ah da",
                "recordingId": "isophonics_288",
                "lineNumber": "178",
                "recordingName": "Lovely Rita",
                "artistName": "The Beatles"
            }
        },

        {
            "lyrSimId": "lyr_sim_isophonics_45_isophonics_208_195_196",
            "compSimScore": null,
            "humanSimScore": null,
            "lineA": {
                "lineLabel": "Ha da da, ha da da ahh",
                "recordingId": "isophonics_45",
                "lineNumber": "195",
                "recordingName": "Don't Stop Me Now",
                "artistName": "Queen"
            },
            "lineB": {
                "lineLabel": "Ah-ah-ah, ah-ah-ahh",
                "recordingId": "isophonics_208",
                "lineNumber": "195",
                "recordingName": "A Day in the Life",
                "artistName": "The Beatles"
            }
        },

        {
            "lyrSimId": "lyr_sim_isophonics_45_isophonics_208_243_244",
            "compSimScore": null,
            "humanSimScore": null,
            "lineA": {
                "lineLabel": "Ha da da, ha da da ahh",
                "recordingId": "isophonics_45",
                "lineNumber": "243",
                "recordingName": "Don't Stop Me Now",
                "artistName": "Queen"
            },
            "lineB": {
                "lineLabel": "Ah-ah-ah, ah-ah-ahh",
                "recordingId": "isophonics_208",
                "lineNumber": "243",
                "recordingName": "A Day in the Life",
                "artistName": "The Beatles"
            }
        },

                        "lineA": {
                "lineLabel": "Ha da da, ha da da ahh",
                "recordingId": "isophonics_45",
                "lineNumber": "245",
                "recordingName": "Don't Stop Me Now",
                "artistName": "Queen"
            },

These are probably not real duplicates, but the same phrase appearing several times in the same song. How do we handle them? @andreamust, what do you think? Just keep them?

This probably should not be changed in KG_data_transformation (good news). If we want to filter them out, that should happen in the KG2SONAR app transformation. But according to which criteria?
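To help the discussion about criteria, here is a small sketch that only reports the repeated pairs and does not remove anything. It assumes records shaped like the JSON above and groups them by recording and line text on both sides. The function name `group_repeated_lyric_pairs` and the choice of key are illustrative assumptions, not an agreed criterion.

```python
from collections import defaultdict


def group_repeated_lyric_pairs(records):
    """Group lyric similarity records whose two lines have the same text
    and come from the same pair of recordings.

    Returns only the groups with more than one record, mapping the key
    (recording A, label A, recording B, label B) to the lyrSimId values.
    """
    groups = defaultdict(list)
    for record in records:
        line_a, line_b = record["lineA"], record["lineB"]
        key = (
            line_a["recordingId"],
            line_a["lineLabel"],
            line_b["recordingId"],
            line_b["lineLabel"],
        )
        groups[key].append(record["lyrSimId"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

With the three complete records above, only the two `isophonics_45` / `isophonics_208` entries would fall into the same group. The `isophonics_288` entry stays on its own because its `lineB` text differs.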

ccolonna commented 2 years ago

Spatial duplicates:

Again, luckily there is no problem in the KG transformation. But we have the same place repeated several times, differing only in the session timestamps. Unfortunately, timestamps are not considered in the annotation. These repeats can be removed in the KG2SONAR app.

        {
            "track_id": "isophonics_208",
            "title": "A Day In The Life",
            "title_for_iri": "a_day_in_the_life",
            "recording_places": [
                {
                    "ref_id": "00110",
                    "session_id": "00110_1",
                    "type": "recorded at",
                    "type_id": "recorded at",
                    "session_type": "mp:RecordingSession",
                    "begin": "1967-01-19",
                    "end": "1967-01-19",
                    "ended": "true",
                    "place": {
                        "id": "6f12a5d2_52e5_4dec_9fed_494b1f65bb94",
                        "type": "Studio",
                        "name": "Abbey Road Studios: Studio 2",
                        "address": "3 Abbey Road, St John\u2019s Wood, London",
                        "coordinates": {
                            "latitude": "51.53192",
                            "longitude": "-0.17835"
                        }
                    }
                },
                {
                    "ref_id": "00110",
                    "session_id": "00110_2",
                    "type": "recorded at",
                    "type_id": "recorded at",
                    "session_type": "mp:RecordingSession",
                    "begin": "1967-01-20",
                    "end": "1967-01-20",
                    "ended": "true",
                    "place": {
                        "id": "6f12a5d2_52e5_4dec_9fed_494b1f65bb94",
                        "type": "Studio",
                        "name": "Abbey Road Studios: Studio 2",
                        "address": "3 Abbey Road, St John\u2019s Wood, London",
                        "coordinates": {
                            "latitude": "51.53192",
                            "longitude": "-0.17835"
                        }
                    }
                },
                {
                    "ref_id": "00110",
                    "session_id": "00110_3",
                    "type": "recorded at",
                    "type_id": "recorded at",
                    "session_type": "mp:RecordingSession",
                    "begin": "1967-02-03",
                    "end": "1967-02-03",
                    "ended": "true",
                    "place": {
                        "id": "6f12a5d2_52e5_4dec_9fed_494b1f65bb94",
                        "type": "Studio",
                        "name": "Abbey Road Studios: Studio 2",
                        "address": "3 Abbey Road, St John\u2019s Wood, London",
                        "coordinates": {
                            "latitude": "51.53192",
                            "longitude": "-0.17835"
                        }
                    }
                },
                {
                    "ref_id": "00110",
                    "session_id": "00110_4",
                    "type": "recorded at",
                    "type_id": "recorded at",
                    "session_type": "mp:RecordingSession",
                    "begin": "1967-02-10",
                    "end": "1967-02-10",
                    "ended": "true",
                    "place": {
                        "id": "c56fdea4_e81e_439a_a183_a52eb1141409",
                        "type": "Studio",
                        "name": "Abbey Road Studios: Studio 1",
                        "address": "3 Abbey Road, St John\u2019s Wood, London",
                        "coordinates": {
                            "latitude": "51.53192",
                            "longitude": "-0.17835"
                        }
                    }
                },
                {
                    "ref_id": "00110",
                    "session_id": "00110_5",
                    "type": "recorded at",
                    "type_id": "recorded at",
                    "session_type": "mp:RecordingSession",
                    "begin": "1967-02-22",
                    "end": "1967-02-22",
                    "ended": "true",
                    "place": {
                        "id": "6f12a5d2_52e5_4dec_9fed_494b1f65bb94",
                        "type": "Studio",
                        "name": "Abbey Road Studios: Studio 2",
                        "address": "3 Abbey Road, St John\u2019s Wood, London",
                        "coordinates": {
                            "latitude": "51.53192",
                            "longitude": "-0.17835"
                        }
                    }
                }
            ],
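A minimal sketch of how the repeated places could be collapsed on the KG2SONAR side, assuming a track record shaped like the one above. The function name `unique_places` is hypothetical, and keying on the place `id` is only one possible choice.

```python
def unique_places(track: dict) -> list:
    """Return each recording place of a track once, ignoring session dates.

    Places are identified by their "id" field. The first occurrence is
    kept and the original order is preserved.
    """
    seen_ids = set()
    places = []
    for session in track.get("recording_places", []):
        place = session["place"]
        if place["id"] not in seen_ids:
            seen_ids.add(place["id"])
            places.append(place)
    return places
```

For the record above this would keep two places, Studio 2 and Studio 1, because they have different ids. Since both share the same coordinates, keying on `coordinates` instead would collapse them into a single point.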
ccolonna commented 2 years ago

Closed by this commit https://github.com/polifonia-project/KG_data_transformation/commit/228224e7bbc056535479a2d0f96553b5b7a999ee and this pull request https://github.com/polifonia-project/sonar2021_data_transformation/pull/25