kellnerd / harmony

Music Metadata Aggregator and MusicBrainz Importer
MIT License

Normalize and merge copyright lines #23

Open kellnerd opened 2 weeks ago

kellnerd commented 2 weeks ago

Continuing the discussion from https://github.com/kellnerd/harmony/issues/22#issuecomment-2156460070

We can factor out the copyright normalization logic and reuse it for other providers, e.g. as suggested for Tidal in https://community.metabrainz.org/t/harmony-music-metadata-aggregator-and-musicbrainz-importer/698641/15. But I'd do this after merging this PR. It also needs some further research. I know Tidal includes the copyright text both with and without the © symbol. What I'm unsure about is whether this strictly contains copyright © info, or whether it can sometimes also contain phonographic copyright ℗ info. Spotify has those separated, which makes it easier.

I fully agree, this is enough for its own PR and it needs more research. By the way, Tidal also has a copyright property at the track level, which should also be considered if it differs from the release-level copyright. So far they were identical for the releases I have checked; maybe a compilation has different values there.

For starters, I have a commit in the dev branch which displays the alternative copyright values. Once we have more examples, we can decide how the release merge algorithm should handle these. One possibility would be to keep all of them and deduplicate.
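The "keep all and deduplicate" option could look roughly like the sketch below. It is only an illustration: `dedupeCopyrights` is a hypothetical name, not an existing Harmony function, and treating values as duplicates when they differ only in letter case or whitespace is an assumption that would need checking against real provider data.

```typescript
// Hypothetical helper, not part of Harmony's actual code.
// Keeps the first occurrence of each copyright value and drops later duplicates.
function dedupeCopyrights(values: string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const value of values) {
    const trimmed = value.trim();
    if (!trimmed) continue; // skip empty values
    // Assumption: values differing only in case or whitespace runs are duplicates.
    const key = trimmed.replace(/\s+/g, " ").toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      result.push(trimmed);
    }
  }
  return result;
}

// Made-up values standing in for what several providers might return.
const merged = dedupeCopyrights([
  "© 2017 Example Label",
  "© 2017  EXAMPLE LABEL",
  "℗ 2017 Example Label",
]);
console.log(merged);
```

The first spelling encountered is kept, so the provider order decides which variant survives.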

phw commented 2 weeks ago

For Tidal it is more complicated: the copyright field can contain (P) entries, (C) entries, or both, and each can appear with or without a symbol (ASCII or the proper character).

So I'm not sure whether we can or should add © or ℗ (or maybe "© + ℗") to the string if the symbol is missing. What we could do is convert the ASCII variants into symbols.

EDIT: Tidal's API docs give this as an example value for copyright: "(p)(c) 2017 S. CARTER ENTERPRISES, LLC. MARKETED BY ROC NATION & DISTRIBUTED BY ROC NATION/UMG RECORDINGS INC."

So even this example makes it clear that the field is not reserved for either © or ℗, but is rather free-form text for the labels / artists to use.
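Converting the ASCII variants into symbols could be sketched as follows. `normalizeCopyrightSymbols` is a hypothetical name, and the sketch assumes the ASCII forms are always written as `(c)` or `(p)` in either letter case. It deliberately does not add a symbol where none is present.

```typescript
// Hypothetical helper, not part of Harmony's actual code.
// Replaces the ASCII stand-ins "(c)" and "(p)" (any letter case) with © and ℗.
// Strings without any symbol are returned unchanged.
function normalizeCopyrightSymbols(text: string): string {
  return text
    .replace(/\(c\)/gi, "©")
    .replace(/\(p\)/gi, "℗");
}

// Shortened form of the example value from Tidal's API docs quoted above.
const tidalExample = "(p)(c) 2017 S. CARTER ENTERPRISES, LLC.";
console.log(normalizeCopyrightSymbols(tidalExample));
// → "℗© 2017 S. CARTER ENTERPRISES, LLC."
```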

phw commented 2 weeks ago

I wonder whether "copyright" should be a list of strings per provider instead of a single string. Currently I think only Spotify offers explicit, distinct values for (P) and (C). But having those as actual separate values would make de-duplication easier. Right now the provider just separates both with a newline.
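As a rough illustration of the list-per-provider idea, the sketch below splits a newline-separated value into separate entries. The `ProviderCopyrights` type, the `splitCopyrightLines` function and the sample value are all hypothetical and do not reflect Harmony's actual data model.

```typescript
// Hypothetical shape: provider name -> list of copyright lines.
type ProviderCopyrights = Record<string, string[]>;

// Splits a newline-separated copyright string into trimmed, non-empty entries.
function splitCopyrightLines(copyright: string): string[] {
  return copyright
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Made-up value in the newline-separated form described above.
const combinedValue = "© 2017 Example Label\n℗ 2017 Example Label";
const perProvider: ProviderCopyrights = {
  spotify: splitCopyrightLines(combinedValue),
};
console.log(perProvider.spotify.length); // → 2
```

With separate entries, each line can be compared across providers individually instead of matching whole multi-line strings.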