kellnerd / musicbrainz-scripts

Bookmarklets and Userscripts for MusicBrainz.org
MIT License

Copyright parser edge case #39

Open julian45 opened 3 months ago

julian45 commented 3 months ago

While filling in copyrights for this release today, the copyright parser handled the © credit without a problem, but it doesn't seem to like the ℗ credit: ℗ «2024 Living,Dining&kitchen Records»

I was hoping to have it parse this as one complete label entry (for me to match with this label), but it only seems to pick up the "Living" part.

I don't know if there's a way to adjust the parser to handle this in a way that wouldn't negatively affect the function of the parser for more normal cases, but if not, I'm hoping that you might be able to help me figure out a way to make adjustments even just for my personal copy.

I've confirmed that this occurs in the latest version of the parser userscript, v2024.7.1.

kellnerd commented 3 months ago

I am afraid that it is not possible to correctly handle this edge case without breaking the detection of other more common cases.

You would have to change the "Advanced configuration" of the credit parser (which is collapsed by default). There are two features you have to work around temporarily:

  1. A comma terminates the credit statement.
  2. ~An ampersand separates two credited (label) names.~ (Edit: I forgot that this is my custom setting which I have not made a default so far)

In order to parse the whole rest of the line as a single credited name you have to adapt the "Credit terminator" ~and the "Name separator"~ settings. Unless you are familiar with regular expressions, the simplest way to achieve this is to temporarily empty these settings. After you have parsed the credit you can restore the default values using the reset buttons.

P.S. For this specific case (which has no space after the comma) you could also replace the `(?=,|` part of the credit terminator setting with `(?=,\s|`.
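
To illustrate the effect of that change, here is a minimal sketch. It does not reproduce the userscript's actual code or its real default pattern: `defaultTerminator`, `relaxedTerminator` and `parseCreditedName` are hypothetical names, and both patterns are simplified stand-ins that only keep the comma alternative and the end of the line. The input is assumed to be the text that remains after the ℗ symbol and the year have been consumed.

```typescript
// Simplified, hypothetical stand-ins for the "Credit terminator" setting.
// The real default contains more alternatives than shown here.
const defaultTerminator = "(?=,|$)"; // any comma ends the credited name
const relaxedTerminator = "(?=,\\s|$)"; // only a comma followed by whitespace ends it

// Hypothetical helper: lazily capture a name up to the first terminator match.
function parseCreditedName(text: string, terminator: string): string {
  const match = new RegExp(`^(.+?)${terminator}`).exec(text);
  return match ? (match[1] ?? "").trim() : "";
}

const edgeCase = "Living,Dining&kitchen Records";
const commonCase = "Foo Records, Bar Music";

console.log(parseCreditedName(edgeCase, defaultTerminator)); // "Living"
console.log(parseCreditedName(edgeCase, relaxedTerminator)); // "Living,Dining&kitchen Records"
console.log(parseCreditedName(commonCase, defaultTerminator)); // "Foo Records"
console.log(parseCreditedName(commonCase, relaxedTerminator)); // "Foo Records"
```

Under these assumptions the relaxed pattern keeps the comma-and-space case working while leaving the label name intact. The trade-off is that a list written without spaces, such as `Foo,Bar`, would no longer be split at the comma.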

kellnerd commented 1 month ago

Just for the record: I am still not sure whether a comma that is not followed by a space should prevent splitting by default. There may be more cases like your example, but changing this might also cause other cases to no longer be handled correctly. I have just dug up my commit from July and pushed it to a separate branch for now.