snejus / beetcamp

Bandcamp autotagger source for beets (http://beets.io)
GNU General Public License v2.0

Plugin fails with error resolving domain name #56

Closed ncraike closed 4 months ago

ncraike commented 4 months ago

I'm not 100% sure what's causing this, but I'm getting an error when trying to import some FLAC-format albums that I downloaded from Bandcamp. The error seems to be a problem resolving a domain name related to the artist name, Supercommuter.

I have the beetcamp/bandcamp plugin enabled, but also the default metadata fetching from MusicBrainz. The verbose output does seem to suggest the error is from the beetcamp plugin, though.

If I run beet -vv import path_to_album/, I get this traceback:

bandcamp: Trying our guess https://supercom/album/products-of-science before searching
Traceback (most recent call last):
  File "/home/natasha/.local/lib/python3.9/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
  File "/home/natasha/.local/lib/python3.9/site-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/local/lib/python3.9/socket.py", line 954, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] Name does not resolve

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/natasha/.local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
  File "/home/natasha/.local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
  File "/home/natasha/.local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/home/natasha/.local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
    conn.connect()
  File "/home/natasha/.local/lib/python3.9/site-packages/urllib3/connection.py", line 616, in connect
    self.sock = sock = self._new_conn()
  File "/home/natasha/.local/lib/python3.9/site-packages/urllib3/connection.py", line 205, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x8048efdc0>: Failed to resolve 'supercom' ([Errno 8] Name does not resolve)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/natasha/.local/lib/python3.9/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/home/natasha/.local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
  File "/home/natasha/.local/lib/python3.9/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='supercom', port=443): Max retries exceeded with url: /album/products-of-science (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x8048efdc0>: Failed to resolve 'supercom' ([Errno 8] Name does not resolve)"))

I was puzzled by the domain it seemed to be trying to look up, supercom, since it's a truncated form of the artist name "Supercommuter".

I had a look at the FLAC file tags, and they all have this in the Comments field:

Visit https://supercommuter.net

If I use VLC to delete these comments, the import succeeds without error.

Is it possible the URL in the Comments field is triggering some attempt to resolve a truncated form of the domain?

snejus commented 4 months ago

Very interesting! Thanks for reporting this, amazing!

The plugin indeed looks at the comments field in the files that are being imported. Back when this was implemented, Bandcamp search was not very useful, so a URL found in the comments would speed up imports a lot.

I can indeed see in the logs it's trying to reach for the wrong URL:

bandcamp: Trying our guess https://supercom/album/products-of-science before searching

Could you by any chance double-check the comment field in the first track of the album?

snejus commented 4 months ago

Ah, don't worry! I just checked the regular expression that parses this comment, and I can see the issue:

LABEL_URL_IN_COMMENT = re.compile(r"Visit (https:[\w/.-]+com)")

Essentially, this pattern assumes that label URLs always end with com, as in bandcamp.com, so it got confused by https://supercommuter.net. Here the com sits in the middle of the host name rather than at the end, so the match stopped there, keeping https://supercom and dropping muter.net.
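The truncation described above can be reproduced with the quoted pattern alone. This is a minimal sketch: the helper name guess_label_url is hypothetical and only wraps the regular expression for illustration; it is not the plugin's actual code path.

```python
import re

# Pattern exactly as quoted in the comment above.
LABEL_URL_IN_COMMENT = re.compile(r"Visit (https:[\w/.-]+com)")


def guess_label_url(comment):
    """Return the URL captured from a comment, or None if nothing matches.

    Hypothetical helper, used here only to exercise the pattern.
    """
    match = LABEL_URL_IN_COMMENT.search(comment)
    return match.group(1) if match else None


# The character class is greedy, but the pattern must end in "com", so the
# engine backtracks to the last "com" it can find. In this host name that
# "com" is inside "supercommuter", not at the end.
print(guess_label_url("Visit https://supercommuter.net"))   # https://supercom

# A URL that really ends in "com" is captured whole.
print(guess_label_url("Visit https://label.bandcamp.com"))  # https://label.bandcamp.com
```

The first result matches the host that appears in the traceback, which is why the name lookup for supercom failed.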

I'll get this fixed!

snejus commented 4 months ago

This should be fixed now.
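The thread does not show the change that was actually made. As an illustration only, one way to avoid this kind of truncation is to stop requiring a specific ending and instead capture the whole host name up to its final dot-separated label. The pattern and the helper name below are assumptions for this sketch, not the beetcamp fix.

```python
import re

# Hypothetical replacement pattern (NOT the actual beetcamp change):
# match "https://", then the host name, ending in a dot followed by at
# least two lowercase letters, whatever the top-level domain is.
LABEL_URL_IN_COMMENT = re.compile(r"Visit (https://[\w.-]+\.[a-z]{2,})")


def guess_label_url(comment):
    """Return the label URL found in a comment, or None (hypothetical helper)."""
    match = LABEL_URL_IN_COMMENT.search(comment)
    return match.group(1) if match else None


print(guess_label_url("Visit https://supercommuter.net"))    # https://supercommuter.net
print(guess_label_url("Visit https://label.bandcamp.com"))   # https://label.bandcamp.com
# A trailing full stop after the URL is left out of the capture.
print(guess_label_url("Visit https://supercommuter.net."))   # https://supercommuter.net
```

Unlike the original pattern, this sketch leaves "/" out of the character class, so it captures only the scheme and host and ignores any path after them. Whether that is appropriate depends on how the plugin uses the captured value.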

ncraike commented 4 months ago

Sorry, I missed your replies earlier. It seems I need to check my GitHub notification settings.

Thanks so much for looking at this! I'm glad it was a pretty straightforward fix.