petermr / openDiagram

Extraction of semantic data from diagrams in scientific and other technical/business documents
Apache License 2.0

Wikimedia URL has escaped control characters which are de-escaped #18

Open petermr opened 3 years ago

petermr commented 3 years ago
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.8/tkinter/__init__.py", line 1883, in __call__
    return self.func(*args)
  File "/Users/pm286/projects/openDiagram/physchem/python/ami_gui.py", line 568, in <lambda>
    self.show_dictionary_item(event, dictionary))
  File "/Users/pm286/projects/openDiagram/physchem/python/ami_gui.py", line 585, in show_dictionary_item
    with urlopen(image_url) as u:
  File "/opt/anaconda3/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/anaconda3/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/opt/anaconda3/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/opt/anaconda3/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/opt/anaconda3/lib/python3.8/urllib/request.py", line 1379, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/opt/anaconda3/lib/python3.8/urllib/request.py", line 1350, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/opt/anaconda3/lib/python3.8/http/client.py", line 1240, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/anaconda3/lib/python3.8/http/client.py", line 1251, in _send_request
    self.putrequest(method, url, **skips)
  File "/opt/anaconda3/lib/python3.8/http/client.py", line 1094, in putrequest
    self._validate_path(url)
  File "/opt/anaconda3/lib/python3.8/http/client.py", line 1185, in _validate_path
    raise InvalidURL(f"URL can't contain control characters. {url!r} "
http.client.InvalidURL: URL can't contain control characters. '/wiki/Special:FilePath/Abies%20ceph%20Enos.JPG | http://commons.wikimedia.org/wiki/Special:FilePath/%CE%88%CE%BB%CE%B1%CF%84%CE%BF%20-%20%CE%A0%CE%AC%CF%81%CE%BD%CE%B7%CE%B8%CE%B1.jpg' (found at least ' ')

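Somewhere the URL is interpreted as containing control characters, whereas these are the actual characters in the filename. Note that the character reported in the traceback is a space (found at least ' '), and the rejected path appears to contain two percent-encoded URLs joined by " | ", so the space may come from that separator rather than from the filename itself.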
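
One possible workaround, sketched here under assumptions rather than taken from ami_gui.py: if the dictionary field can hold several image URLs separated by "|", split it before calling urlopen and fetch only one candidate. The helper names split_image_urls and make_safe_url are hypothetical, and the host of the first URL in EXAMPLE_VALUE is assumed, since the traceback shows only its path.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Example value modelled on the traceback. The host of the first URL is an
# assumption; the traceback only shows its path.
EXAMPLE_VALUE = (
    "http://commons.wikimedia.org/wiki/Special:FilePath/Abies%20ceph%20Enos.JPG"
    " | "
    "http://commons.wikimedia.org/wiki/Special:FilePath/"
    "%CE%88%CE%BB%CE%B1%CF%84%CE%BF%20-%20%CE%A0%CE%AC%CF%81%CE%BD%CE%B7%CE%B8%CE%B1.jpg"
)


def split_image_urls(raw_value, separator="|"):
    """Split a field that may hold several URLs; strip whitespace, drop empties."""
    return [part.strip() for part in raw_value.split(separator) if part.strip()]


def make_safe_url(url):
    """Percent-encode unsafe characters in the path.

    '%' is treated as safe so that escapes already present (such as %20)
    are not encoded a second time.
    """
    parts = urlsplit(url)
    safe_path = quote(parts.path, safe="/%:")
    return urlunsplit(
        (parts.scheme, parts.netloc, safe_path, parts.query, parts.fragment)
    )


# Each candidate can now be passed to urlopen on its own.
candidates = [make_safe_url(u) for u in split_image_urls(EXAMPLE_VALUE)]
```

In show_dictionary_item, the call could then become something like `with urlopen(candidates[0]) as u:`, falling back to the next candidate if the first fails. Whether the first or the second image is the one wanted is a design choice this sketch does not settle.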