seanbreckenridge / url_cache

A file system cache which saves URL metadata and summarizes content
https://pypi.org/project/url-cache/
Apache License 2.0

SSL: CERTIFICATE_VERIFY_FAILED #3

Closed chiragkanzariya-simformsolutions closed 3 years ago

chiragkanzariya-simformsolutions commented 3 years ago

It looks like the library may not be properly configured for SSL. I am getting the error below:

I get both of the above errors when parsing any YouTube URL.

seanbreckenridge commented 3 years ago

Hi, thanks for opening an issue.

Seems to be working fine on my end for a couple of YouTube URLs I tested.

Could you give me a URL this fails for?

The YouTube request uses the builtin urllib library instead of requests; I have a feeling that may be causing the error here.

I pushed an update to the submodule, could you try testing this by doing:

pip uninstall -y url_metadata
git clone 'https://github.com/seanbreckenridge/url_metadata'
cd ./url_metadata
git submodule update --init
pip install --user .
python3 -m url_metadata get "some youtube url"

chiragkanzariya-simformsolutions commented 3 years ago

@seanbreckenridge This is a YouTube URL I have tried: https://www.youtube.com/watch?v=6G0mablNVXY. But yeah, I haven't checked with urllib yet, so let me try that.

Thank you for your quick response.

seanbreckenridge commented 3 years ago

That URL works fine for me (though I don't see any subtitles on that video, not sure if this is a region issue). Let me know if the above works.

chiragkanzariya-simformsolutions commented 3 years ago

I have attached my error here:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 1350, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 950, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1424, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "get_metadata.py", line 8, in <module>
    m = metadata("https://www.youtube.com/watch?v=6G0mablNVXY")
  File "/Users/chirag.kanzariya/Desktop/360View/test_env/lib/python3.8/site-packages/url_metadata/__init__.py", line 13, in metadata
    return default_cache.get(url)
  File "/Users/chirag.kanzariya/Desktop/360View/test_env/lib/python3.8/site-packages/url_metadata/core.py", line 120, in get
    data: Metadata = self.request_data(uurl)
  File "/Users/chirag.kanzariya/Desktop/360View/test_env/lib/python3.8/site-packages/url_metadata/core.py", line 161, in request_data
    metadata.subtitles = download_subtitles(
  File "/Users/chirag.kanzariya/Desktop/360View/test_env/lib/python3.8/site-packages/url_metadata/youtube.py", line 23, in download_subtitles
    return download_subs(youtube_id, lang)
  File "/Users/chirag.kanzariya/Desktop/360View/test_env/lib/python3.8/site-packages/url_metadata/yt_subs/subtitles_downloader.py", line 22, in download_subs
    video_info: Dict[str, Any] = get_video_info(video_identifier)
  File "/Users/chirag.kanzariya/Desktop/360View/test_env/lib/python3.8/site-packages/url_metadata/yt_subs/subtitles_downloader.py", line 38, in get_video_info
    resp: HTTPResponse = urllib.request.urlopen(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 1393, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 1353, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)>
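
The traceback above fails inside the standard library's `ssl` module with "unable to get local issuer certificate", which usually means the interpreter could not find a trusted CA bundle. A minimal diagnostic sketch, assuming only the standard library plus (optionally) the third-party `certifi` package that `requests` depends on; the helper names `describe_ca_setup` and `make_context` are made up for illustration and are not part of url_metadata:

```python
import ssl
import sys


def describe_ca_setup() -> dict:
    """Collect where this interpreter looks for trusted CA certificates."""
    paths = ssl.get_default_verify_paths()
    info = {
        "python": sys.executable,
        "openssl": ssl.OPENSSL_VERSION,
        "cafile": paths.cafile,  # None if the default CA file does not exist
        "capath": paths.capath,  # None if the default CA directory does not exist
        "certifi_bundle": None,
    }
    try:
        import certifi  # third-party; normally installed alongside requests

        info["certifi_bundle"] = certifi.where()
    except ImportError:
        pass  # certifi is optional for this sketch
    return info


def make_context() -> ssl.SSLContext:
    """Build a verifying SSL context, preferring certifi's bundle if present."""
    bundle = describe_ca_setup()["certifi_bundle"]
    # cafile=None falls back to the system/OpenSSL default locations
    return ssl.create_default_context(cafile=bundle)


if __name__ == "__main__":
    for key, value in describe_ca_setup().items():
        print(f"{key}: {value}")
```

If both `cafile` and `capath` print `None` on a python.org macOS install (the `/Library/Frameworks/Python.framework` path in the traceback suggests one), running the "Install Certificates.command" script that ships with that installer is a commonly reported fix. The context can also be passed explicitly, e.g. `urllib.request.urlopen(url, context=make_context())`, though whether that resolves this particular report is not confirmed in the thread.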

seanbreckenridge commented 3 years ago

Should get away from using urllib in general, so I pushed a new version (0.1.4) to pypi. Can you try:

pip uninstall -y url_metadata
pip install -U url_metadata

and try running it with the same URL?

chiragkanzariya-simformsolutions commented 3 years ago

This is the code that I am trying with your library:

from url_metadata import metadata
m = metadata("https://www.youtube.com/watch?v=6G0mablNVXY")

print(m.info)

print(m.info["images"][0]["src"])
print('--------')
len(m.info["images"])
print(m.info["title"])
print('--------')
m.info["title"]
print(m.info["description"])

And BTW, the above steps also give me the same error.

seanbreckenridge commented 3 years ago

Hmm, alright.

I'll have to look further into the SSL error.

It's quite late for me, so I will probably take a further look at this tomorrow.

seanbreckenridge commented 3 years ago

Just to confirm, does this video have subtitles for you or not? I don't see any subtitles on it.

chiragkanzariya-simformsolutions commented 3 years ago

Okay, no worries. Thanks for the update.

chiragkanzariya-simformsolutions commented 3 years ago

I just want a title, description, and image of the URL. I am not interested in the subtitle for now.

seanbreckenridge commented 3 years ago

Right, but the error is coming from trying to download the subtitles.

If I'm not able to figure out the SSL error (honestly have no clue, but it seems it may be an error with the Python installation process? see here for lots of similar errors/a possible solution), I may resolve this by adding a flag to skip downloading subtitles, or by just reporting/warning if there's a request error instead of crashing.
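
The "warn instead of crashing" idea can be sketched as follows. This is a hypothetical illustration, not the code that was actually pushed to url_metadata: the `safe_subtitles` helper, its `download` parameter, and the logger name are all assumptions made for the example.

```python
import logging
import urllib.error
from typing import Callable, Optional

logger = logging.getLogger("url_metadata_sketch")  # hypothetical logger name


def safe_subtitles(
    download: Callable[[str], str],
    video_id: str,
    skip_subtitles: bool = False,
) -> Optional[str]:
    """Return subtitles, or None if skipped or if the request fails."""
    if skip_subtitles:
        # Caller opted out, so no network request is made at all
        return None
    try:
        return download(video_id)
    except urllib.error.URLError as exc:
        # URLError is what urllib raised at the end of the traceback in this
        # thread; log it and carry on so the rest of the metadata is returned
        logger.warning("Could not download subtitles for %s: %s", video_id, exc)
        return None
```

With a wrapper like this, a certificate failure in the subtitle request would only cost the subtitles, and the title, description, and images would still come back.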

seanbreckenridge commented 3 years ago

Pushed some changes to master. Before I push a version, could I get you to run:

pip uninstall -y url_metadata
git clone 'https://github.com/seanbreckenridge/url_metadata'
cd ./url_metadata
git submodule update --init
pip install --user .
python3 -m url_metadata get "some youtube url"

again and see if it ignores the error now?

Else you can do:

import logging
from url_metadata import URLMetadataCache

# pass the skip_subtitles flag so it doesn't download those
cache = URLMetadataCache(loglevel=logging.DEBUG, skip_subtitles=True)
c = cache.get("some youtube url")

or with the flag:

python3 -m url_metadata --skip-subtitles get "some youtube url"

chiragkanzariya-simformsolutions commented 3 years ago

I have some work on my plate right now; I will check in 2 hours and let you know the details.

seanbreckenridge commented 3 years ago

Have pushed a new version (v0.1.5) to PyPI with a flag/kwarg to ignore subtitles. Feel free to re-open this if the flag doesn't work for you.