truenas / charts

TrueNAS SCALE Apps Catalogs & Charts
BSD 3-Clause "New" or "Revised" License

Immich Machine Learning not downloading model from Huggingface.co #2714

Closed: soldier9945 closed this issue 3 months ago

soldier9945 commented 3 months ago

Hey there!

I have a problem with the machine learning pod from Immich in the latest image. I'm still using TrueNAS-SCALE-23.10.2 until I have migrated all my TrueCharts Apps to TrueNAS Apps or Docker-Compose on a VM...

Has anyone else seen the same error with a fresh install of Immich? Initially I migrated from a TrueCharts image (backed up and restored Immich's DB, and pointed to the same file storage for library, upload, etc.). But then I tried a fresh install of Immich and got the same error.

Apparently, the machine learning pod cannot securely download the models immich is using:

SSLError:(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url:/api/models/immich-app/ViT-B-32__openai/revision/main (Caused by SSLError(SSLCertVerificationError(1, '[SSL:CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1006)')))"), '(Request ID:7da1e5b6-bff5-40a2-b0fd-7ebaaf851985)')

Can anyone help fix this? It seems to be limited to the community-edition TrueNAS chart.

stavros-k commented 3 months ago

If it's still failing, can you try running this from the TrueNAS host shell?

curl https://huggingface.co/api/models/immich-app/ViT-B-32__openai/revision/main

Does it return a JSON string, or does it fail?

Thanks

soldier9945 commented 3 months ago

Hello stavros-k, thanks for your suggestion!

Yes, I get the following JSON string back:

admin@truenas[~]$ curl https://huggingface.co/api/models/immich-app/ViT-B-32__openai/revision/main
{"_id":"653c56fc4a52f10eaf25bba5","id":"immich-app/ViT-B-32__openai","modelId":"immich-app/ViT-B-32__openai","author":"immich-app","sha":"77520f2136c0467e32b012be8d190ccf110c5667","lastModified":"2024-07-22T14:29:08.000Z","private":false,"disabled":false,"gated":false,"tags":["transformers","onnx","immich","clip","endpoints_compatible","region:us"],"downloads":203437,"library_name":"transformers","likes":5,"model-index":null,"config":{},"cardData":{"tags":["immich","clip"]},"transformersInfo":{"auto_model":"AutoModel"},"siblings":[{"rfilename":".gitattributes"},{"rfilename":"README.md"},{"rfilename":"config.json"},{"rfilename":"textual/fp16/model.armnn"},{"rfilename":"textual/merges.txt"},{"rfilename":"textual/model.onnx"},{"rfilename":"textual/special_tokens_map.json"},{"rfilename":"textual/tokenizer.json"},{"rfilename":"textual/tokenizer_config.json"},{"rfilename":"textual/vocab.json"},{"rfilename":"visual/fp16/model.armnn"},{"rfilename":"visual/model.armnn"},{"rfilename":"visual/model.onnx"},{"rfilename":"visual/preprocess_cfg.json"}],"spaces":[],"createdAt":"2023-10-28T00:34:04.000Z"}%
admin@truenas[~]$

I found something related, though I don't know whether it helps: https://stackoverflow.com/questions/75110981/sslerror-httpsconnectionpoolhost-huggingface-co-port-443-max-retries-exce

Update: I would have tried the same command from the machine-learning pod, but curl is not available in that container.

stavros-k commented 3 months ago

Can you check whether wget is available?

soldier9945 commented 3 months ago

No, it is not. I'm currently trying to download and install it, but the instructions I keep finding either use a package manager or run it from a dedicated pod, and I haven't figured out exactly how either works.
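Since neither curl nor wget ships in the image but Python does, a stdlib-only probe can stand in for them. This is a minimal sketch, not an Immich tool: the file name `probe.py` and the error labels are hypothetical. It fetches the URL from the issue with certificate verification left on (urlopen's default) and reports whether a failure came from TLS verification, DNS, or something else.

```python
import socket
import ssl
import sys
import urllib.error
import urllib.request

# URL taken from the issue; pass another one as the first argument to test it instead.
DEFAULT_URL = "https://huggingface.co/api/models/immich-app/ViT-B-32__openai/revision/main"


def classify_error(exc: BaseException) -> str:
    """Map a failed request to a short, hypothetical label."""
    # urlopen wraps low-level failures in URLError; unwrap .reason to see the cause.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, ssl.SSLCertVerificationError):
        return "tls-verify-failed"
    if isinstance(reason, socket.gaierror):
        return "dns-failure"
    if isinstance(reason, (TimeoutError, socket.timeout)):
        return "timeout"
    return "other"


def probe(url: str = DEFAULT_URL, timeout: float = 10.0) -> str:
    """Fetch url with certificate verification left on and summarize the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
            return f"ok: HTTP {resp.status}, {len(body)} bytes"
    except urllib.error.HTTPError as exc:
        # DNS and TLS worked; the server simply answered with an error status.
        return f"http-error: {exc.code}"
    except OSError as exc:  # URLError, SSL and socket errors are all OSError subclasses
        return f"{classify_error(exc)}: {exc}"


if __name__ == "__main__":
    print(probe(sys.argv[1] if len(sys.argv) > 1 else DEFAULT_URL))
```

Run it inside the pod with `python3 probe.py`. On a setup like the one in this issue it should report `tls-verify-failed` rather than a DNS error, which already hints that the name resolves, just to the wrong server.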

soldier9945 commented 3 months ago

I just created a get-model.py script in the machinelearning pod with this one-liner:

echo -e "import urllib.request\nurl = 'https://huggingface.co/api/models/immich-app/ViT-B-32__openai/revision/main'\nurllib.request.urlretrieve(url, 'model.json')" > get-model.py

Running it in the TrueNAS shell downloads the JSON content, but here's what happens in the machinelearning pod:

root@immich-machinelearning-b9846f986-z4mzc:/usr/src/app# python3 get-model.py > get-model.py.log
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/local/lib/python3.11/http/client.py", line 1303, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1349, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1298, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1058, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.11/http/client.py", line 996, in send
    self.connect()
  File "/usr/local/lib/python3.11/http/client.py", line 1475, in connect
    self.sock = self._context.wrap_socket(self.sock,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/ssl.py", line 517, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/ssl.py", line 1104, in _create
    self.do_handshake()
  File "/usr/local/lib/python3.11/ssl.py", line 1382, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1006)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/app/get-model.py", line 6, in <module>
    urllib.request.urlretrieve(url, 'model.json')
  File "/usr/local/lib/python3.11/urllib/request.py", line 241, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
                            ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1006)>
root@immich-machinelearning-b9846f986-z4mzc:/usr/src/app#

Does this help?

stavros-k commented 3 months ago

Can you try this?

import ssl
import socket

def get_certificate(hostname, port=443):
    # Create a default SSL context
    context = ssl.create_default_context()

    # Connect to the server and perform the SSL handshake
    with socket.create_connection((hostname, port)) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as ssock:
            # Get the server's certificate
            certificate = ssock.getpeercert()
            return certificate

# Replace with your hostname
hostname = 'huggingface.co'

# Get and print the SSL certificate
certificate = get_certificate(hostname)
print(certificate)

One-liner:

echo -e "import ssl\nimport socket\n\ndef get_certificate(hostname, port=443):\n    # Create a default SSL context\n    context = ssl.create_default_context()\n\n    # Connect to the server and perform the SSL handshake\n    with socket.create_connection((hostname, port)) as sock:\n        with context.wrap_socket(sock, server_hostname=hostname) as ssock:\n            # Get the server's certificate\n            certificate = ssock.getpeercert()\n            return certificate\n\n# Replace with your hostname\nhostname = 'huggingface.co'\n\n# Get and print the SSL certificate\ncertificate = get_certificate(hostname)\nprint(certificate)" > inspect_cert.py
soldier9945 commented 3 months ago

Here's what I got:

root@immich-machinelearning-b9846f986-z4mzc:/usr/src/app# python3 inspect_cert.py
Traceback (most recent call last):
  File "/usr/src/app/inspect_cert.py", line 19, in <module>
    certificate = get_certificate(hostname)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/inspect_cert.py", line 10, in get_certificate
    with context.wrap_socket(sock, server_hostname=hostname) as ssock:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/ssl.py", line 517, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/ssl.py", line 1104, in _create
    self.do_handshake()
  File "/usr/local/lib/python3.11/ssl.py", line 1382, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1006)

stavros-k commented 3 months ago

Ugh, right. This version skips certificate verification:

echo -e "import ssl\nimport socket\n\ndef get_certificate(hostname, port=443):\n    # Create an SSL context that does not verify certificates\n    context = ssl._create_unverified_context()\n\n    # Connect to the server and perform the SSL handshake\n    with socket.create_connection((hostname, port)) as sock:\n        with context.wrap_socket(sock, server_hostname=hostname) as ssock:\n            # Get the server's certificate\n            certificate = ssock.getpeercert()\n            return certificate\n\n# Replace with your hostname\nhostname = 'huggingface.co'\n\n# Get and print the SSL certificate\ncertificate = get_certificate(hostname)\nprint(certificate)" > inspect_cert.py
soldier9945 commented 3 months ago

Ehm... I get an empty response (I guess...):

root@immich-machinelearning-b9846f986-z4mzc:/usr/src/app# python3 inspect_cert.py
{}

stavros-k commented 3 months ago

cat << 'EOF' > ssl_certificate_inspector.py
import socket
import ssl
from datetime import datetime

def get_certificate_raw(hostname, port=443):
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    with socket.create_connection((hostname, port)) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as secure_sock:
            der_cert = secure_sock.getpeercert(binary_form=True)
            return ssl.DER_cert_to_PEM_cert(der_cert)

def print_certificate_info(pem_cert):
    print("PEM Certificate:")
    print(pem_cert)

hostname = 'huggingface.co'

try:
    cert_pem = get_certificate_raw(hostname)
    print_certificate_info(cert_pem)
except Exception as e:
    print(f"An error occurred: {e}")
EOF

Then run:

python3 ssl_certificate_inspector.py > out.pem
openssl x509 -in out.pem -text -noout

If openssl is not available in the container, either share out.pem or run openssl on your host.
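If openssl is missing, Python's ssl module can summarize out.pem as well. The binary form is needed in the first place because, with verification disabled, `SSLSocket.getpeercert()` returns an empty dict (the certificate was never validated), which matches the `{}` result earlier in the thread. This sketch relies on `ssl._ssl._test_decode_cert`, a private, undocumented CPython helper used by CPython's own test suite; assume it may change or disappear between Python versions. The self-signed check (subject equal to issuer) is a heuristic, not a signature verification.

```python
import ssl
import sys


def decode_pem_file(path: str) -> dict:
    # Private CPython helper (not public API): parses a PEM file into the same
    # dict shape that getpeercert() returns for a validated certificate.
    return ssl._ssl._test_decode_cert(path)


def flatten_name(rdns) -> str:
    # Names are tuples of RDNs, each RDN a tuple of (key, value) pairs.
    return ", ".join(f"{key}={value}" for rdn in rdns for key, value in rdn)


def is_self_signed(decoded: dict) -> bool:
    # Heuristic: a self-signed certificate names itself as its own issuer.
    return decoded.get("subject") == decoded.get("issuer")


if __name__ == "__main__":
    info = decode_pem_file(sys.argv[1] if len(sys.argv) > 1 else "out.pem")
    print("subject:", flatten_name(info.get("subject", ())))
    print("issuer: ", flatten_name(info.get("issuer", ())))
    print("SANs:   ", info.get("subjectAltName", ()))
    print("self-signed (subject == issuer):", is_self_signed(info))
```

If the subject and SANs name your own reverse proxy instead of huggingface.co, the connection is not reaching Hugging Face at all.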

soldier9945 commented 3 months ago

Okay, apparently the certificate being served here is my own Traefik self-signed certificate...

I'm still using Traefik from TrueCharts, and port 443 on my TrueNAS is served by Traefik. Are you sure this is linked to the error? I can post the full output here if needed, but I guess this info is enough?

stavros-k commented 3 months ago

Yeah, it seems that something is intercepting the request and redirecting it to your host. Try from another container that has nslookup available and see what huggingface.co resolves to. Maybe something is up with your DNS.
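If no container with nslookup is handy, the pod's own resolver can be queried from Python. A minimal sketch, assuming the pod uses a standard /etc/resolv.conf: it prints the `search` list and the `ndots` option, then resolves huggingface.co both as written and with a trailing dot. The trailing dot marks the name as fully qualified, so the search list is skipped. If the relative name lands somewhere unexpected (for example, your own public IP) while the fully qualified one does not, a search domain is probably interfering.

```python
import socket


def parse_resolv_conf(text: str) -> tuple[list[str], int]:
    """Return (search domains, ndots) from resolv.conf text; ndots defaults to 1."""
    search: list[str] = []
    ndots = 1
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue
        if fields[0] in ("search", "domain"):
            search = fields[1:]  # the last search/domain line wins
        elif fields[0] == "options":
            for opt in fields[1:]:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return search, ndots


def resolve(name: str) -> set[str]:
    """Addresses the system resolver (libc getaddrinfo) returns for name."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(name, 443, proto=socket.IPPROTO_TCP)}
    except socket.gaierror as exc:
        return {f"error: {exc}"}


if __name__ == "__main__":
    with open("/etc/resolv.conf") as fh:
        search, ndots = parse_resolv_conf(fh.read())
    print("search:", search, "ndots:", ndots)
    host = "huggingface.co"
    print("relative:", sorted(resolve(host)))
    print("absolute:", sorted(resolve(host + ".")))  # trailing dot: no search list
```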

soldier9945 commented 3 months ago

Thanks for the help!

I'll try to diagnose what's happening here. Strange, but it's most definitely my setup.

stavros-k commented 3 months ago

np

I'll close this one now. Hope you find a solution!

Thanks

soldier9945 commented 3 months ago

Maybe this can help someone: I just fixed my problem.

I tried to find a way to query DNS from within that container to see what huggingface.co resolved to and why requests were being redirected back to my TrueNAS host.

I found the following command, which works in the immich-machinelearning container:

getent ahosts huggingface.co

This query showed that huggingface.co was resolving to my own WAN IP! I also saw that a subdomain (shown here as MY.SUBDOMAIN.COM) had been appended to the name in the query response:

<MY-PUBLIC-IP-HERE>    STREAM huggingface.co.MY.SUBDOMAIN.COM
<MY-PUBLIC-IP-HERE>    DGRAM
<MY-PUBLIC-IP-HERE>    RAW

In the TrueNAS Network Settings, I found that I had added this subdomain as an additional lookup domain.

Once I cleared all additional domains from this field and restarted Immich, everything worked again.
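Why an extra lookup domain breaks an HTTPS download: in Kubernetes pods (TrueNAS SCALE 23.10 runs apps on k3s), resolv.conf commonly carries `options ndots:5` plus the host's search domains. A name with fewer dots than `ndots`, such as huggingface.co, is tried with each search domain appended before it is tried as-is, and the first name that resolves wins. If MY.SUBDOMAIN.COM has a wildcard DNS record (an assumption, but consistent with the getent output), huggingface.co.MY.SUBDOMAIN.COM resolves to the WAN IP, the request lands on Traefik, and Traefik presents its self-signed certificate. The function below is a simplified sketch of that glibc-style ordering, not the resolver's actual code; the search domains used with it are hypothetical.

```python
def candidate_names(name: str, search: list[str], ndots: int) -> list[str]:
    """Simplified order in which a glibc-style stub resolver tries names."""
    if name.endswith("."):
        return [name]  # already fully qualified: the search list is not used
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded  # enough dots: try the name as-is first
    return expanded + [name]  # too few dots: search domains first, bare name last


if __name__ == "__main__":
    # Hypothetical pod search list ending with the host's extra lookup domain.
    search = ["ix-immich.svc.cluster.local", "svc.cluster.local", "cluster.local", "MY.SUBDOMAIN.COM"]
    for candidate in candidate_names("huggingface.co", search, 5):
        print(candidate)
```

With `ndots:5`, huggingface.co.MY.SUBDOMAIN.COM is tried before the bare name, which is why removing the additional domain fixed the download.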