veekaybee / viberary

Good books, good vibes

Error when building the ONNX model artifact #111

Closed jbao closed 1 year ago

jbao commented 1 year ago

Steps to reproduce:

export DOCKER_DEFAULT_PLATFORM=linux/amd64
docker build -t viberary -f docker/prod/Dockerfile .
docker run -it viberary optimum-cli export onnx --model sentence-transformers/msmarco-distilbert-base-v3 sentence-transformers/msmarco-distilbert-base-v3_onnx/

Error message:

docker run -it viberary optimum-cli export onnx --model sentence-transformers/msmarco-distilbert-base-v3 sentence-transformers/msmarco-distilbert-base-v3_onnx/
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
 20:12:43.70
 20:12:43.75 Welcome to the Bitnami pytorch container
 20:12:43.78 Subscribe to project updates by watching https://github.com/bitnami/containers
 20:12:43.82 Submit issues and feature requests at https://github.com/bitnami/containers/issues
 20:12:43.85

Framework not specified. Using pt to export to ONNX.
Downloading (…)lve/main/config.json: 100%|███████████████████████████| 545/545 [00:00<00:00, 20.5kB/s]
Downloading pytorch_model.bin: 100%|███████████████████████████████| 265M/265M [00:21<00:00, 12.6MB/s]
Automatic task detection to feature-extraction (possible synonyms are: default, mask-generation, sentence-similarity).
Downloading (…)okenizer_config.json: 100%|███████████████████████████| 499/499 [00:00<00:00, 37.5kB/s]
Downloading (…)solve/main/vocab.txt: 100%|█████████████████████████| 232k/232k [00:00<00:00, 1.17MB/s]
Downloading (…)/main/tokenizer.json: 100%|█████████████████████████| 466k/466k [00:00<00:00, 4.20MB/s]
Downloading (…)cial_tokens_map.json: 100%|███████████████████████████| 112/112 [00:00<00:00, 10.3kB/s]
Using framework PyTorch: 2.0.1+cpu
/opt/bitnami/python/lib/python3.8/site-packages/transformers/models/distilbert/modeling_distilbert.py:223: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask, torch.tensor(torch.finfo(scores.dtype).min)
============== Diagnostic Run torch.onnx.export version 2.0.1+cpu ==============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Post-processing the exported models...
Validating models in subprocesses...
qemu: uncaught target signal 7 (Bus error) - core dumped

Maybe I'm missing something here?

veekaybee commented 1 year ago

Looks like you're trying to build on an ARM machine but setting your architecture to amd64.

The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested

Can you check what you have locally? uname -p will do it on a Mac if that's what you have.

To build for your specific architecture, use make build and then make up arm or make up-intel; don't try to build it directly using Docker commands, since that might break the build chain.

https://github.com/veekaybee/viberary/blob/main/Makefile

See directions here: https://github.com/veekaybee/viberary#running-the-project
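If it helps, here's a rough sketch of matching DOCKER_DEFAULT_PLATFORM to the host before building (the variable and platform strings come from the log above; which uname -m values map to which Docker platform is my assumption):

```shell
# Pick DOCKER_DEFAULT_PLATFORM based on the host CPU, so the image
# platform matches what `uname -m` reports instead of fighting it.
case "$(uname -m)" in
  arm64|aarch64) export DOCKER_DEFAULT_PLATFORM=linux/arm64 ;;
  x86_64)       export DOCKER_DEFAULT_PLATFORM=linux/amd64 ;;
esac
echo "$DOCKER_DEFAULT_PLATFORM"
```

With that set, the "requested image's platform does not match the detected host platform" warning should go away, since Docker will build and run natively instead of emulating under qemu.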

jbao commented 1 year ago

Looks like you're trying to build on an arm machine but setting your architecture to amd.

That's correct. I also tried to set DOCKER_DEFAULT_PLATFORM=linux/arm64/v8, but no luck either and got the same error.

use make build and then make up arm or make up-intel

This works for me. But it also seems that the viberary container exits immediately after docker compose up, and I end up with only the redis container

CONTAINER ID   IMAGE                             COMMAND                  CREATED        STATUS         PORTS      NAMES
88770f84ee0d   redis/redis-stack-server:latest   "redis-server --port…"   18 hours ago   Up 3 seconds   6379/tcp   redis

which means again that make onnx doesn't work. 🤦

veekaybee commented 1 year ago

Could you share the stack trace you're getting when the container exits?

jbao commented 1 year ago

Potentially related to this.

2023-10-15 09:29:16 viberary  | Traceback (most recent call last):
2023-10-15 09:29:16 viberary  |   File "/opt/bitnami/python/bin/gunicorn", line 8, in <module>
2023-10-15 09:29:16 viberary  |     sys.exit(run())
2023-10-15 09:29:16 viberary  |   File "/opt/bitnami/python/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 67, in run
2023-10-15 09:29:16 viberary  |     WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
2023-10-15 09:29:16 viberary  |   File "/opt/bitnami/python/lib/python3.8/site-packages/gunicorn/app/base.py", line 236, in run
2023-10-15 09:29:16 viberary  |     super().run()
2023-10-15 09:29:16 viberary  |   File "/opt/bitnami/python/lib/python3.8/site-packages/gunicorn/app/base.py", line 72, in run
2023-10-15 09:29:16 viberary  |     Arbiter(self).run()
2023-10-15 09:29:16 viberary  |   File "/opt/bitnami/python/lib/python3.8/site-packages/gunicorn/arbiter.py", line 229, in run
2023-10-15 09:29:16 viberary  |     self.halt(reason=inst.reason, exit_status=inst.exit_status)
2023-10-15 09:29:16 viberary  |   File "/opt/bitnami/python/lib/python3.8/site-packages/gunicorn/arbiter.py", line 342, in halt
2023-10-15 09:29:16 viberary  |     self.stop()
2023-10-15 09:29:16 viberary  |   File "/opt/bitnami/python/lib/python3.8/site-packages/gunicorn/arbiter.py", line 396, in stop
2023-10-15 09:29:16 viberary  |     time.sleep(0.1)
2023-10-15 09:29:16 viberary  |   File "/opt/bitnami/python/lib/python3.8/site-packages/gunicorn/arbiter.py", line 242, in handle_chld
2023-10-15 09:29:16 viberary  |     self.reap_workers()
2023-10-15 09:29:16 viberary  |   File "/opt/bitnami/python/lib/python3.8/site-packages/gunicorn/arbiter.py", line 530, in reap_workers
2023-10-15 09:29:16 viberary  |     raise HaltServer(reason, self.WORKER_BOOT_ERROR)
2023-10-15 09:29:16 viberary  | gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>
viberary exited with code 1

veekaybee commented 1 year ago

Is this the entire stack trace? This says that gunicorn (the web server) didn't start, but there's likely an underlying reason of why it didn't do so.

jbao commented 1 year ago

OK, I found this after some digging:

ImportError: cannot import name 'url_quote' from 'werkzeug.urls'

Then I tried pinning Werkzeug to Werkzeug==2.2.2, but still ran into this:

2023-10-15 16:16:41 viberary  | [2023-10-15 14:16:41 +0000] [1] [INFO] Starting gunicorn 21.2.0
2023-10-15 16:16:41 viberary  | [2023-10-15 14:16:41 +0000] [1] [INFO] Listening at: http://0.0.0.0:8000 (1)
2023-10-15 16:16:41 viberary  | [2023-10-15 14:16:41 +0000] [1] [INFO] Using worker: sync
2023-10-15 16:16:41 viberary  | [2023-10-15 14:16:41 +0000] [33] [INFO] Booting worker with pid: 33
2023-10-15 16:16:41 viberary  | [2023-10-15 14:16:41 +0000] [35] [INFO] Booting worker with pid: 35
2023-10-15 16:16:41 viberary  | [2023-10-15 14:16:41 +0000] [37] [INFO] Booting worker with pid: 37
2023-10-15 16:16:41 viberary  | [2023-10-15 14:16:41 +0000] [39] [INFO] Booting worker with pid: 39
2023-10-15 16:17:19 viberary  | [2023-10-15 14:17:19 +0000] [1] [ERROR] Worker (pid:39) exited with code 3
2023-10-15 16:17:19 viberary  | [2023-10-15 14:17:19 +0000] [1] [ERROR] Shutting down: Master
2023-10-15 16:17:19 viberary  | [2023-10-15 14:17:19 +0000] [1] [ERROR] Reason: Worker failed to boot.

Could you maybe share a version of requirements.txt with pinned versions?

veekaybee commented 1 year ago

Hmm. The requirements.txt installed into the Docker image should be using 2.2.2 because of the reason you stated: https://github.com/veekaybee/viberary/blob/d91182e6a78834aa8344697208eecb65f6523252/requirements.txt#L2

Just to double-check:

  1. Are you using the latest code in the main branch (aka are you rebased?)
  2. Where did you end up pinning that version? If it's in the code but not in the Docker image, you'll have errors.
  3. The new error isn't related to Werkzeug, but it should give you more information than "Worker failed to boot." You can run make logs to get the full stack trace, or tail -f the logs of the container, which is what the make command does.

In general it would be helpful to have the full reproducible error so we can take a look:

  1. What command you ran
  2. The entire stack trace of the Docker container at the point it stopped writing
  3. Any error messages in the stack trace you're coming across
  4. The state of the containers themselves.

jbao commented 1 year ago
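For point 2, a quick way to confirm whether the pin actually made it into the image is to ask the container's Python directly, e.g. by running something like this inside the container (a generic sketch; only the package name is taken from the thread):

```python
# Print the Werkzeug version actually installed in this environment.
# If it isn't 2.2.2, the pinned requirements.txt never made it into
# the Docker image, and the image needs a rebuild.
from importlib.metadata import PackageNotFoundError, version

try:
    print(version("werkzeug"))
except PackageNotFoundError:
    print("werkzeug is not installed")
```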

I just added the --preload argument to gunicorn to do more debugging, and now it outputs

2023-10-15 20:49:11 viberary  |   File "/viberary/src/api/wsgi.py", line 6, in <module>
2023-10-15 20:49:11 viberary  |     from src.api.main import app
2023-10-15 20:49:11 viberary  |   File "/viberary/src/api/main.py", line 11, in <module>
2023-10-15 20:49:11 viberary  |     retriever = KNNSearch(RedisConnection().conn())
2023-10-15 20:49:11 viberary  |   File "/viberary/src/search/knn_search.py", line 31, in __init__
2023-10-15 20:49:11 viberary  |     self.model = ONNXEmbeddingGenerator(self.cm)
2023-10-15 20:49:11 viberary  |   File "/viberary/src/model/onnx_embedding_generator.py", line 13, in __init__
2023-10-15 20:49:11 viberary  |     self.model = ORTModelForFeatureExtraction.from_pretrained(self.onnx_path)
2023-10-15 20:49:11 viberary  |   File "/opt/bitnami/python/lib/python3.8/site-packages/optimum/onnxruntime/modeling_ort.py", line 651, in from_pretrained
2023-10-15 20:49:11 viberary  |     return super().from_pretrained(
2023-10-15 20:49:11 viberary  |   File "/opt/bitnami/python/lib/python3.8/site-packages/optimum/modeling_base.py", line 338, in from_pretrained
2023-10-15 20:49:11 viberary  |     raise OSError(f"config.json not found in {model_id} local folder")
2023-10-15 20:49:11 viberary  | OSError: config.json not found in sentence-transformers/msmarco-distilbert-base-v3_onnx/ local folder

which I think basically says I haven't generated the model artifact yet, and to generate the artifact, I again need a running container. 🤦

veekaybee commented 1 year ago

Makes sense. The model artifact doesn't come preloaded. Since it's fairly large, I don't include it in the Git repository, so you'd have to generate your own. The instructions are in the codebase if it's something you're curious to try out.

I'm in the process of standing up a model store but will need to think through how to make it shareable without incurring security risks or unwanted API traffic. I also briefly considered putting it in Hugging Face Spaces, but I don't think anyone's wanted the model before 😆.
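For anyone hitting the same OSError, a small pre-flight check along these lines would make the missing artifact obvious before gunicorn workers start dying (a sketch: the directory name comes from the traceback above, and the model.onnx filename is my assumption about what optimum-cli writes):

```python
from pathlib import Path

def onnx_artifact_ready(onnx_dir: str) -> bool:
    """Return True if the exported ONNX model directory looks complete.

    optimum raises OSError when config.json is missing from a local
    model folder, so check for it before trying to load the model.
    """
    p = Path(onnx_dir)
    return (p / "config.json").is_file() and (p / "model.onnx").is_file()

# The path from the traceback; prints False until the export has run.
print(onnx_artifact_ready("sentence-transformers/msmarco-distilbert-base-v3_onnx"))
```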

jbao commented 1 year ago

The instructions are in the codebase if it's something you're curious to try out.

Ok, will check that out and might come up with more questions. 😓

I'm in the process of standing up a model store

Off the top of my head: 1) hosted MLflow on Databricks, though I'm not sure about the cost, or 2) self-hosting an MLflow instance (with a bare minimum of resources) on DigitalOcean?

veekaybee commented 1 year ago

Neither yet 😂, just looking to try out a barebones implementation of an API for DigitalOcean Spaces https://try.digitalocean.com/cloud-storage

veekaybee commented 1 year ago

The instructions for generating the embeddings are here: https://github.com/veekaybee/viberary/tree/main/src/model, IIRC it cost me ~$5.00 to generate on the AWS instance. Looking into making the model public in the meantime.