openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Bug]: get_absolute_file_path implementation not compatible with huggingface hub's symlink structure for onnx conversion #22736

Open Whadup opened 9 months ago

Whadup commented 9 months ago

OpenVINO Version

2023.3.0

Operating System

Ubuntu 20.04 (LTS)

Device used for inference

CPU

Framework

ONNX

Model used

No response

Issue description

ONNX models can be split across two files, e.g. model.onnx and model.onnx_data. I can use OpenVINO to convert these models if both files are regular files. When the files are symlinks, e.g. when they are stored in Hugging Face's cache directories, I can convert the models only if model.onnx_data alone is a symlink, not if both files are symlinks. More precisely, conversion fails if the two symlinks point to different folders, or to the same folder but under filenames other than the original ones.

I've tracked this down to the implementation of get_absolute_file_path. It resolves the symlink using realpath() instead of converting the path of the symbolic link into an absolute path to that same symbolic link. When we load the model.onnx file, OpenVINO tries to find the data file next to it, but because the symlink has already been resolved, the data file is not there.
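To illustrate the difference outside of OpenVINO, here is a minimal Python sketch. It assumes a POSIX system where symlinks can be created, and it builds a throwaway directory tree that only imitates the hub cache layout. The helper names (build_cache_like_layout, sibling_exists, compare_resolution) and the blob names are made up for this example; os.path.realpath and os.path.abspath stand in for the two path-handling strategies and are not the actual OpenVINO code.

```python
import os
import tempfile


def build_cache_like_layout(root):
    """Create blobs/<name> files plus snapshot symlinks, imitating a hub cache."""
    blobs = os.path.join(root, "blobs")
    snapshot = os.path.join(root, "snapshots", "rev", "onnx")
    os.makedirs(blobs)
    os.makedirs(snapshot)
    # Hypothetical blob names; the real cache uses content hashes.
    for name, blob in (("model.onnx", "aaaa"), ("model.onnx_data", "bbbb")):
        with open(os.path.join(blobs, blob), "wb") as f:
            f.write(b"placeholder")
        # Relative link target, as in the listing from the hub cache.
        target = os.path.join("..", "..", "..", "blobs", blob)
        os.symlink(target, os.path.join(snapshot, name))
    return os.path.join(snapshot, "model.onnx")


def sibling_exists(model_path, sibling="model.onnx_data"):
    """Check whether the data file can be found next to model_path."""
    return os.path.exists(os.path.join(os.path.dirname(model_path), sibling))


def compare_resolution(model_link):
    """Return (found after realpath, found after abspath)."""
    resolved = os.path.realpath(model_link)  # follows the symlink into blobs/
    absolute = os.path.abspath(model_link)   # stays in the snapshot folder
    return sibling_exists(resolved), sibling_exists(absolute)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        link = build_cache_like_layout(root)
        print(compare_resolution(link))
```

With the resolved path, the lookup lands in blobs/, where no file is named model.onnx_data. With the absolute but unresolved path, the lookup stays in the snapshot folder, where the second symlink lives.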

Step-by-step reproduction

The easiest way to see this issue is to use the wrapper in Hugging Face's optimum (install the latest version from git):

import os
from huggingface_hub import snapshot_download
from optimum.intel.openvino import OVModelForFeatureExtraction
model_cache_path = snapshot_download(
    repo_id="intfloat/multilingual-e5-large",
    allow_patterns="onnx/**",
)
model_cache_path = os.path.join(model_cache_path, "onnx")
model = OVModelForFeatureExtraction.from_pretrained(model_cache_path, from_onnx=True)

Internally, this code downloads the ONNX model parts into blobs with hash-based names and creates a folder called snapshots, which contains symlinks called model.onnx and model.onnx_data that point to two of those blobs.

lrwxrwxrwx 1 azureuser azureuser 55 Feb  8 11:38 config.json -> ../../../blobs/e6b213c6f68e5076e96386d284dffc85eef5fb12
lrwxrwxrwx 1 azureuser azureuser 79 Feb  8 11:38 model.onnx -> ../../../blobs/bb5a52503a3ef35247f5b5ae6c473aaae60505dd3ffaef56d7b69e2f84683c05
lrwxrwxrwx 1 azureuser azureuser 79 Feb  8 11:38 model.onnx_data -> ../../../blobs/0cf1883fee81c63819a44e2ba0efa51d4043d9759685a4ebebbde97e0623d15c
lrwxrwxrwx 1 azureuser azureuser 79 Feb  8 11:38 sentencepiece.bpe.model -> ../../../blobs/cfc8146abe2a0488e9e2a0c56de7952f7c11ab059eca145a0a727afce0db2865
lrwxrwxrwx 1 azureuser azureuser 55 Feb  8 11:38 special_tokens_map.json -> ../../../blobs/d5698132694f4f1bcff08fa7d937b1701812598e
lrwxrwxrwx 1 azureuser azureuser 79 Feb  8 11:38 tokenizer.json -> ../../../blobs/62c24cdc13d4c9952d63718d6c9fa4c287974249e16b7ade6d5a85e7bbb75626
lrwxrwxrwx 1 azureuser azureuser 55 Feb  8 11:38 tokenizer_config.json -> ../../../blobs/6de1940d16d38be9877bf7cc228c9377841b311f

In the logs below we see that we try to open /mnt/resource_nvme/cache/huggingface/hub/models--intfloat--multilingual-e5-large/snapshots/9f78368af0062735ba99812349c562316e29f719/onnx/model.onnx, which gets resolved using readlink. We then process a file in the blobs/ folder, where model.onnx_data cannot be found.

Without this readlink, we could have loaded the symlink itself, where the correct model.onnx_data exists next to it. Indeed, if I replace model.onnx with the actual file located in blobs/, I can load and convert the model to OpenVINO.
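Until the path handling changes, one possible workaround along the same lines is to copy the symlinked files into a scratch directory as regular files, so that model.onnx and model.onnx_data sit next to each other under their original names. The sketch below is an assumption, not something tested against optimum or OpenVINO, and the helper name materialize_symlinks is made up. Note that it duplicates the data file on disk (about 1 GB for this model, judging by the data_length in the error).

```python
import os
import shutil
import tempfile


def materialize_symlinks(src_dir, dst_dir=None):
    """Copy every file in src_dir into dst_dir as a regular file.

    Symlinks are followed, so the copies keep the link names
    (e.g. model.onnx) but hold the content of the link targets.
    """
    if dst_dir is None:
        dst_dir = tempfile.mkdtemp(prefix="onnx_materialized_")
    os.makedirs(dst_dir, exist_ok=True)
    for entry in os.scandir(src_dir):
        # is_file() follows symlinks by default, so linked blobs are included.
        if entry.is_file():
            # copy2 also follows symlinks by default and writes a regular file.
            shutil.copy2(entry.path, os.path.join(dst_dir, entry.name))
    return dst_dir
```

The returned directory could then be passed to from_pretrained in place of model_cache_path from the reproduction snippet.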

Relevant log output

I get a RuntimeError: invalid external data: ExternalDataInfo(data_full_path: model.onnx_data, offset: 0, data_length: 1024008192)

When I trace which files are being accessed, I get:

[pid 32642] 11:17:51 lstat("/mnt/resource_nvme/cache/huggingface/hub/models--intfloat--multilingual-e5-large", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 32642] 11:17:51 lstat("/mnt/resource_nvme/cache/huggingface/hub/models--intfloat--multilingual-e5-large/snapshots", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 32642] 11:17:51 lstat("/mnt/resource_nvme/cache/huggingface/hub/models--intfloat--multilingual-e5-large/snapshots/9f78368af0062735ba99812349c562316e29f719", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 32642] 11:17:51 lstat("/mnt/resource_nvme/cache/huggingface/hub/models--intfloat--multilingual-e5-large/snapshots/9f78368af0062735ba99812349c562316e29f719/onnx", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 32642] 11:17:51 lstat("/mnt/resource_nvme/cache/huggingface/hub/models--intfloat--multilingual-e5-large/snapshots/9f78368af0062735ba99812349c562316e29f719/onnx/model.onnx", {st_mode=S_IFLNK|0777, st_size=79, ...}) = 0
[pid 32642] 11:17:51 readlink("/mnt/resource_nvme/cache/huggingface/hub/models--intfloat--multilingual-e5-large/snapshots/9f78368af0062735ba99812349c562316e29f719/onnx/model.onnx", "../../../blobs/bb5a52503a3ef3524"..., 4095) = 79
[pid 32642] 11:17:51 lstat("/mnt/resource_nvme/cache/huggingface/hub/models--intfloat--multilingual-e5-large/blobs", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 32642] 11:17:51 lstat("/mnt/resource_nvme/cache/huggingface/hub/models--intfloat--multilingual-e5-large/blobs/bb5a52503a3ef35247f5b5ae6c473aaae60505dd3ffaef56d7b69e2f84683c05", {st_mode=S_IFREG|0644, st_size=545850, ...}) = 0
[pid 32642] 11:17:51 openat(AT_FDCWD, "/mnt/resource_nvme/cache/huggingface/hub/models--intfloat--multilingual-e5-large/blobs/model.onnx_data", O_RDONLY) = -1 ENOENT (No such file or directory)

Whadup commented 9 months ago

Updated the code snippet to reproduce the issue.