microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License
13.68k stars 2.78k forks source link

std::invalid_argument thrown when TensorRT cache path exceeds 260 characters #17998

Open JulienTheron opened 9 months ago

JulienTheron commented 9 months ago

Describe the issue

When Long Paths are enabled, I expect the TensorRT engine to be built regardless the length of its path. Currently, it fails with a "invalid wchar_t filename argument" exception.

When debugging the issue, I found the culprit in the inclusion of the <experimental/filesystem> header in tensorrt_execution_provider_utils.h.

I've confirmed the issue is fixed by replacing the header with <filesystem> (and updating the namespace). At least on Windows.

To reproduce

  1. Enable Long Paths support on Windows and in your executable's manifest.
  2. Enable TensortRT engine cache and specify a path that exceeds 260 characters.
  3. Create a ORT session.
  4. If your model has dynamic shapes, do an inference.

Urgency

No response

Platform

Windows

OS Version

Windows 11 22H2

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.15.0

ONNX Runtime API

C++

Architecture

X64

Execution Provider

TensorRT

Execution Provider Library Version

TensorRT 8.6.1; CUDA 11.8; cuDNN 8.9.0

snnn commented 9 months ago

The source file should avoid using std::string as file paths. It should use std::filesystem::path and ORTCHAR_T type of strings. CreateFileW doesn't have the problem. See: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew

Switching to only partially solves the problem. Encoding issue would still exist.

tianleiwu commented 9 months ago

I also encountered the issue. Part of the cause is the engine filename is super long like TensorrtExecutionProvider_TRTKernel_graph_main_graph_14360603375063178697_0_0_fp16_sm89.engine.

If combine with a long directory name like C:\Users\user_name\.cache\huggingface\hub\models--runwayml--stable-diffusion-v1-5\snapshots\ded79e214aa69e42c24d3f5ac14b76d568679cc2\ort_trt_engine\, it could exceed 260 chars.

And we know that Long Path is not default in Windows, so it can be problem.

@yf711, could you change to use shorter engine name?

JulienTheron commented 9 months ago

Whether or not the engine name can be shortened, I think that's for another discussion, as the cache folder could exceed 260 characters anyway. Also, unless there's a good reason for using the deprecated <experimental/filesystem> header, the fix is easy. I could do a PR, if that's ok.

snnn commented 9 months ago

And we know that Long Path is not default in Windows, so it can be problem.

No, you missed a point: wide-char string paths do not have the issue! You were saying multi-byte strings like std::string, const char*. Unicode based Windows APIs do not have the issue.

chilo-ms commented 9 months ago

I think we should use std::filesystem::path to deal with file paths and instantiate the std::ifstream instances in order to solve the issues of long path and encoding once for all as Changming mentioned

One of the reasons we use std::experimental::filesystem::path not std::filesystem::path is ORT seems not fully supporting c++17 in the past and we should switch to use std::filesystem::path now.

@JulienTheron, is this a blocker issue for you? We might need some time to make the change.

JulienTheron commented 9 months ago

We implemented a patch before building the ORT so we're not blocked. I agree it would be best to fully embrace std::filesystem. You can take your time.

Thanks!

BengtGustafsson commented 8 months ago

The problem is that Microsoft has decided to break portability by NOT making the std APIs that take std::filesystem::path convert long paths to the special \\?\ prefixed paths that underlying WIN32 APIs require to work with longer than 253 character paths. I filed a bug report to Microsoft about this a year ago but they replied that they won't fix this due to a "policy decision". Note that the "solution" they purport is not working. Even with long filenames enables in the registry you can't use that feature without the \\?\ prefix on each path.

NB: Once paths are prefixed with \\?\ they don't accept forward slashes as path separators or embedded ..\ or .\ combinations or duplicated \ characters so we had to make a rather elaborate adjustPath function and wrap all uses of std::filesystem::path for functions which actually touches the filesystem (such as exists, remove, create_directories etc.) apart from opening iostreams in calls to adjustPath.

I kindly supplied the source code for adjustPath in the issue here: https://developercommunity.visualstudio.com/t/long-paths-crash-filesystem::recursive_d/10120391

JulienTheron commented 8 months ago

The problem is that Microsoft has decided to break portability by NOT making the std APIs that take std::filesystem::path convert long paths to the special \?\ prefixed paths that underlying WIN32 APIs require to work with longer than 253 character paths. I filed a bug report to Microsoft about this a year ago but they replied that they won't fix this due to a "policy decision". Note that the "solution" they purport is not working. Even with long filenames enables in the registry you can't use that feature without the \?\ prefix on each path.

I agree it should have been easier for us, but that's not true that you need to use \?\ when Long Paths are enabled in the registry. Your executable has to defined itself as "long-path aware" by including a manifest. Then you don't need anything to do in your code, except if you use MAX_PATH somewhere, of course.

yf711 commented 5 months ago

I also encountered the issue. Part of the cause is the engine filename is super long like TensorrtExecutionProvider_TRTKernel_graph_main_graph_14360603375063178697_0_0_fp16_sm89.engine.

If combine with a long directory name like C:\Users\user_name\.cache\huggingface\hub\models--runwayml--stable-diffusion-v1-5\snapshots\ded79e214aa69e42c24d3f5ac14b76d568679cc2\ort_trt_engine\, it could exceed 260 chars.

And we know that Long Path is not default in Windows, so it can be problem.

@yf711, could you change to use shorter engine name?

@JulienTheron Please check ORT 1.17 as it supports customizable TRT engine cache prefix, and hope it could improve your use case. Thanks!

JulienTheron commented 5 months ago

We are changing the includes for <experimental/filesystem> to <filesystem> before building ORT. In our opinion, this is a more reliable workaround than customizing the TRT engine cache filename. Thanks.