YoungjaeDev opened this issue 5 months ago
Hi,
Thanks for raising the issue. You are likely right about the source of the bug. I will take a look next week to see if downgrading helps; if it does, we will fix the problem, provided the fix does not create another.
@PawelPeczek-Roboflow
I think the version of the transformers
package pinned in the GPU Dockerfile of roboflow inference is the offending one. I'd like to lower it and give it a try, but can you check that first?
OK, I checked that this fix works on my end: https://github.com/roboflow/inference/pull/363
We need to ship it with the next release, but for the time being you can build the Docker image on your end:
git clone git@github.com:roboflow/inference.git
cd inference
docker build --build-arg="TARGETPLATFORM=linux/amd64" -t roboflow/roboflow-inference-server-gpu:dev -f docker/dockerfiles/Dockerfile.onnx.gpu .
To run the server:
docker run --gpus all roboflow/roboflow-inference-server-gpu:dev
I have the same issue as @YoungjaeDev and @PawelPeczek-Roboflow suggestion above got me the same output, but with an additional error about WithFixedSizeCache.
bchip@brad-sff:~/inference$ docker run -p 9001:9001 --gpus all roboflow/roboflow-inference-server-gpu:dev
INFO: Started server process [7]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:9001 (Press CTRL+C to quit)
A new version of the following files was downloaded from https://huggingface.co/THUDM/cogvlm-chat-hf:
- configuration_cogvlm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/THUDM/cogvlm-chat-hf:
- visual.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/THUDM/cogvlm-chat-hf:
- modeling_cogvlm.py
- visual.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
Downloading shards: 100%|██████████| 8/8 [06:40<00:00, 50.02s/it]
Loading checkpoint shards: 0%| | 0/8 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/app/inference/core/interfaces/http/http_api.py", line 179, in wrapped_route
return await route(*args, **kwargs)
File "/app/inference/core/interfaces/http/http_api.py", line 1266, in cog_vlm
cog_model_id = load_cogvlm_model(inference_request, api_key=api_key)
File "/app/inference/core/interfaces/http/http_api.py", line 476, in load_core_model
self.model_manager.add_model(core_model_id, inference_request.api_key)
File "/app/inference/core/managers/decorators/fixed_size_cache.py", line 61, in add_model
raise error
File "/app/inference/core/managers/decorators/fixed_size_cache.py", line 55, in add_model
return super().add_model(model_id, api_key, model_id_alias=model_id_alias)
File "/app/inference/core/managers/decorators/base.py", line 62, in add_model
self.model_manager.add_model(model_id, api_key, model_id_alias=model_id_alias)
File "/app/inference/core/managers/base.py", line 61, in add_model
model = self.model_registry.get_model(resolved_identifier, api_key)(
File "/app/inference/models/cogvlm/cogvlm.py", line 39, in __init__
self.model = AutoModelForCausalLM.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 556, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3502, in from_pretrained
) = cls._load_pretrained_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3926, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 802, in _load_state_dict_into_meta_model
or (not hf_quantizer.check_quantized_param(model, param, param_name, state_dict))
File "/usr/local/lib/python3.10/dist-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 124, in check_quantized_param
if isinstance(module._parameters[tensor_name], bnb.nn.Params4bit):
KeyError: 'inv_freq'
INFO: 172.17.0.1:59332 - "POST /llm/cogvlm HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
+ Exception Group Traceback (most recent call last):
| File "/usr/local/lib/python3.10/dist-packages/starlette/_utils.py", line 87, in collapse_excgroups
| yield
| File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 190, in __call__
| async with anyio.create_task_group() as task_group:
| File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 435, in run_asgi
| result = await app( # type: ignore[func-returns-value]
| File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
| return await self.app(scope, receive, send)
| File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
| await super().__call__(scope, receive, send)
| File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
| raise exc
| File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
| await self.app(scope, receive, _send)
| File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 189, in __call__
| with collapse_excgroups():
| File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
| self.gen.throw(typ, value, traceback)
| File "/usr/local/lib/python3.10/dist-packages/starlette/_utils.py", line 93, in collapse_excgroups
| raise exc
| File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 191, in __call__
| response = await self.dispatch_func(request, call_next)
| File "/app/inference/core/interfaces/http/http_api.py", line 403, in count_errors
| self.model_manager.num_errors += 1
| AttributeError: 'WithFixedSizeCache' object has no attribute 'num_errors'
+------------------------------------
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 435, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 189, in __call__
with collapse_excgroups():
File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.10/dist-packages/starlette/_utils.py", line 93, in collapse_excgroups
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 191, in __call__
response = await self.dispatch_func(request, call_next)
File "/app/inference/core/interfaces/http/http_api.py", line 403, in count_errors
self.model_manager.num_errors += 1
AttributeError: 'WithFixedSizeCache' object has no attribute 'num_errors'
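The AttributeError in the middleware above occurs because the `count_errors` hook assumes `model_manager` exposes a `num_errors` attribute, while the `WithFixedSizeCache` decorator apparently does not define (or forward) it. As a rough sketch only (the class below is a stand-in, not the real `WithFixedSizeCache`), one defensive pattern is to read the counter with a default instead of assuming it exists:

```python
# Minimal stand-in for a model-manager decorator that, like
# WithFixedSizeCache in the traceback above, never defined num_errors.
class ManagerStub:
    pass

manager = ManagerStub()

# Defensive variant of the failing line in count_errors:
# initialize the counter lazily on first use.
manager.num_errors = getattr(manager, "num_errors", 0) + 1
manager.num_errors = getattr(manager, "num_errors", 0) + 1

print(manager.num_errors)  # → 2
```

Whether the real fix belongs in the middleware or in `WithFixedSizeCache` itself is a design decision for the maintainers; this only illustrates why the attribute access fails.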
Hi @BChip, thank you for running the test. I have pushed a small change to this PR and then followed your test steps; I see no error, and I can additionally confirm the transformers version is now bound to 4.37.2. I will close this issue if no further problems concerning the transformers version are reported.
@grzegorz-roboflow Awesome, just tried it and it works! When will this fix land in the main release?
Thank you!!!!!
@BChip - that will be shipped to Docker Hub with the next release, which I believe will happen as soon as we close and test this PR: https://github.com/roboflow/inference/pull/343 - it is consuming a big part of our time and capacity right now. I would expect the release somewhere in the next two weeks. If you need a temporary solution, we may try to push a special tag with the build @grzegorz-roboflow made for you. And by the way - it seems the error described here: https://github.com/roboflow/inference/issues/355#issuecomment-2084469048 reports a bug of a separate kind that we also need to look at, so thanks a lot for reporting.
Search before asking
Bug
I'm encountering an issue while attempting to deploy the cogvlm model on my own GPU server using the Roboflow inference code. The server setup seems to be correct, but when I try to run the model, I run into the following error:
Upon further investigation and based on this GitHub issue (https://github.com/THUDM/CogVLM/issues/396), it's recommended to downgrade the transformers library to version 4.37 due to compatibility issues. However, the current deployment is using version 4.38. Could you please confirm if the transformers version could be the source of this issue and if downgrading would be appropriate? Any other insights or suggestions would also be greatly appreciated. Thank you!
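For reference, the compatibility window being discussed (transformers 4.37.x works, 4.38 breaks CogVLM loading) can be expressed as a simple version check. This is just an illustration of the constraint from the thread, not code from the repository:

```python
def parse_version(v: str) -> tuple:
    # Convert "4.37.2" -> (4, 37, 2); pre-release suffixes are ignored.
    return tuple(int(part) for part in v.split(".") if part.isdigit())

def cogvlm_compatible(transformers_version: str) -> bool:
    # Constraint discussed in this issue: >= 4.37 and < 4.38
    v = parse_version(transformers_version)
    return (4, 37) <= v < (4, 38)

print(cogvlm_compatible("4.37.2"))  # → True
print(cogvlm_compatible("4.38.0"))  # → False
```

The fix in PR 363 pins the version inside this window rather than leaving it open-ended, which is why rebuilding the image resolves the loading error.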
Environment
inference 0.9.20
inference-cli 0.9.20
inference-gpu 0.9.20
inference-sdk 0.9.20
x86-gpu(rtx3090)
Minimal Reproducible Example
Additional
No response
Are you willing to submit a PR?