Closed whenmoon closed 11 months ago
For https://github.com/schananas/grounded_sam_replicate I am getting a different error with the latest cog from the main branch on a Linux host without CUDA:
Starting Docker image cog-groundedsamreplicate-base and running setup()...
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
Missing device driver, re-trying without GPU
Error response from daemon: page not found
Traceback (most recent call last):
File "/root/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/cog/server/http.py", line 403, in <module>
app = create_app(
File "/root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/cog/server/http.py", line 94, in create_app
predictor = load_predictor_from_ref(predictor_ref)
File "/root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/cog/predictor.py", line 192, in load_predictor_from_ref
spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/src/predict.py", line 16, in <module>
os.chdir("/src/weights/GroundingDINO")
FileNotFoundError: [Errno 2] No such file or directory: '/src/weights/GroundingDINO'
ⅹ Failed to get container status: exit status 1
It correctly detected the missing CUDA driver:
Missing device driver, re-trying without GPU
but then failed later on the missing weights. Is that expected?
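For reference, the traceback shows predict.py calling os.chdir on /src/weights/GroundingDINO at import time, so a missing weights directory surfaces as a bare FileNotFoundError before setup() even runs. Below is a minimal sketch of a guard that fails with a more actionable message. The helper name enter_weights_dir is hypothetical and not part of the repo; only the directory path comes from the traceback.

```python
import os


def enter_weights_dir(path: str) -> str:
    """Change into the weights directory, or fail with a clear hint.

    Hypothetical helper for illustration; not part of grounded_sam_replicate.
    """
    if not os.path.isdir(path):
        # Same exception type as the original failure, but with a hint
        # about what to do next.
        raise FileNotFoundError(
            f"Weights directory {path!r} not found; "
            "download the weights before running a prediction."
        )
    os.chdir(path)
    return os.getcwd()
```

In predict.py this would stand in for the direct os.chdir("/src/weights/GroundingDINO") call, assuming the rest of the module stays unchanged.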
After downloading weights using script:
pip install huggingface_hub
python script/download_weights.py
Now I am getting a different error; model setup fails:
File "/root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/cog/predictor.py", line 73, in run_setup
predictor.setup()
File "/src/predict.py", line 49, in setup
self.groundingdino_model = load_model_hf(device)
File "/src/predict.py", line 40, in load_model_hf
args = SLConfig.fromfile(cache_config_file)
File "/src/weights/GroundingDINO/groundingdino/util/slconfig.py", line 185, in fromfile
cfg_dict, cfg_text = SLConfig._file2dict(filename)
File "/src/weights/GroundingDINO/groundingdino/util/slconfig.py", line 79, in _file2dict
check_file_exist(filename)
File "/src/weights/GroundingDINO/groundingdino/util/slconfig.py", line 23, in check_file_exist
raise FileNotFoundError(msg_tmpl.format(filename))
FileNotFoundError: file "/home/dmitri/SOURCE/Thirdparty/replicate/grounded_sam_replicate/weights/models--ShilongLiu--GroundingDINO/snapshots/a94c9b567a2a374598f05c584e96798a170c56fb/GroundingDINO_SwinB.cfg.py" does not exist
ⅹ Model setup failed
OK, after correctly installing the weights inside the container using:
cog run script/download_weights.py
I reproduced the original _lazy_init error with the latest Cog:
File "/root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/torch/nn/modules/module.py", line 985, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/torch/cuda/__init__.py", line 229, in _lazy_init
torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Looking into it.
The problem is in the GroundingDINO source.
Here is a patch that makes it work: fix_cuda_to_cpu.patch
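The patch contents are not reproduced in this thread. As a rough sketch of the general technique such a CUDA-to-CPU fix usually relies on (an assumption, not the actual patch): choose the device at runtime with torch.cuda.is_available() and move tensors with .to(device) instead of hard-coding CUDA. The name pick_device is hypothetical.

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" only when PyTorch reports a usable CUDA device."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    if prefer_gpu and torch.cuda.is_available():
        return "cuda"
    return "cpu"


device = pick_device()
# Models and tensors would then be moved with .to(device) rather than .cuda(),
# so the same code path works on hosts without an NVIDIA driver.
```

On a host without an NVIDIA driver this falls back to "cpu", which avoids the torch._C._cuda_init() call seen in the traceback above.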
Steps:
git clone https://github.com/schananas/grounded_sam_replicate
cd grounded_sam_replicate
# download weights
cog run script/download_weights.py
# apply attached patch
git apply fix_cuda_to_cpu.patch
# run
cog predict
...
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Done!
Written output to output.0.jpg
Written output to output.1.jpg
Written output to output.2.jpg
Written output to output.3.jpg
Result:
resolved
This is not necessarily a bug with cog, but a problem I'm having running a prediction model with cog. I am running a local version of this model on Replicate: https://replicate.com/schananas/grounded_sam. I have cloned the repo here: https://github.com/schananas/grounded_sam_replicate but after running
sudo cog predict
I get the error:
raise AssertionError("Torch not compiled with CUDA enabled")
The full trace is:
I am using cog 0.8.6, Python 3.10, PyTorch 1.13.0 and torchvision 0.14.0 on macOS 12.6. I have these settings in code:
cog.yaml:
predict.py:
Any help much appreciated!