replicate / cog

Containers for machine learning
https://cog.run
Apache License 2.0

locally call train interface failed in cog-sdxl #1808

Open sinopec opened 3 months ago

sinopec commented 3 months ago
  1. Use cog-sdxl from https://github.com/replicate/cog-sdxl
  2. Run cog build to generate the Docker image
  3. Run the Docker image locally with a GPU
  4. Call the train interface using curl
curl -X 'POST' \
  'http://10.60.0.43:5000/trainings' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "input": {
    "input_images": "https://github.com/replicate/cog-sdxl/blob/main/example_datasets/zeke.zip"
  }
}'
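
For anyone scripting the same call, the curl request above can be written with Python's standard library. This is only a sketch: the host, port, path, and payload are copied from the command above, while `build_training_request` and `send` are hypothetical helper names. It reproduces the request and does not change how the server responds.

```python
import json
import urllib.error
import urllib.request

# Values copied from the curl command above; adjust for your own container.
BASE_URL = "http://10.60.0.43:5000"
INPUT_IMAGES = "https://github.com/replicate/cog-sdxl/blob/main/example_datasets/zeke.zip"


def build_training_request(base_url: str, input_images: str) -> urllib.request.Request:
    """Build (but do not send) the POST /trainings request."""
    body = json.dumps({"input": {"input_images": input_images}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/trainings",
        data=body,
        method="POST",
        headers={"accept": "application/json", "Content-Type": "application/json"},
    )


def send(request: urllib.request.Request):
    """Send the request; return (status, body) even when the server answers 4xx/5xx."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # A 500 response still carries a body worth printing.
        return err.code, err.read().decode("utf-8")
```

Calling `send(build_training_request(BASE_URL, INPUT_IMAGES))` against the running container should return the same status and body that curl reports.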

I always get Internal Server Error. When I check the server log, I see this error:

{
    "logger": "uvicorn.error",
    "timestamp": "2024-07-16T11:25:07.341402Z",
    "exception": "Traceback (most recent call last):\n  File \"/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py\", line 399, in run_asgi\n    result = await app(  # type: ignore[func-returns-value]\n  File \"/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py\", line 70, in __call__\n    return await self.app(scope, receive, send)\n  File \"/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/fastapi/applications.py\", line 284, in __call__\n    await super().__call__(scope, receive, send)\n  File \"/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/starlette/applications.py\", line 122, in __call__\n    await self.middleware_stack(scope, receive, send)\n  File \"/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/starlette/middleware/errors.py\", line 184, in __call__\n    raise exc\n  File \"/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/starlette/middleware/errors.py\", line 162, in __call__\n    await self.app(scope, receive, _send)\n  File \"/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/starlette/middleware/exceptions.py\", line 79, in __call__\n    raise exc\n  File \"/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/starlette/middleware/exceptions.py\", line 68, in __call__\n    await self.app(scope, receive, sender)\n  File \"/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py\", line 20, in __call__\n    raise e\n  File \"/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py\", line 17, in __call__\n    await self.app(scope, receive, send)\n  File \"/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/starlette/routing.py\", line 718, in __call__\n    await route.handle(scope, receive, send)\n  File \"/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/starlette/routing.py\", line 276, in handle\n    await self.app(scope, receive, send)\n  File \"/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/starlette/routing.py\", line 66, in app\n    response = await func(request)\n  File \"/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/fastapi/routing.py\", line 259, in app\n    content = await serialize_response(\n  File \"/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/fastapi/routing.py\", line 145, in serialize_response\n    raise ValidationError(errors, field.type_)\npydantic.error_wrappers.ValidationError: 1 validation error for TrainingResponse\nresponse\n  value is not a valid dict (type=type_error.dict)",
    "severity": "ERROR",
    "message": "Exception in ASGI application\n"
}
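
The escaped traceback in a record like the one above is hard to read in place. The following sketch assumes only the three keys visible in that record (`logger`, `timestamp`, `exception`) and pulls out the innermost stack frame and the final error message. `summarize_log_record` is a hypothetical helper, not part of Cog or uvicorn.

```python
import json


def summarize_log_record(raw: str) -> dict:
    """Extract the innermost frame and the final error text from one JSON log record."""
    # strict=False tolerates raw newlines inside strings, which can appear
    # when a long log line is wrapped during copy-paste.
    record = json.loads(raw, strict=False)
    lines = [line for line in record.get("exception", "").splitlines() if line.strip()]

    # Index of the last 'File "..."' frame line, or -1 if there is none.
    last_frame = max(
        (i for i, line in enumerate(lines) if line.lstrip().startswith('File "')),
        default=-1,
    )

    # Skip the source-code line(s) under the frame (indented by four spaces);
    # what remains is the exception type and its message.
    tail = lines[last_frame + 1:]
    while tail and tail[0].startswith("    "):
        tail = tail[1:]

    return {
        "logger": record.get("logger"),
        "timestamp": record.get("timestamp"),
        "last_frame": lines[last_frame].strip() if last_frame >= 0 else "",
        "error": "\n".join(tail),
    }
```

Applied to the record above, the `error` field should hold the `pydantic.error_wrappers.ValidationError` lines and `last_frame` should point at `serialize_response` in fastapi/routing.py. That shows the failure happens while the response is being serialized, not while the request is being parsed.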

I also tried to trace the call stack, but I couldn't identify the cause of the issue.

sinopec commented 3 months ago

The cog version is 0.9.8.

sinopec commented 3 months ago

I also tried the simplest training example (hello-train) from https://github.com/replicate/cog-examples.git
and got the same exception and traceback.

sinopec commented 3 months ago

Can someone help me with this issue, or could you tell me how to correctly call the training function locally?

shabri-arrahim commented 1 month ago

Hi, any luck on this one? I'm facing the same issue, @sinopec.

BoBo0037 commented 2 weeks ago

Same issue. In http.py, would changing 'response_model=TrainingResponse' to 'response_model=None' work around this?