Open platform-kit opened 2 months ago
Hi @platform-kit. Thanks for reporting this issue. File uploads appear to be working for other models like stability-ai/stable-diffusion-3, and I'm not aware of any historical or ongoing incidents around that. However, I was able to reproduce this failure myself.
The ultimate failure was `cog.server.runner.FileUploadError`, but the relevant part of the trace was here:
File "/usr/local/lib/python3.10/site-packages/cog/json.py", line 53, in <listcomp>
return [upload_files(value, upload_file) for value in obj]
File "/usr/local/lib/python3.10/site-packages/cog/json.py", line 55, in upload_files
with obj.open("rb") as f:
File "/usr/local/lib/python3.10/pathlib.py", line 1119, in open
return self._accessor.open(self, mode, buffering, encoding, errors,
FileNotFoundError: [Errno 2] No such file or directory: 'workspace/gradio_output.mp4'
So the problem isn't that uploading is broken, it's that the file can't be located.
Looking at internal logs, the success rate for camenduru/lgm does appear to change around May 2nd, which corresponds to our rollout of Cog v0.9.6. But I can't tell for sure whether that's a coincidence.
Looking into the source code there are a few things that seem suspect:
`os.chdir` could be messing with where
`cog.yaml` file, which is... well, it's doing a lot
I can't account for why this stopped working, but I'll continue looking into it.
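To illustrate why a call to `os.chdir` is a plausible suspect for a `FileNotFoundError` on a relative path such as `workspace/gradio_output.mp4`, here is a minimal sketch. It is not taken from the model's code or from Cog; the function name and the `elsewhere` directory are made up for illustration. It only shows that a relative path is resolved against whatever the working directory is at the moment the file is opened.

```python
import os
import tempfile
from pathlib import Path


def demo_relative_path_after_chdir() -> tuple[bool, bool]:
    """Return (relative_exists, absolute_exists) after the working directory changes."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as root:
        try:
            os.chdir(root)
            # Write a file using a relative path, as a predictor might.
            relative = Path("workspace/gradio_output.mp4")
            relative.parent.mkdir()
            relative.write_bytes(b"fake video")
            # Capture the absolute location while the working directory is still `root`.
            absolute = relative.resolve()
            # Hypothetical: some other code changes the working directory afterwards.
            os.mkdir("elsewhere")
            os.chdir("elsewhere")
            # The relative path is now looked up under `elsewhere` and is not found.
            return relative.exists(), absolute.exists()
        finally:
            # Restore the working directory before the temporary directory is removed.
            os.chdir(original_cwd)


print(demo_relative_path_after_chdir())
```

If something like this is happening, returning an absolute path (or resolving the path before any directory change) would sidestep it, though that is only a guess at the cause here.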
I've reached out to Camb-ai in the linked issue and will work with them to try to get to the bottom of this.
@mattt thanks for the quick response.
I've deployed a fork of the Mars5-TTS model to Replicate that returns a `Path`:
Code for that deploy's `predict.py` is here:
https://github.com/Render-AI/MARS5-TTS/blob/f27cd6e99ac08033ca04d5450a3d36433e85d9f7/cog/predict.py
According to the Cog docs,
"For models that return a cog.Path object, the prediction output returned by Cog's built-in HTTP server will be a URL."
However, the result is actually just a string with a relative path, not a URL:
{
"completed_at": "2024-06-24T03:52:37.387801Z",
"created_at": "2024-06-24T03:49:27.655000Z",
"data_removed": false,
"error": null,
"id": "p4txt87bwxrgg0cg91e9v6d79w",
"input": {
"text": "This is a test",
"testMode": "false",
"ref_audio_file": "https://www.renderai.com/audio/examples/bob-example-1.mp3",
"ref_audio_transcript": "Space: the final frontier. These are the voyages of the starship enterprise. It's five year misssion: to explore strange new worlds; to seek out new life and new civilizations; to boldly go where no man has gone before."
},
"logs": ">>> Running inference\nWARNING:root:Reference audio duration is 20.06 > max suggested ref audio. Expect quality degradations. We recommend you trim prompt to be shorter than max prompt length.\nNote: using deep clone. Assuming input `c_phones` is concatenated prompt and output phones. Also assuming no padded indices in `c_codes`.\nNew x: torch.Size([1, 3022, 8]) | new x_known: torch.Size([1, 3022, 8]) . Base prompt: torch.Size([1, 1505, 8]). New padding mask: torch.Size([1, 3022]) | m shape: torch.Size([1, 3022, 8])\n>>>>> Done with inference",
"metrics": {
"predict_time": 107.894457702,
"total_time": 189.732801
},
"output": "output.mp3",
"started_at": "2024-06-24T03:50:49.493343Z",
"status": "succeeded",
"urls": {
"get": "https://api.replicate.com/v1/predictions/p4txt87bwxrgg0cg91e9v6d79w",
"cancel": "https://api.replicate.com/v1/predictions/p4txt87bwxrgg0cg91e9v6d79w/cancel"
},
"version": "33a2ed337e20ecbb932a2c304ea42ba0903f0c11ba68421ca9676f79fae82317"
}
Image of output in Replicate UI:
When running `cog predict` locally, I can see that the file `output.mp3` is successfully created. But even though the file exists, no URL is produced, just the relative path `output.mp3`.
I also created a test mode on this model that, if enabled, skips inference and passes a file which I know to be on the filesystem (since it is included in the deploy). Same result.
In a later deploy, I also switched to `tempfile`, as the docs suggest, in case that had something to do with it. The result is the same: a plain file path rather than a URL, though this time in the `/tmp` folder.
{
"completed_at": "2024-06-24T06:32:48.388967Z",
"created_at": "2024-06-24T06:29:21.517000Z",
"data_removed": false,
"error": null,
"id": "h7njk9afxnrg80cg93qt648w2g",
"input": {
"text": "test",
"testMode": "false",
"ref_audio_file": "https://replicate.delivery/pbxt/L9PPFliYxJQY8PfbICObAygtaNOvupQ4Bv5p6siBWwMu1buR/output%20(13)%20trimmed.wav",
"ref_audio_transcript": "Space: the final frontier. These are the voyages of the starship enterprise. It's five year misssion: to explore strange new worlds; to seek out new life and new civilizations; to boldly go where no man has gone before."
},
"logs": ">>> Running inference\nWARNING:root:Reference audio duration is 13.79 > max suggested ref audio. Expect quality degradations. We recommend you trim prompt to be shorter than max prompt length.\nNote: using deep clone. Assuming input `c_phones` is concatenated prompt and output phones. Also assuming no padded indices in `c_codes`.\nNew x: torch.Size([1, 2294, 8]) | new x_known: torch.Size([1, 2294, 8]) . Base prompt: torch.Size([1, 1035, 8]). New padding mask: torch.Size([1, 2294]) | m shape: torch.Size([1, 2294, 8])\n>>>>> Done with inference",
"metrics": {
"predict_time": 88.712041422,
"total_time": 206.871967
},
"output": "/tmp/tmpi7sk5pv3/output.mp3",
"started_at": "2024-06-24T06:31:19.676926Z",
"status": "succeeded",
"urls": {
"get": "https://api.replicate.com/v1/predictions/h7njk9afxnrg80cg93qt648w2g",
"cancel": "https://api.replicate.com/v1/predictions/h7njk9afxnrg80cg93qt648w2g/cancel"
},
"version": "c79ffec219ab9c9a7c1a430b506cf30ae0da2fb75ae4a6a8f486c99833bcddc3"
}
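For reference, the `tempfile` pattern mentioned above can be sketched roughly as follows. This is an illustration, not the code from the deploy: `write_output` and its arguments are hypothetical, and it uses `pathlib.Path` only so that it runs without Cog installed. It shows why the output in the second response looks like `/tmp/.../output.mp3`: `tempfile.mkdtemp()` returns a fresh directory, and the file path is built inside it. On its own this yields a file path, not a URL.

```python
import tempfile
from pathlib import Path


def write_output(data: bytes, filename: str = "output.mp3") -> Path:
    """Write `data` into a newly created temporary directory and return the file path."""
    # mkdtemp creates a uniquely named directory and returns its path.
    out_dir = Path(tempfile.mkdtemp())
    out_path = out_dir / filename
    out_path.write_bytes(data)
    return out_path


path = write_output(b"fake audio bytes")
print(path)  # a file path under the system temp directory
```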
@platform-kit It looks like `Path` is re-bound to `pathlib.Path` on this line:
What happens if you change the return type of `predict` to `cog.Path`?
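The re-binding described above can be sketched without Cog installed. Everything here is a stand-in: `CogPath` is a hypothetical marker class playing the role of `cog.Path`, and `serialize_output` is an assumed, heavily simplified model of a server that uploads marker-typed outputs and passes anything else through as a string. It is not Cog's actual implementation; it only shows how a later import can silently replace the name `Path`.

```python
import pathlib


class CogPath(type(pathlib.Path())):
    """Hypothetical stand-in for cog.Path: a path type the server knows to upload."""


def serialize_output(value) -> str:
    # Assumed behaviour, for illustration only: upload marker-typed values,
    # and fall back to the plain string form for everything else.
    if isinstance(value, CogPath):
        return f"https://example.invalid/files/{value.name}"
    return str(value)


Path = CogPath       # plays the role of `from cog import Path`
Path = pathlib.Path  # plays the role of a later `from pathlib import Path`


def predict() -> Path:
    # `Path` now refers to pathlib.Path, so this is an ordinary filesystem path.
    return Path("output.mp3")


print(serialize_output(predict()))              # plain path string
print(serialize_output(CogPath("output.mp3")))  # URL-like string
```

Referring to the type by its qualified name (`cog.Path`) in the return annotation and at the return site avoids depending on import order.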
That was indeed the issue. Sorry for the hassle. Not sure if the mars5-tts issue is related to the error I quoted about https://replicate.com/camenduru/lgm - but for what it's worth, that same error came up at some point when I was migrating the mars5-tts repo to return Path. So maybe it is somehow related to whatever cog.Path does since v0.9.6.
Or maybe it's simply a user error caused by some kind of change in the docs? But then again maybe not, since it seems old models are breaking.
Hey @mattt, when I try to use a file URL (from the default), Cog errors out with some URLFile error. It was all working fine a few weeks back; not sure what the issue is now. It also randomly says "remote end disconnected".
file link: https://files.catbox.moe/be6df3.wav
error:
Traceback (most recent call last):
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/types.py", line 201, in __wrapped__
return object.__getattribute__(self, "__target__")
AttributeError: 'URLFile' object has no attribute '__target__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/urllib3/connectionpool.py", line 789, in urlopen
response = self._make_request(
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/urllib3/connectionpool.py", line 536, in _make_request
response = conn.getresponse()
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/urllib3/connection.py", line 464, in getresponse
httplib_response = super().getresponse()
File "/root/.pyenv/versions/3.10.14/lib/python3.10/http/client.py", line 1375, in getresponse
response.begin()
File "/root/.pyenv/versions/3.10.14/lib/python3.10/http/client.py", line 318, in begin
version, status, reason = self._read_status()
File "/root/.pyenv/versions/3.10.14/lib/python3.10/http/client.py", line 287, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/urllib3/connectionpool.py", line 843, in urlopen
retries = retries.increment(
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/urllib3/util/retry.py", line 474, in increment
raise reraise(type(error), error, _stacktrace)
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/urllib3/util/util.py", line 38, in reraise
raise value.with_traceback(tb)
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/urllib3/connectionpool.py", line 789, in urlopen
response = self._make_request(
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/urllib3/connectionpool.py", line 536, in _make_request
response = conn.getresponse()
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/urllib3/connection.py", line 464, in getresponse
httplib_response = super().getresponse()
File "/root/.pyenv/versions/3.10.14/lib/python3.10/http/client.py", line 1375, in getresponse
response.begin()
File "/root/.pyenv/versions/3.10.14/lib/python3.10/http/client.py", line 318, in begin
version, status, reason = self._read_status()
File "/root/.pyenv/versions/3.10.14/lib/python3.10/http/client.py", line 287, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/server/runner.py", line 400, in _predict
input_dict[k] = v.convert()
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/types.py", line 136, in convert
shutil.copyfileobj(self.fileobj, dest)
File "/root/.pyenv/versions/3.10.14/lib/python3.10/shutil.py", line 192, in copyfileobj
fsrc_read = fsrc.read
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/types.py", line 186, in __getattr__
return getattr(self.__wrapped__, name)
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/types.py", line 204, in __wrapped__
resp = requests.get(url, stream=True)
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/requests/api.py", line 73, in get
return request("get", url, params=params, **kwargs)
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/requests/adapters.py", line 682, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
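The bottom of the trace above is a `RemoteDisconnected` raised while the input URL was being downloaded, which suggests the file host closed the connection. One way to check whether the host is simply flaky is to fetch the URL yourself with a few retries. The sketch below is a client-side diagnostic, not a change to Cog: `fetch_with_retries` and its parameters are hypothetical, and it uses the standard library's `urllib` rather than `requests` so that it runs without extra packages.

```python
import time
import urllib.error
import urllib.request
from http.client import RemoteDisconnected


def fetch_with_retries(url, attempts=3, delay=1.0, opener=urllib.request.urlopen):
    """Download `url`, retrying when the remote end drops the connection."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except (RemoteDisconnected, urllib.error.URLError) as exc:
            last_error = exc
            if attempt < attempts:
                # Simple linear back-off before the next attempt.
                time.sleep(delay * attempt)
    raise ConnectionError(f"giving up on {url} after {attempts} attempts") from last_error
```

Calling `fetch_with_retries("https://files.catbox.moe/be6df3.wav")` a few times would show whether the disconnects come from the file host itself. The `opener` parameter exists only so the function can be exercised without network access.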
Model: https://replicate.com/camenduru/lgm
This is happening on many models, including ones that have been usable on Replicate for some time now.
See also: https://discord.com/channels/775512803439280149/779461485277347850/1243322233099653130 https://discord.com/channels/775512803439280149/1144193090970722394/1215721819143929917
also: search "upload error" or just "upload" for many more instances of people reporting this.
No response from the Replicate team that I could see.
Has something about cog's file output behavior changed?
This is the only conclusion I can come to, as the code I have used to export files (mp3, wav, etc) on my old models is not working on new ones.
For context - I am currently trying to help CambAI release a version of their Mars5-TTS that uses Replicate's native file output to return audio. However, they haven't been able to make it work, nor have I.
Relevant issue: https://github.com/Camb-ai/MARS5-TTS/issues/40