superseriousbusiness / gotosocial

Fast, fun, small ActivityPub server.
https://docs.gotosocial.org
GNU Affero General Public License v3.0
3.67k stars 311 forks source link

[bug] Temporarily missing media attachments #2175

Closed firescry closed 10 months ago

firescry commented 1 year ago

Describe the bug with a clear and concise description of what the bug is.

Media attachments are sometimes not visible in posts.

I observed this problem on this post. There are three images attached.

Screenshot of original post ![Screenshot of original post with three images attached](https://github.com/superseriousbusiness/gotosocial/assets/49809580/98f143e6-b1c9-4062-9e48-03001a9c95eb)

Original post was posted at 9:30, although nearly 2h later image attachments were still not visible on my GTS instance.

Screenshot of post as visible on my GTS instance (attachments not visible) In Tusky client: ![Screenshot of original post with attachments missing](https://github.com/superseriousbusiness/gotosocial/assets/49809580/30afc049-f12d-4389-b8da-e5c9c731abdb) In Elk client: ![Screenshot of original post with attachments missing](https://github.com/superseriousbusiness/gotosocial/assets/49809580/94d91274-6dbe-410f-99e3-966c1ee770df)

Eventually attached images become visible.

Screenshot of post as visible on my GTS instance (attachments visible) In Tusky client: ![Screenshot_20230903-125748](https://github.com/superseriousbusiness/gotosocial/assets/49809580/50d948cd-99e8-49ea-a205-1eed87e4114f) In Elk client: ![Screenshot_20230903-125758](https://github.com/superseriousbusiness/gotosocial/assets/49809580/9cdee24c-ef25-49f6-9458-31ed1eead4bf)

What's your GoToSocial Version?

0.11.1 git-c7a46e0

GoToSocial Arch

arm64

What happened?

Original post comes from account with many followers, so when my instance tried to fetch attachments, it timed-out (I assume that original instance was quite busy for some time trying to serve the attachments to all remote servers). This is what I see in the logs:

timestamp="03/09/2023 09:35:48.315" func=httpclient.(*Client).DoSigned level=INFO method=GET url=https://social.network.europa.eu/.well-known/webfinger?resource=acct%3AEU_Commission%40social.network.europa.eu requestID=<REDACTED> pubKeyID=https://<REDACTED>/users/<REDACTED>/main-key msg="performing request"
timestamp="03/09/2023 09:35:48.701" func=httpclient.(*Client).DoSigned level=INFO method=GET url=https://social.network.europa.eu/users/EU_Commission requestID=<REDACTED> pubKeyID=https://<REDACTED>/users/<REDACTED>/main-key msg="performing request"
timestamp="03/09/2023 09:35:49.013" func=workers.(*Processor).ProcessFromFediAPI level=INFO activityType=Create objectType=Note toAccount=<REDACTED> requestID=<REDACTED> msg="processing from fedi API"
timestamp="03/09/2023 09:35:49.014" func=server.glob..func1.Logger.func13.1 level=INFO latency="734.500391ms" userAgent="http.rb/5.1.1 (Mastodon/4.1.3; +https://social.network.europa.eu/)" method=POST statusCode=202 path=/users/<REDACTED>/inbox clientIP=141.94.105.157 requestID=<REDACTED> msg="Accepted: wrote 45B"
timestamp="03/09/2023 09:35:49.064" func=httpclient.(*Client).DoSigned level=INFO method=GET url=https://social.network.europa.eu/system/media_attachments/files/110/992/230/387/522/834/original/bb578aad8b827af8.png requestID=<REDACTED> pubKeyID=https://<REDACTED>/users/<REDACTED>/main-key msg="performing request"
timestamp="03/09/2023 09:35:49.314" func=httpclient.(*Client).DoSigned level=INFO method=GET url=https://social.network.europa.eu/users/EU_Commission/statuses/110971498619145849 pubKeyID=https://<REDACTED>/users/<REDACTED>/main-key msg="performing request"
timestamp="03/09/2023 09:35:49.639" func=server.glob..func1.Logger.func13.1 level=INFO latency="1.710453ms" userAgent="http.rb/5.1.1 (Mastodon/4.1.3; +https://social.network.europa.eu/)" method=GET statusCode=200 path=/users/<REDACTED>/main-key clientIP=141.94.105.157 requestID=<REDACTED> msg="OK: wrote 561B"
timestamp="03/09/2023 09:35:49.979" func=server.glob..func1.Logger.func13.1 level=INFO latency="7.252236ms" userAgent="http.rb/5.1.1 (Mastodon/4.1.3; +https://social.network.europa.eu/)" method=GET statusCode=200 path=/users/<REDACTED> clientIP=141.94.105.157 requestID=<REDACTED> msg="OK: wrote 1.06kiB"
timestamp="03/09/2023 09:35:50.821" func=httpclient.(*Client).DoSigned level=INFO method=GET url=https://social.network.europa.eu/system/media_attachments/files/110/971/497/172/216/939/original/2b783294bcc68fe6.jpeg pubKeyID=https://<REDACTED>/users/<REDACTED>/main-key msg="performing request"
context deadline exceeded (Client.Timeout or context cancellation while reading body)
timestamp="03/09/2023 09:36:19.237" func=dereferencing.(*deref).fetchStatusAttachments level=ERROR requestID=<REDACTED> msg="error loading attachment: finish: error decoding image: png: invalid format: not enough pixel data"
timestamp="03/09/2023 09:36:19.265" func=httpclient.(*Client).DoSigned level=INFO method=GET url=https://social.network.europa.eu/system/media_attachments/files/110/992/231/711/866/305/original/9b23321b35457aa2.png requestID=<REDACTED> pubKeyID=https://<REDACTED>/users/<REDACTED>/main-key msg="performing request"
context deadline exceeded (Client.Timeout or context cancellation while reading body)
timestamp="03/09/2023 09:36:49.427" func=dereferencing.(*deref).fetchStatusAttachments level=ERROR requestID=<REDACTED> msg="error loading attachment: finish: error decoding image: png: invalid format: not enough pixel data"
timestamp="03/09/2023 09:36:49.456" func=httpclient.(*Client).DoSigned level=INFO method=GET url=https://social.network.europa.eu/system/media_attachments/files/110/992/232/089/383/884/original/badfcd7f9fbba61a.png requestID=<REDACTED> pubKeyID=https://<REDACTED>/users/<REDACTED>/main-key msg="performing request"
context deadline exceeded (Client.Timeout or context cancellation while reading body)
timestamp="03/09/2023 09:37:19.655" func=dereferencing.(*deref).fetchStatusAttachments level=ERROR requestID=<REDACTED> msg="error loading attachment: finish: error decoding image: png: invalid format: not enough pixel data"

After some time my instance was able to fetch missing attachments.

What you expected to happen?

The biggest issue for me was that I didn't know that there are any attachments in that post. I guess that with distributed nature of Fediverse, problems with fetching remote media may happen from time to time, but it would be great if in such situations user is informed about part of the post missing (e.g. placeholder images could be shown) :)

How to reproduce it?

Setting timeout value to very low value and following accounts with many followers probably increases the chance of the issue to reoccur :)

Anything else we need to know?

Default value for timeout is set to 10s (https://github.com/superseriousbusiness/gotosocial/blob/v0.11.1/example/config.yaml#L748). I increased that value on my instance to 30s (the issue described above happened when that increased timeout was already set). I think I'll try to increase that value even more, but I'm not sure what side effects it may have (e.g. on instance performance). Also, should the default value in example config be changed?

NyaaaWhatsUpDoc commented 1 year ago

What platform are you running this GoToSocial install on? With what sort of network connection? I'm just trying to figure out if these slowly running requests are a GoToSocial code thing or perhaps your instance is saturating the amount it's trying to download on the available connection at any one time :thinking:

firescry commented 1 year ago

My instance runs on Raspberry Pi 4 and is connected via ethernet to the router. Theoretical transfer limit of internet connection is 500 Mbps. When I fetch the video file attached to this post, transfer is usually fast enough:

 curl https://social.network.europa.eu/system/media_attachments/files/110/949/939/218/577/000/original/143e5abbb5cb66dd.mp4 > /dev/null 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 36.7M  100 36.7M    0     0  15.0M      0  0:00:02  0:00:02 --:--:-- 15.0M

but sometimes is a bit slower:

$ curl https://social.network.europa.eu/system/media_attachments/files/110/949/939/218/577/000/original/143e5abbb5cb66dd.mp4 > /dev/null 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 36.7M  100 36.7M    0     0  5241k      0  0:00:07  0:00:07 --:--:-- 5168k
alok0 commented 1 year ago

I sometimes see something similar(?), on posts with videos. Typically what I see is that the version of the post on my timeline never gets the video.

Eventually loading the individual post will load the video, but the timeline version never shows the video.

I'm running on a VPS and am not bottlenecked afaik.

Maybe the remote cdn is just going to be slow when a user with large following posts a video and every instance in the world pulls?

I most commonly see this problem with stux's videos.

NyaaaWhatsUpDoc commented 12 months ago

in which case I think you can try tweaking the http client timeout in your settings. should be under http-client -> timeout

alok0 commented 12 months ago

I don't know how the api is supposed to work, but if we aren't able to cache the media locally, is there a way for the api to return the remote url.

I know that for mastodon if media is slow to load or if reject media is on, the frontend shows a blurhash placeholder and a link that goes to the remote media url. Though I'm not sure how it works in the api.

tsmethurst commented 11 months ago

Mm, probably showing a blurhash placeholder would be a good idea yeah. Right now if we can't get a media attachment on first fetch, I think we just don't show any attachments (iirc) until the next time the status is refreshed (cooldown of two hours). Placeholder would definitely help there to give the user more information.

tsmethurst commented 11 months ago

I did a bit of poking into this and didn't find a solution yet that I'm 100% happy with (do I ever?). So I'm planning to come back to it later.

For now, my thinking was along the lines of altering this function to put the db update/store inside the defer function:

https://github.com/superseriousbusiness/gotosocial/blob/2b6b9cdf832078980ca668126bce3b4fcfff02a9/internal/media/processingmedia.go#L84-L140

As such, if a load failed, the attachment database entry would still be stored, and the status would still display with the attachment's blurhash (I think), but the attachment would never have cached marked as true, and so a refetch/recache could be attempted immediately.

The downside here is that because we never parsed the image data (because we couldn't retrieve it), we don't know the dimensions and stuff, and can only guess at the extension. But perhaps that's OK?

@NyaaaWhatsUpDoc do you have any onions on that possible solution?

firescry commented 10 months ago

Thank you! ❤️