superseriousbusiness / gotosocial

Fast, fun, small ActivityPub server.
https://docs.gotosocial.org
GNU Affero General Public License v3.0

[bug] Timelines are stuck after a period of time #916

Closed koyuawsmbrtn closed 1 year ago

koyuawsmbrtn commented 1 year ago

Describe the bug with a clear and concise description of what the bug is.

My users and I see our timelines getting stuck after a few hours

What's your GoToSocial Version?

GoToSocial 0.5.1-SNAPSHOT git-ccd5b34

GoToSocial Arch

amd64 Docker

Browser version

Google Chrome 106.0.5249.119

What happened?

My timeline got stuck after a few hours

What you expected to happen?

New posts to keep appearing in my timeline, even after a few hours

How to reproduce it?

  1. Have a server running for a long time
  2. Check your timeline one day
  3. Profit

Anything else we need to know?

I like pineapple pizza

tsmethurst commented 1 year ago

That sucks! I think restarting GtS might be a temporary workaround until we can investigate this properly. Can you try that and see if it gets your timelines moving again for now?

Also, just checking, are you pulling latest as your docker container? It would seem so judging by that commit version. latest is built from main, so it might not always be totally stable -- though we do try to keep it that way.

koyuawsmbrtn commented 1 year ago

Yes, restarting helps. I'll roll back to a stable version. The :latest tag is present in the docker-compose.yml. A recent commit has probably broken it. I'm using :latest because client compatibility is a lot better.

tsmethurst commented 1 year ago

I'm using :latest because client compatibility is a lot better.

Ahh yes that's fair, it also looks a lot nicer ;)

I'm not sure how successful rolling back will be, but it's worth a try and I don't think it will irreparably break anything.

Btw, do you see this bug every time after just a few hours or does it vary? Just gathering some info for when I have time to look at it

koyuawsmbrtn commented 1 year ago

It seems to vary. I sometimes have a whole day without the bug appearing.

tsmethurst commented 1 year ago

Alright, thanks for the info :) it sounds similar to a bug i thought i'd fixed a while ago, but which is apparently cropping up again... how many active users are there on your instance? i wonder if they might be competing for locks or something..

koyuawsmbrtn commented 1 year ago

I have 2-3 concurrent users, that's not many

tsmethurst commented 1 year ago

that's not many

indeed... alright, will take a look when i get time, thanks again for opening this bug report :)

koyuawsmbrtn commented 1 year ago

It seems to happen more often the longer the server has been online. Reverting to a stable image magically fixed the issue, so it's definitely a bug introduced by a recent commit.

tsmethurst commented 1 year ago

Thanks for the extra info :) that will help track it down.

Also posting some logs we got from someone else, which might help:

timestamp="22/10/2022 14:14:57.702" func=transport.(*transport).do level=ERROR pubKeyID=[redacted] method=GET url=[redacted] msg="backing off for 4s after http request error: Get \"[redacted]\": dial tcp: lookup [redacted] on 127.0.0.53:53: dial udp 127.0.0.53:53: socket: too many open files"

koyuawsmbrtn commented 1 year ago

This looks like my issue. Something something 127.0.0.1. I'm running it behind an nginx proxy using the configuration from the official documentation.

koyuawsmbrtn commented 1 year ago

It looks like upgrading from a stable release to this current main branch fixed the issue. I'll monitor my instance further and report back any issues.

Update: it's stuck again.

tsmethurst commented 1 year ago

Thanks for the update! Will start looking at this properly in the next couple of days, now that I'm back from a short break + then being ill for a while :P

tsmethurst commented 1 year ago

Hey, could you possibly grep your error logs for "wrote more than the declared Content-Length" and let me know if anything shows up? I've got a little pet theory about what's going wrong. It's either storage streams being held open, or tcp sockets held open, imo :thinking:
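
To illustrate the "streams/sockets held open" theory, here's a minimal sketch (plain Go, not GoToSocial code; the URL and function names are made up for illustration) of how a response body that is never closed pins a file descriptor, until enough of them pile up to trigger the "socket: too many open files" error from the log above:

package main

import (
	"fmt"
	"net/http"
)

// fetchWithoutClosing demonstrates the leak: the response body is never closed,
// so the underlying connection and its file descriptor stay open.
func fetchWithoutClosing(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	_ = resp // BUG (deliberate): missing resp.Body.Close()
	return nil
}

// fetchAndClose is the correct pattern: the fd is released as soon as we're done.
func fetchAndClose(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
	return nil
}

func main() {
	// Call the leaky variant in a loop and the process eventually fails DNS
	// lookups with "socket: too many open files", as in the log above.
	if err := fetchAndClose("https://example.org/"); err != nil {
		fmt.Println("fetch error:", err)
	}
}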

koyuawsmbrtn commented 1 year ago

Nope, this doesn't output anything

koyuawsmbrtn commented 1 year ago

I got this one though

func=router.loggingMiddleware.func1 level=INFO latency=94.879933ms clientIP=192.168.48.1 userAgent="http.rb/5.0.4 (Mastodon/3.5.3; +https://mastodon.lol/)" method=POST statusCode=401 path=/users/luqaska/inbox msg="Unauthorized: wrote 23B"

sns.koyu.space being my instance in this case

blackle commented 1 year ago

I reproduced this today and used delve to figure out what's going on. I found a ton of goroutines that are all stuck trying to lock mutexes in the KVStore. here's a backtrace from one:

 0  0x00000000004412bd in runtime.gopark
    at /home/blackle/Downloads/go/src/runtime/proc.go:364
 1  0x000000000044134a in runtime.goparkunlock
    at /home/blackle/Downloads/go/src/runtime/proc.go:369
 2  0x00000000004525fd in runtime.semacquire1
    at /home/blackle/Downloads/go/src/runtime/sema.go:150
 3  0x000000000046f845 in sync.runtime_SemacquireMutex
    at /home/blackle/Downloads/go/src/runtime/sema.go:77
 4  0x000000000049f4d1 in sync.(*Mutex).lockSlow
    at /home/blackle/Downloads/go/src/sync/mutex.go:171
 5  0x000000000049f191 in sync.(*Mutex).Lock
    at /home/blackle/Downloads/go/src/sync/mutex.go:90
 6  0x000000000232f4ff in codeberg.org/gruf/go-mutexes.(*MutexMap).spinLock
    at ./vendor/codeberg.org/gruf/go-mutexes/map.go:201
 7  0x000000000232f6a7 in codeberg.org/gruf/go-mutexes.(*MutexMap).lock
    at ./vendor/codeberg.org/gruf/go-mutexes/map.go:212
 8  0x0000000002330445 in codeberg.org/gruf/go-mutexes.(*MutexMap).Lock
    at ./vendor/codeberg.org/gruf/go-mutexes/map.go:349
 9  0x0000000002342947 in codeberg.org/gruf/go-store/kv.(*KVStore).Lock
    at ./vendor/codeberg.org/gruf/go-store/kv/store.go:60
10  0x000000000234476d in codeberg.org/gruf/go-store/kv.(*KVStore).Lock-fm
    at <autogenerated>:1
11  0x00000000023432a4 in codeberg.org/gruf/go-store/kv.(*KVStore).put
    at ./vendor/codeberg.org/gruf/go-store/kv/store.go:104
12  0x00000000023431b0 in codeberg.org/gruf/go-store/kv.(*KVStore).Put
    at ./vendor/codeberg.org/gruf/go-store/kv/store.go:99
13  0x0000000002b75e51 in github.com/superseriousbusiness/gotosocial/internal/storage.(*Local).Put
    at ./internal/storage/local.go:51
14  0x0000000002c364c3 in github.com/superseriousbusiness/gotosocial/internal/media.(*ProcessingEmoji).loadStatic
    at ./internal/media/processingemoji.go:167
15  0x0000000002c3568e in github.com/superseriousbusiness/gotosocial/internal/media.(*ProcessingEmoji).LoadEmoji
    at ./internal/media/processingemoji.go:99
16  0x0000000002cab285 in github.com/superseriousbusiness/gotosocial/internal/federation/dereferencing.(*deref).populateEmojis
    at ./internal/federation/dereferencing/emoji.go:94
17  0x0000000002ca6f7d in github.com/superseriousbusiness/gotosocial/internal/federation/dereferencing.(*deref).fetchRemoteAccountEmojis
    at ./internal/federation/dereferencing/account.go:630
18  0x0000000002ca4f3c in github.com/superseriousbusiness/gotosocial/internal/federation/dereferencing.(*deref).populateAccountFields
    at ./internal/federation/dereferencing/account.go:446
19  0x0000000002ca1d32 in github.com/superseriousbusiness/gotosocial/internal/federation/dereferencing.(*deref).GetRemoteAccount
    at ./internal/federation/dereferencing/account.go:330
20  0x0000000002cc3e09 in github.com/superseriousbusiness/gotosocial/internal/federation.(*federator).GetRemoteAccount
    at ./internal/federation/dereference.go:31
21  0x0000000002cc76d4 in github.com/superseriousbusiness/gotosocial/internal/federation.(*federator).AuthenticatePostInbox
    at ./internal/federation/federatingprotocol.go:201
22  0x00000000020f9c19 in github.com/superseriousbusiness/activity/pub.(*sideEffectActor).AuthenticatePostInbox

If I understand the KVStore stuff properly, each key (filename) gets its own mutex, so it seems that locked keys are not getting unlocked when they should be.

blackle commented 1 year ago

goroutines_after_timeline_lockup.txt

here's a stack trace (generated with ctrl+\) after the timeline had been locked up for ~38 minutes. might be useful when someone looks at it. I'm trying to parse what's going on with https://github.com/linuxerwang/goroutine-inspect
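
For reference, the same kind of dump can also be captured without stopping the process, by writing all goroutine stacks through the standard library's runtime/pprof (a generic Go sketch, nothing GtS-specific):

package main

import (
	"os"
	"runtime/pprof"
)

func main() {
	// debug=2 prints every goroutine with a full stack trace, in the same
	// format as the crash dump produced by SIGQUIT / ctrl+\.
	_ = pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
}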

tsmethurst commented 1 year ago

From a quick look at that text file, it seems like a recurring theme among the long running goroutines is github.com/superseriousbusiness/gotosocial/internal/media.(*manager).ProcessEmoji so that's where I'll investigate :) probably I am not closing something properly there...

NyaaaWhatsUpDoc commented 1 year ago

@blackle the per-key locks are handled by mutexes.MutexMap, where each key gets a pseudo-mutex (basically just storing a read/write lock state integer) stored behind a singular mutex protecting the whole map. The actual mutexes that they block on are just used as a means to sleep those threads until the mutexes.MutexMap is in a "usable" state for them, instead of having them constantly looping and putting lots of contention in the singular MutexMap mutex. The unlocks for these mutexes (mutices??? :p) are returned as functions, which get wrapped in callbacks for the kv.KVStore io.Closer implementations. And this theory of unclosed streams also ties well with some previous log entries we've seen of "too many open files".
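
For illustration, here's a stripped-down sketch of that pattern (not the actual go-mutexes/go-store code, which tracks read/write lock-state integers behind a single map mutex; keyLocker, unlockCloser and the storage key are made-up names): a per-key lock whose unlock callback is wrapped in an io.Closer, so a stream that is never closed keeps its key locked forever.

package main

import (
	"fmt"
	"io"
	"strings"
	"sync"
)

// keyLocker is a simplified stand-in for mutexes.MutexMap: one lock per key,
// with the key->lock map itself guarded by a single mutex.
type keyLocker struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newKeyLocker() *keyLocker {
	return &keyLocker{locks: make(map[string]*sync.Mutex)}
}

// Lock locks the given key and returns its unlock as a function,
// mirroring the "unlocks returned as functions" design described above.
func (kl *keyLocker) Lock(key string) (unlock func()) {
	kl.mu.Lock()
	m, ok := kl.locks[key]
	if !ok {
		m = &sync.Mutex{}
		kl.locks[key] = m
	}
	kl.mu.Unlock()
	m.Lock()
	return m.Unlock
}

// unlockCloser wraps the unlock callback in an io.Closer, like the stream
// wrappers returned by the KV store: the key is only released on Close().
type unlockCloser struct {
	io.ReadCloser
	unlock func()
	once   sync.Once
}

func (c *unlockCloser) Close() error {
	err := c.ReadCloser.Close()
	c.once.Do(c.unlock)
	return err
}

func main() {
	kl := newKeyLocker()
	unlock := kl.Lock("emoji/static/example.png") // hypothetical storage key

	stream := &unlockCloser{
		ReadCloser: io.NopCloser(strings.NewReader("fake file contents")),
		unlock:     unlock,
	}

	// If Close() is ever skipped (e.g. an early return on error), the key stays
	// locked and every later Lock() on the same key blocks forever.
	defer stream.Close()

	fmt.Println("stream open; key stays locked until Close()")
}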

Though I will dig deeper into those goroutine dumps you sent to check this library isn't the cause here.

NyaaaWhatsUpDoc commented 1 year ago

I took a look and shared a potential fix with @tsmethurst today. I'm quite busy at work at the moment so not sure I'll get time to handle it myself, but tobi if you do get time to work on it let me know and I'll review the PR :)

blackle commented 1 year ago

Still reproducing this with that latest PR. Here's a dump of the goroutines after it had been locked up for about 15 mins: goroutines_after_timeline_lockup_2.txt

NyaaaWhatsUpDoc commented 1 year ago

Thanks for the update! I ran into a lockup myself yesterday even after merging that PR, so for the last 12hrs my instance has been running a quick patch I threw together that causes the instance to panic and dump all goroutines if the store/db locks are held open for longer than a minute, or if a queued worker function executes for longer than 5 minutes. It hasn't seemed to panic yet though 🤔

If I share the diff would you be happy running this patch yourself?

tsmethurst commented 1 year ago

Judging from the logs @blackle uploaded, my bet would be that it's these mutexes which are being competed for: https://github.com/superseriousbusiness/gotosocial/blob/main/internal/media/processingemoji.go#L45 https://github.com/superseriousbusiness/gotosocial/blob/main/internal/media/processingmedia.go#L43

and that's likely because some other lock, acquired after that one, is not being properly released... I've scoured the code with my silly eyes but haven't spotted anything yet, but at least we now know roughly what area to look in

NyaaaWhatsUpDoc commented 1 year ago

Yeah, I had a look earlier before work and came to a similar conclusion, though I'm wondering if it's a case of a lockup between the processing emoji / media locks and the store's per-key mutex. I know for certain that we acquire the processing object lock, then within that attempt to acquire a write lock when calling store.PutStream(). Could there be another code location that holds a read lock open on that store key (i.e. it has a store.GetStream() currently open for the same path) and is then attempting to acquire the processing object lock?
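
A minimal sketch of that suspected lock-ordering (ABBA) deadlock, with plain sync.Mutex stand-ins for the processing-object lock and the per-key store lock (variable names are illustrative, not the real GtS types); running it dies with Go's "all goroutines are asleep - deadlock!" error:

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var (
		processingLock sync.Mutex // stand-in for the ProcessingEmoji / ProcessingMedia lock
		storeKeyLock   sync.Mutex // stand-in for the per-key KV store lock
		wg             sync.WaitGroup
	)
	wg.Add(2)

	// Goroutine A: processing lock first, then the store key
	// (like LoadEmoji calling store.PutStream on the same path).
	go func() {
		defer wg.Done()
		processingLock.Lock()
		time.Sleep(10 * time.Millisecond) // widen the race window
		storeKeyLock.Lock()               // blocks forever once B holds it
		fmt.Println("A finished")
		storeKeyLock.Unlock()
		processingLock.Unlock()
	}()

	// Goroutine B: store key first (a GetStream still open on the same path),
	// then the processing lock.
	go func() {
		defer wg.Done()
		storeKeyLock.Lock()
		time.Sleep(10 * time.Millisecond)
		processingLock.Lock() // blocks forever once A holds it
		fmt.Println("B finished")
		processingLock.Unlock()
		storeKeyLock.Unlock()
	}()

	// Never returns: both goroutines are parked on mutexes, just like the
	// goroutines in the uploaded dumps.
	wg.Wait()
}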

tsmethurst commented 1 year ago

Could there be a case where in another code location there could be a read lock open on that store key

It could be possible I think, if there are multiple callers to LoadEmoji or LoadAttachment :thinking:

tsmethurst commented 1 year ago

I opened this just so we're using your latest code kimb: https://github.com/superseriousbusiness/gotosocial/pull/953

NyaaaWhatsUpDoc commented 1 year ago

We think we have it narrowed down to some logic within the handling of boosts, after I finally managed to get my own instance to fall into my "traps". There's a possibility it could be some dodgy code in the iteration over descendants of a dereferenced boost, so I have fine-tuned my programmatic C4 (the panic-timeouts) and I'm waiting on further information.

For reference, this was the result of the previous panic-timeout:

goroutine 2493459 [running]:
runtime/debug.Stack()
        runtime/debug/stack.go:24 +0x65
github.com/superseriousbusiness/gotosocial/internal/concurrency.(*WorkerPool[...]).Queue(0xc000586960, {{0x22553f2, 0x8}, {0x224e01e, 0x6}, 0x0, {0x204d440, 0xc0003f7180}, 0xc000b94300})
        github.com/superseriousbusiness/gotosocial/internal/concurrency/workers.go:122 +0x159
github.com/superseriousbusiness/gotosocial/internal/federation/federatingdb.(*federatingDB).Announce(0xc0005141e0, {0x2773d28, 0xc004e1a7e0}, {0x279e678?, 0xc0005adb80})
        github.com/superseriousbusiness/gotosocial/internal/federation/federatingdb/announce.go:61 +0x29d
github.com/superseriousbusiness/gotosocial/internal/federation.(*federator).FederatingCallbacks.func4({0x2773d28?, 0xc004e1a7e0?}, {0x279e678?, 0xc0005adb80?})
        github.com/superseriousbusiness/gotosocial/internal/federation/federatingprotocol.go:356 +0x4a
github.com/superseriousbusiness/activity/streams.TypeResolver.Resolve({{0xc001f54d00?, 0xc?, 0x10?}}, {0x2773d28, 0xc004e1a7e0}, {0x7f1708572030, 0xc0005adb80})
        github.com/superseriousbusiness/activity@v1.2.1-gts/streams/gen_type_resolver.go:204 +0x2c7c
github.com/superseriousbusiness/activity/pub.(*sideEffectActor).PostInbox(0xc00003e5a0, {0x2773d28, 0xc004e1a7e0}, 0xc0012accf0, {0x7f1708571ed0?, 0xc0005adb80})
        github.com/superseriousbusiness/activity@v1.2.1-gts/pub/side_effect_actor.go:128 +0x34d
github.com/superseriousbusiness/activity/pub.(*baseActor).PostInboxScheme(0xc000dc8870, {0x2773d28, 0xc004e1a2d0}, {0x7f170824c520, 0xc003652dc8}, 0xc001f54900, {0x224c32e, 0x5})
        github.com/superseriousbusiness/activity@v1.2.1-gts/pub/base_actor.go:239 +0x57a
github.com/superseriousbusiness/activity/pub.(*baseActor).PostInbox(0x30?, {0x2773d28?, 0xc004e1a2d0?}, {0x7f170824c520?, 0xc003652dc8?}, 0xc004e1a240?)
        github.com/superseriousbusiness/activity@v1.2.1-gts/pub/base_actor.go:169 +0x3a
github.com/superseriousbusiness/gotosocial/internal/federation.(*federatingActor).PostInbox(0x1f7db20?, {0x2773d28?, 0xc004e1a2d0?}, {0x7f170824c520?, 0xc003652dc8?}, 0x2773d28?)
        github.com/superseriousbusiness/gotosocial/internal/federation/federatingactor.go:79 +0x35
github.com/superseriousbusiness/gotosocial/internal/processing/federation.(*processor).PostInbox(0x1f7db20?, {0x2773d28, 0xc004e1a2d0}, {0x7f170824c520, 0xc003652dc8}, 0xc004e1a200?)
        github.com/superseriousbusiness/gotosocial/internal/processing/federation/postinbox.go:27 +0x62
github.com/superseriousbusiness/gotosocial/internal/processing.(*processor).InboxPost(0xc000870c00?, {0x2773d28?, 0xc004e1a2d0?}, {0x7f170824c520?, 0xc003652dc8?}, 0x0?)
        github.com/superseriousbusiness/gotosocial/internal/processing/federation.go:71 +0x3c
github.com/superseriousbusiness/gotosocial/internal/api/s2s/user.(*Module).InboxPOSTHandler(0xc0001e9b90, 0xc000870c00)
        github.com/superseriousbusiness/gotosocial/internal/api/s2s/user/inboxpost.go:41 +0xe8
github.com/gin-gonic/gin.(*Context).Next(0xc000870c00)
        github.com/gin-gonic/gin@v1.8.1/context.go:173 +0x2d
github.com/superseriousbusiness/gotosocial/internal/api/security.(*Module).TokenCheck(0xc00037e420, 0xc000870c00)
        github.com/superseriousbusiness/gotosocial/internal/api/security/tokencheck.go:40 +0x2db
github.com/gin-gonic/gin.(*Context).Next(...)
        github.com/gin-gonic/gin@v1.8.1/context.go:173
github.com/ulule/limiter/v3/drivers/middleware/gin.(*Middleware).Handle(0xc000dc9ef0, 0xc000870c00)
        github.com/ulule/limiter/v3@v3.10.0/drivers/middleware/gin/middleware.go:64 +0x1f4
github.com/ulule/limiter/v3/drivers/middleware/gin.NewMiddleware.func1(0xc000870c00?)
        github.com/ulule/limiter/v3@v3.10.0/drivers/middleware/gin/middleware.go:35 +0x25
github.com/gin-gonic/gin.(*Context).Next(...)
        github.com/gin-gonic/gin@v1.8.1/context.go:173
github.com/gin-contrib/sessions.Sessions.func1(0xc000870c00)
        github.com/gin-contrib/sessions@v0.0.5/sessions.go:54 +0x19a
github.com/gin-gonic/gin.(*Context).Next(...)
        github.com/gin-gonic/gin@v1.8.1/context.go:173
github.com/gin-contrib/gzip.(*gzipHandler).Handle(0xc0007c7ec0, 0xc000870c00)
        github.com/gin-contrib/gzip@v0.0.6/handler.go:60 +0x302
github.com/gin-gonic/gin.(*Context).Next(...)
        github.com/gin-gonic/gin@v1.8.1/context.go:173
github.com/superseriousbusiness/gotosocial/internal/router.loggingMiddleware(0xc000870c00)
        github.com/superseriousbusiness/gotosocial/internal/router/logger.go:95 +0x102
github.com/gin-gonic/gin.(*Context).Next(...)
        github.com/gin-gonic/gin@v1.8.1/context.go:173
github.com/gin-gonic/gin.(*Engine).handleHTTPRequest(0xc000102b60, 0xc000870c00)
        github.com/gin-gonic/gin@v1.8.1/gin.go:616 +0x671
github.com/gin-gonic/gin.(*Engine).ServeHTTP(0xc000102b60, {0x2772770?, 0xc0039d4460}, 0xc001f54900)
        github.com/gin-gonic/gin@v1.8.1/gin.go:572 +0x1dd
codeberg.org/gruf/go-debug.WithPprof.func1({0x2772770?, 0xc0039d4460?}, 0xc001f54900?)
        codeberg.org/gruf/go-debug@v1.2.0/pprof_on.go:61 +0xf4
net/http.HandlerFunc.ServeHTTP(0x0?, {0x2772770?, 0xc0039d4460?}, 0x77f3b4?)
        net/http/server.go:2109 +0x2f
net/http.serverHandler.ServeHTTP({0x276d2e8?}, {0x2772770, 0xc0039d4460}, 0xc001f54900)
        net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc004c9e3c0, {0x2773d28, 0xc00069b530})
        net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
        net/http/server.go:3102 +0x4db

panic: processing worker func for over 5 minutes

goroutine 2497872 [running]:
github.com/superseriousbusiness/gotosocial/internal/concurrency.(*WorkerPool[...]).Queue.func1.1.1()
        github.com/superseriousbusiness/gotosocial/internal/concurrency/workers.go:126 +0x2a
created by github.com/superseriousbusiness/gotosocial/internal/concurrency.(*WorkerPool[...]).Queue.func1.1
        github.com/superseriousbusiness/gotosocial/internal/concurrency/workers.go:126 +0x9e

So from this, the last thing we know it did before hanging was passing a create-announce message to the federator worker queue. That then gets passed to the processor, which handles it in processCreateAnnounceFromFederator()
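
As a rough illustration of why this stalls timelines (a toy sketch, not the real internal/concurrency.WorkerPool; the message strings and lock are made up): with a fixed number of workers, once each of them is blocked inside a hung job, every message queued behind it, including the ones that would advance timelines, just sits there.

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	queue := make(chan string, 100)

	var stuck sync.Mutex
	stuck.Lock() // simulates a store/processing lock that is never released

	// Two workers, like a small fixed-size worker pool.
	for i := 0; i < 2; i++ {
		go func(id int) {
			for msg := range queue {
				fmt.Printf("worker %d picked up %s\n", id, msg)
				stuck.Lock() // every job hangs here, like the deadlocked create-announce handling
			}
		}(i)
	}

	for _, msg := range []string{"announce-1", "announce-2", "status-3", "status-4"} {
		queue <- msg
	}

	time.Sleep(time.Second)
	fmt.Printf("%d messages still queued; nothing new reaches the timeline\n", len(queue))
}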

The latest iteration of my "programmatic C4" is the following:

diff --git a/internal/concurrency/workers.go b/internal/concurrency/workers.go
index 279a0c3c..fcf74899 100644
--- a/internal/concurrency/workers.go
+++ b/internal/concurrency/workers.go
@@ -22,9 +22,12 @@ import (
        "context"
        "errors"
        "fmt"
+       "os"
        "path"
        "reflect"
        "runtime"
+       "runtime/debug"
+       "time"

        "codeberg.org/gruf/go-runners"
        "github.com/superseriousbusiness/gotosocial/internal/log"
@@ -116,7 +119,19 @@ func (w *WorkerPool[MsgType]) Queue(msg MsgType) {
        log.Tracef("%s queueing message (queue=%d): %+v",
                w.prefix, w.workers.Queue(), msg,
        )
+       stack := debug.Stack()
        w.workers.Enqueue(func(ctx context.Context) {
+               t := time.AfterFunc(time.Minute*5, func() {
+                       fmt.Fprintf(os.Stderr, "\n\n%s\n\n", stack)
+                       go panic("processing worker func for over 5 minutes")
+               })
+
+               defer func() {
+                       if !t.Stop() {
+                               <-t.C
+                       }
+               }()
+
                if err := w.process(ctx, msg); err != nil {
                        log.Errorf("%s %v", w.prefix, err)
                }
diff --git a/internal/federation/dereferencing/thread.go b/internal/federation/dereferencing/thread.go
index 34955cee..9d4d8167 100644
--- a/internal/federation/dereferencing/thread.go
+++ b/internal/federation/dereferencing/thread.go
@@ -22,8 +22,13 @@ import (
        "context"
        "fmt"
        "net/url"
+       "os"
+       "runtime/debug"
+       "time"

+       "codeberg.org/gruf/go-byteutil"
        "codeberg.org/gruf/go-kv"
+       "codeberg.org/gruf/go-kv/format"
        "github.com/superseriousbusiness/activity/streams/vocab"
        "github.com/superseriousbusiness/gotosocial/internal/ap"
        "github.com/superseriousbusiness/gotosocial/internal/config"
@@ -189,6 +194,20 @@ func (d *deref) dereferenceStatusDescendants(ctx context.Context, username strin
                }
        )

+       stackDump := debug.Stack()
+       t := time.AfterFunc(time.Minute*5, func() {
+               var buf byteutil.Buffer
+               format.Appendf(&buf, "{:?}", stack)
+               fmt.Fprintf(os.Stderr, "\n\n%s\n\n%s\n\n", stackDump, buf.String())
+               go panic("processing worker func for over 5 minutes")
+       })
+
+       defer func() {
+               if !t.Stop() {
+                       <-t.C
+               }
+       }()
+
 stackLoop:
        for i := 0; i < maxIter; i++ {
                // Pop next frame, nil means we are at end

zladuric commented 1 year ago

Hey, so I have all of these "wrote more than the declared Content-Length" etc. things too, and roughly 20% of my docker logs -f output is various 401 errors. E.g.

timestamp="07/11/2022 20:31:17.673" func=concurrency.(*WorkerPool).Queue.func1 level=ERROR msg="worker.Worker[messages.FromFederator] populateStatusFields: error populating status repliedTo: populateStatusRepliedTo: couldn't get reply to status with uri https://digipres.club/XXXXX: GetRemoteStatus: error dereferencing statusable: DereferenceStatusable: error deferencing https://digipres.club/XXXXX: GET request to https://digipres.club/XXXXX failed (404): 404 Not Found"
timestamp="07/11/2022 20:35:11.334" func=dereferencing.(*deref).populateStatusAttachments level=ERROR msg="populateStatusAttachments: couldn't load remote attachment https://cdn.masto.host/climatejusticeglobal/XXXXX: store: media type video/mp4 not (yet) supported"
timestamp="07/11/2022 20:35:11.354" func=concurrency.(*WorkerPool).Queue.func1 level=ERROR msg="worker.Worker[*media.ProcessingMedia] error loading media XXXXX: store: media type video/mp4 not (yet) supported"
timestamp="07/11/2022 20:36:31.179" func=dereferencing.(*deref).populateStatusAttachments level=ERROR msg="populateStatusAttachments: couldn't load remote attachment https://cdn.masto.host/XXXXX: store: media type video/mp4 not (yet) supported"
timestamp="07/11/2022 20:36:31.200" func=concurrency.(*WorkerPool).Queue.func1 level=ERROR msg="worker.Worker[*media.ProcessingMedia] error loading media XXXXX: store: media type video/mp4 not (yet) supported"
timestamp="07/11/2022 20:37:45.781" func=dereferencing.(*deref).populateStatusAttachments level=ERROR msg="populateStatusAttachments: couldn't load remote attachment https://cdn.masto.host/XXXXX.mp4: store: media type video/mp4 not (yet) supported"
timestamp="07/11/2022 20:37:45.795" func=concurrency.(*WorkerPool).Queue.func1 level=ERROR msg="worker.Worker[*media.ProcessingMedia] error loading media XXXXX: store: media type video/mp4 not (yet) supported"
timestamp="07/11/2022 20:38:52.261" func=concurrency.(*WorkerPool).Queue.func1 level=ERROR msg="worker.Worker[messages.FromFederator] populateStatusFields: error populating status repliedTo: populateStatusRepliedTo: couldn't get reply to status with uri https://mastodon.art/XXXXX: GetRemoteStatus: error dereferencing statusable: DereferenceStatusable: error deferencing https://mastodon.art/XXXXX: GET request to https://mastodon.art/users/XXXXX/statuses/XXXXX failed (404): 404 Not Found"
timestamp="07/11/2022 20:41:46.825" func=web.(*Module).assetsCacheControlMiddleware.func1 level=ERROR msg="error getting ETag for /dist/bundle.js: error opening /dist/bundle.js: open /gotosocial/web/assets/dist/bundle.js: no such file or directory"
timestamp="07/11/2022 20:44:22.848" func=transport.(*transport).do level=ERROR pubKeyID="https://social.zlatko.dev/users/zlatko/main-key" method="GET" url="https://mastodon.scot/users/XXXXX#main-key" msg="backing off for 2s after http request error: Get \"https://mastodon.scot/users/XXXXX#main-key\": EOF"

Or these where people seem to be trying to post to my inbox:

timestamp="07/11/2022 20:46:36.221" func=router.loggingMiddleware.func1 level=INFO latency="20.784208ms" clientIP="XXXXX" userAgent="http.rb/5.1.0 (Mastodon/4.0.0rc1; +https://ravenation.club/)" method="POST" statusCode=401 path="/users/zlatko/inbox" msg="Unauthorized: wrote 23B"
timestamp="07/11/2022 20:46:39.960" func=router.loggingMiddleware.func1 level=INFO latency="32.117344ms" clientIP="XXXXX" userAgent="http.rb/5.1.0 (Mastodon/4.0.0rc2; +https://pouet.it/)" method="POST" statusCode=401 path="/users/zlatko/inbox" msg="Unauthorized: wrote 23B"
timestamp="07/11/2022 20:46:42.727" func=router.loggingMiddleware.func1 level=INFO latency="118.280708ms" clientIP="XXXXX" userAgent="http.rb/5.1.0 (Mastodon/4.0.0rc1; +https://mastodon.online/)" method="POST" statusCode=401 path="/users/zlatko/inbox" msg="Unauthorized: wrote 23B"

Is any of this related? Because I too feel like my timeline is getting stuck a lot.

tsmethurst commented 1 year ago

I haven't seen the bug again since the above linked PR, so I'm hopeful it's been resolved. Will wait a few days and see.

zladuric commented 1 year ago

Just a quick question: that'd be the :latest docker tag in my dockerfile, correct?

tsmethurst commented 1 year ago

This seems to have been resolved :) hurray! I'm going to close the issue