ssbc / ssb-threads

Scuttlebot plugin for fetching messages as threads
MIT License

pagination #15

Closed nichoth closed 2 years ago

nichoth commented 2 years ago

Is there a way to paginate this query so that it only returns something like the first 10 threads/thread-summaries?

        const source = sbot.threads.publicSummary({
            reverse: true,
            allowlist: ['post']
        })

Intuitively, I would do something like this, but it seems to use too much memory and takes a long time to return.

        S(
            source,
            S.take(10),
            S.drain(/*...*/)
        )

staltz commented 2 years ago

Ooooh I just realized that there is a chance you're running into deweird issues. Do you happen to have a thread or "worker" process or something like that in between the HTML rendering and the ssb-threads query?

Technically the take(10) is the correct solution and I don't know why it would take too much memory. Would require profiling and debugging to figure it out. I would much rather fix the root cause of this performance issue than find another way to query the API.
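A minimal sketch of why take(10) is expected to be cheap when the source is lazy. The functions below are dependency-free stand-ins that follow the pull-stream calling convention; they are not the library's code, and countingSource is a hypothetical source used only to count how many items get produced. Note that take cannot prevent any work a source does before it yields its first item.

```javascript
// A pull-stream source is a function (abort, cb) that the sink calls per item.

// Hypothetical source: records how many items were actually produced.
function countingSource (limit, stats) {
  let i = 0
  return function read (abort, cb) {
    if (abort) { stats.aborted = true; return cb(abort) }
    if (i >= limit) return cb(true) // true signals a normal end of stream
    stats.produced++
    cb(null, i++)
  }
}

// Through: pass along at most n items, then abort the upstream source.
function take (n) {
  return function (read) {
    let seen = 0
    return function (abort, cb) {
      if (abort) return read(abort, cb)
      if (seen >= n) return read(true, cb) // tell upstream to stop
      seen++
      read(null, cb)
    }
  }
}

// Sink: gather every item into an array, then call done(err, items).
function collect (done) {
  return function (read) {
    const items = []
    function next () {
      read(null, function (end, data) {
        if (end) return done(end === true ? null : end, items)
        items.push(data)
        next()
      })
    }
    next()
  }
}

const stats = { produced: 0, aborted: false }
let firstTen = null
collect(function (err, items) {
  if (err) throw err
  firstTen = items
})(take(10)(countingSource(1000000, stats)))

console.log(firstTen.length, stats.produced, stats.aborted) // 10 10 true
```

Only 10 of the 1,000,000 possible items are produced, and the source is told to abort. If memory still grows with a real source, the cost is likely inside the source itself rather than in take.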

nichoth commented 2 years ago

Thanks @staltz for the tip about ssb-deweird. That will be the next thing I look into. There is no separate process or anything, and no HTML rendering either; this is all happening in Node, just returning results via an HTTP API.

nichoth commented 2 years ago

update

I tried using deweird/producer but I am still getting the out-of-memory error -- https://github.com/planetary-social/planetary-pub/blob/deweird/pub.js#L148

This is using an sbot that is in the same process as the 'consumer', so I think the weirdness is not an issue here.

also

I was using this like

        var source = sbot.threads.profile({
            id: userId,
        })

and it was working.

But when I changed it to

        var source = sbot.threads.profile({
            id: userId,
            allowlist: ['post']
        })

then it crashes because it runs out of memory.

staltz commented 2 years ago

Yeah, just installing deweird/producer will have zero effect on it.

Can you try to use a profiler to get more information on the memory leak and the performance problems? For example, start Node.js with --inspect-brk and use the Chrome DevTools profiler.
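Before attaching a full profiler, a coarse first measurement can come from Node's built-in process.memoryUsage(). The sketch below is only an illustration: measureHeap is a hypothetical helper, and the workload is a stand-in for the real query under investigation. Garbage collection timing makes the number approximate, so it does not replace a heap snapshot.

```javascript
// Hypothetical helper: run a callback-style task and report heap growth in MB.
function measureHeap (label, task, done) {
  const before = process.memoryUsage().heapUsed
  task(function (err, result) {
    const after = process.memoryUsage().heapUsed
    const deltaMB = (after - before) / 1024 / 1024
    console.log(label + ': heap changed by ' + deltaMB.toFixed(1) + ' MB')
    done(err, result, deltaMB)
  })
}

// Stand-in workload; swap in the query being debugged.
let report = null
measureHeap('allocate', function (cb) {
  const rows = new Array(100000).fill(0).map((_, i) => ({ i }))
  cb(null, rows.length)
}, function (err, count, deltaMB) {
  report = { err, count, deltaMB }
})
```

Wrapping the query in a helper like this on each request would show whether heap usage keeps climbing from one call to the next.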

nichoth commented 2 years ago

@staltz

I did get a debugger to run with the program via --inspect-brk and then opening chrome://inspect/#devices. However, I'm not sure how best to communicate the information.

[screenshot of the debugger]

It is displaying a function mergeFilters in jitdb/index.js

Sorry, I have never debugged this way (async and remotely) before.

nichoth commented 2 years ago

This is the repo in question --

https://github.com/planetary-social/planetary-pub/blob/out-of-mem/viewer/index.js#L84

staltz commented 2 years ago

Out of curiosity (and because this could be the cause of the OOM), what is the size of the log? I mean the db2/log.bipf file.

nichoth commented 2 years ago

816 MB

total 1670272
drwxr-xr-x 25 nick staff 800B Feb 7 16:10 indexes
-rw-r--r-- 1 nick staff 816M Feb 7 16:09 log.bipf

staltz commented 2 years ago

Is this running on a VPN? What is the RAM capacity of the machine?

nichoth commented 2 years ago

more clues

This is not a VPN. This is just on my local laptop machine.

I had been starting this with a memory limit of 512 MB, like so -- NODE_ENV=staging-local node --max-old-space-size=512 index.js. That would crash.
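To confirm which heap limit the process is actually running with, Node's built-in v8 module reports it. This is a small sketch; heapLimitMB is a hypothetical helper name. The reported value is usually somewhat higher than the --max-old-space-size figure, because it covers more than the old-generation space.

```javascript
const v8 = require('v8')

// heap_size_limit is reported by V8 in bytes; convert to whole megabytes.
function heapLimitMB () {
  return Math.round(v8.getHeapStatistics().heap_size_limit / 1024 / 1024)
}

const limitMB = heapLimitMB()
console.log('V8 heap limit: ~' + limitMB + ' MB')
```

Logging this once at startup makes it easy to tell a run with the 512 MB cap apart from a run with the default limit.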

Then I tried starting it without a memory limit -- NODE_ENV=staging-local node index.js. My machine has 8 GB of memory. The results are more complicated: the first request to the endpoint takes a long time but eventually returns successfully. A second request to the same endpoint then runs out of memory and crashes.

This is the endpoint

    fastify.get('/feed-by-id/:userId', (req, res) => {
        var { userId } = req.params
        var source = sbot.threads.profile({
            id: userId,
            allowlist: ['post'],
            threadMaxSize: 3 // at most 3 messages in each thread
        })

        S(
            source,
            S.take(10),
            S.map(thread => {
                // if it's a thread, return the thread
                // if not a thread, return a single message (not array)
                return thread.messages.length > 1 ?
                    thread.messages :
                    thread.messages[0]
            }),
            S.collect(function (err, threads) {
                if (err) return console.log('err', err)
                res.send(threads)
            })
        )
    })

staltz commented 2 years ago

This conversation can keep on going, but there's nothing actionable for a maintainer to do, so I'll close.