It can't be the transaction support, as that is a new function that you don't use yet. So it must be the bendy-butt PR. Was this related to that, or was it just normal usage of Manyverse?
I think it was unrelated to that issue, and it was just normal usage of Manyverse. I just opened the app and, without doing anything, in about 15 seconds it would crash.
Could it be something about scanning the log and rebuilding the author index? (Note the prefixOffset hack we did)
Could be, yes.
I went back to study this issue, and the symptoms are still the same with ssb-db2 2.8.3. The crash clearly mentions something about "out of memory" on both Android and desktop (Electron, Linux).
Here's a report from Electron DevTools:
- mergeFilters is at the very top
- matchAgainstPrefix is also expensive
- pull.collect, which is expected to be heavy anyway since it loads the whole social graph into memory
- loadPrefixMapFile, which is expected to be heavy too, since it does a JSON.parse on a large amount of data
- exports.reverse, which is in ssb-friends or layered-graph

So the big issue is unequivocally JITDB's mergeFilters.
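As a brief aside on why pull.collect is expected to be heavy: it drains the entire stream into one in-memory array before invoking its callback, so a source that emits the whole social graph gets fully materialized at once. A minimal pull-stream sketch (illustrative only, not the actual ssb-friends code):

const pull = require('pull-stream')

pull(
  pull.values([1, 2, 3]), // imagine millions of social-graph entries here instead
  pull.collect((err, allItems) => {
    if (err) throw err
    // everything the source emitted is now held in memory at the same time
    console.log(allItems.length)
  })
)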
On the other hand, the previous JITDB version in Manyverse, 3.5.0, also has mergeFilters and matchAgainstPrefix, but it doesn't have a memleak. Here is the git diff between jitdb 3.5.0 and 4.0.0: https://github.com/ssb-ngi-pointer/jitdb/compare/v.3.5.0..v4.0.0

The only thing that might have changed the logic is https://github.com/ssb-ngi-pointer/jitdb/compare/ssb-ngi-pointer:e004c1a...ssb-ngi-pointer:41a1b32#diff-e727e4bdf3657fd1d798edcd6b099d6e092f8573cba266154583a746bba0f346R850 which now uses bipf.compare.
That said, when I filed this issue (Oct 1), the latest version of jitdb was 3.4.0, which did not have bipf.compare in matchAgainstPrefix: https://github.com/ssb-ngi-pointer/jitdb/blob/e0744c187c193999e9c2f861ee8e6e9b547900a3/index.js#L856. And even back then there was a memleak somewhere.
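For context on what a prefix match involves, here is a conceptual sketch (my own illustration, not jitdb's actual implementation): a prefix index stores only the first N bytes of each indexed value, and matching means comparing the stored prefix against the prefix of the queried value, once per candidate record.

// Conceptual sketch only; jitdb's real matchAgainstPrefix is more involved.
const PREFIX_SIZE = 32

function matchesPrefix(storedPrefix, queryValue) {
  const queryPrefix = Buffer.from(queryValue).slice(0, PREFIX_SIZE)
  // true when the stored prefix starts with the query value's prefix bytes
  return storedPrefix.slice(0, queryPrefix.length).equals(queryPrefix)
}

Something of this shape runs once per candidate record, which is part of why it shows up so prominently in profiles over a log with more than a million messages.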
I ran a couple of other experiments:
- One involving matchAgainstPrefix (see red section here)
- Since matchAgainstPrefix kept showing up, I looked at which JITDB queries were running often, and from the debug logs I could see they were related to ssb-threads, so I turned off all Private tab queries and Mentions tab queries
Of course, with the public tab query turned off the app is useless, AND there are typically other computations that start when the public tab renders data (such as ssb-conn being kickstarted), but I currently suspect it has something to do with this ssb-db2 query: https://github.com/ssbc/ssb-threads/blob/22170c3fde66cbdd4f7615a3b94ab2871209b0c3/src/index.ts#L188-L193

(UPDATE: I turned on ssb-conn's "start" and it did not cause a memleak crash, so this further confirms that it may be about ssb-threads publicSummary)
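For reference, the kind of ssb-db2 query behind the paginate(...) lines in the log below looks roughly like the following. This is a sketch assuming ssb-db2's operators API and an already-initialized ssb instance with the ssb-db2 plugin loaded; it is a paraphrase, not the exact ssb-threads code:

// Assumption: hasRoot/hasFork and the query helpers come from ssb-db2's operators.
const { where, and, or, hasRoot, hasFork, type, paginate, toPullStream } =
  require('ssb-db2/operators')

const rootMsgId = '%...' // placeholder for a thread root message id

// Produces the shape seen in the jitdb debug output:
// paginate(AND(OR(value_content_root(X), value_content_fork(X)),
//              OR(value_content_type(post), value_content_type(contact))))
const source = ssb.db.query(
  where(
    and(
      or(hasRoot(rootMsgId), hasFork(rootMsgId)),
      or(type('post'), type('contact'))
    )
  ),
  paginate(15),
  toPullStream()
)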
It usually logs in the terminal like this:
jitdb:query all(AND(*,SEQS(1371356))): 6ms, total messages: 1 +15ms
jitdb:query paginate(AND(*,AND(OR(value_content_root(%Xrmwkht),value_content_fork(%Xrmwkht)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 1352ms, total messages: 0 +1s
jitdb:query all(AND(*,SEQS(1371219))): 2ms, total messages: 1 +26ms
jitdb:query paginate(AND(*,AND(OR(value_content_root(%vrwiHED),value_content_fork(%vrwiHED)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 1351ms, total messages: 1 +1s
jitdb:query all(AND(*,SEQS(1371273))): 1ms, total messages: 1 +301ms
jitdb:query paginate(AND(*,AND(OR(value_content_root(%RadDbNO),value_content_fork(%RadDbNO)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 921ms, total messages: 0 +947ms
jitdb:query all(AND(*,SEQS(1371301))): 2ms, total messages: 1 +51ms
jitdb:query paginate(AND(*,AND(OR(value_content_root(%nRyWqKu),value_content_fork(%nRyWqKu)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 951ms, total messages: 0 +1s
jitdb:query all(AND(*,SEQS(1371265))): 2ms, total messages: 1 +37ms
jitdb:query paginate(AND(*,AND(OR(value_content_root(%cgs6CeR),value_content_fork(%cgs6CeR)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 928ms, total messages: 0 +948ms
jitdb:query all(AND(*,SEQS(1371264))): 1ms, total messages: 1 +108ms
jitdb:query paginate(AND(*,AND(OR(value_content_root(%0Ulq+Lh),value_content_fork(%0Ulq+Lh)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 929ms, total messages: 0 +937ms
jitdb:query all(AND(*,SEQS(1371290))): 2ms, total messages: 1 +214ms
jitdb:query paginate(AND(*,AND(OR(value_content_root(%oZTNlUm),value_content_fork(%oZTNlUm)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 2231ms, total messages: 3 +2s
jitdb:query all(AND(*,SEQS(1371246))): 1ms, total messages: 1 +465ms
jitdb:query paginate(AND(*,AND(OR(value_content_root(%3QsVgTQ),value_content_fork(%3QsVgTQ)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 1116ms, total messages: 0 +1s
jitdb:query all(AND(*,SEQS(1371263))): 2ms, total messages: 1 +2s
jitdb:query paginate(AND(*,AND(OR(value_content_root(%Lc98Jti),value_content_fork(%Lc98Jti)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 878ms, total messages: 0 +925ms
jitdb:query all(AND(*,SEQS(1370941))): 3ms, total messages: 1 +42ms
jitdb:query paginate(AND(*,AND(OR(value_content_root(%lDlTx1g),value_content_fork(%lDlTx1g)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 889ms, total messages: 2 +893ms
jitdb:query all(AND(*,SEQS(1371057))): 3ms, total messages: 1 +3s
jitdb:query paginate(AND(*,AND(OR(value_content_root(%XxTlFZX),value_content_fork(%XxTlFZX)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 979ms, total messages: 0 +983ms
<--- Last few GCs --->
[467783:0xb880042c000] 63018 ms: Scavenge 1911.5 (1964.8) -> 1909.6 (1964.8) MB, 1.8 / 0.0 ms (average mu = 0.265, current mu = 0.182) task
[467783:0xb880042c000] 63038 ms: Scavenge 1911.5 (1964.8) -> 1909.6 (1964.8) MB, 1.4 / 0.0 ms (average mu = 0.265, current mu = 0.182) allocation failure
[467783:0xb880042c000] 63093 ms: Scavenge 1911.4 (1964.8) -> 1909.7 (1964.8) MB, 1.7 / 0.0 ms (average mu = 0.265, current mu = 0.182) task
<--- JS stacktrace --->
[467783:1222/162520.851176:ERROR:node_bindings.cc(143)]
Fatal error in V8: Ineffective mark-compacts near heap limit
Allocation failed - JavaScript heap out of memory
OMG I think I figured it out! GOSH programming is so painful sometimes.
I'm happy I made the thorough analysis above, because it helped narrow down that it may be in publicSummary and that it has something to do with the new code shown in the diff between the ssb-db2 versions.
So I took a careful look at the changes to the operators, like type:
//                   | THE OPTS ARG IS NEW STUFF
//                   |
//                   v
function type(value, opts = { dedicated: true }) {
  if (opts && opts.dedicated) {
    return equal(seekType, value, {
      indexType: 'value_content_type',
    })
  } else {
    return equal(seekType, value, {
      prefix: 32,
      indexType: 'value_content_type',
    })
  }
}
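In other words (my own illustration of the snippet above, not code taken from ssb-db2's docs), the second argument decides which kind of index the operator uses:

// assuming the type() function defined above is in scope
type('post')                       // default { dedicated: true }: dedicated value_content_type index
type('post', { dedicated: false }) // else branch: a 32-byte prefix index instead

Keep that if (opts && opts.dedicated) check in mind for what follows.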
And then makeFilterOperator in ssb-threads:
function makeFilterOperator(opts: FilterOpts): any {
  if (opts.allowlist) {
    //                               | THIS NASTY .map
    //                               |
    //                               v
    const allowedTypes = opts.allowlist.map(type);
    return or(...allowedTypes);
  }
  if (opts.blocklist) {
    const blockedTypes = opts.blocklist.map((x) => not(type(x)));
    return and(...blockedTypes);
  }
  return null;
}
The .map calls the type() operator as type(str, i, arr), because Array.prototype.map passes the element, its index, and the array to the callback. So the index i ends up being passed as the operator's opts, which creates a prefix operator, which is not what we want.
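Concretely (my own illustration, using the type() operator shown above):

// Array.prototype.map invokes its callback with (element, index, array), so:
const buggy = ['post', 'contact'].map(type);
//   -> type('post', 0, array)     // opts = 0 is falsy: the prefix branch runs
//   -> type('contact', 1, array)  // opts = 1: (1).dedicated is undefined, prefix again

// Wrapping the call discards the extra arguments, so the default
// { dedicated: true } applies and the dedicated index is used:
const fixed = ['post', 'contact'].map((x) => type(x));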
It turned out I just had to change
- const allowedTypes = opts.allowlist.map(type);
+ const allowedTypes = opts.allowlist.map(x => type(x));
And no more memory leak crash.
SO PAINFUL.
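As a side note on the API design (my own sketch, not what ssb-db2 actually does): the operator could be hardened against this class of mistake by treating anything that isn't a plain options object as "no options", so a stray .map index would still hit the dedicated branch:

// seekType and equal as in the ssb-db2 snippet quoted above
function type(value, opts) {
  const useDedicated =
    !opts || typeof opts !== 'object' || opts.dedicated !== false
  return equal(seekType, value, useDedicated
    ? { indexType: 'value_content_type' }
    : { prefix: 32, indexType: 'value_content_type' })
}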
Wow, nice debugging! :) That should make things smoother overall too, I would imagine.
Closing this issue because it's an ssb-threads issue and I solved it there already. Thanks for reading/following, I know it probably took some non-zero minutes of your attention.
I tried to use ssb-db2 2.6.0 in Manyverse and experienced a consistent memory leak that ends up using all the memory and crashing the app.
I then pinned the version to 2.3.1 and no memory leak happened.
I'm not sure where exactly the culprit is; it could be related to the bendy-butt additions (in version 2.4.0 and above) and/or transactions, or maybe it's caused by one of the dependencies (async-append-only-log or jitdb or something else).

Update: actually, when setting ssb-db2 to 2.3.1, it also updated async-append-only-log to 3.1.0 (that's the latest) and jitdb to 3.4.0 (latest), and the mem leak didn't happen, so it has to be something about ssb-db2 specifically.