staltz closed this issue 3 years ago
This is just the migration, right? Does this cost impact initial sync? If not, I would not worry too much.
Main question would be: how long does this take if Manyverse gets put to sleep? :(
I would be fine with it taking an hour IF I was told that Manyverse will be doing that, and to plug my phone in and set it going when I'm not needing Manyverse. Bonus points for a progress bar...
Yeah, it's just the migration and it will only happen once. For new users who don't have an old database, it won't happen at all.
I'm trying to understand why it is so slow first. Is this with any db2 indexes running while it is migrating or only migrating the log?
I tried both (and it wasn't faster), but yes, there are level and jit indexes building at the same time, on both desktop and mobile.
I'll have a look at migrate and see if I can spot something
On my laptop with power connected, running migrate with default options:

- Progress overhead: 2-5s (edit: ran it a few times; the overhead is not so bad)
- Base + key indexes disabled: 33s, compared to around 58s before
- Running them afterwards: 23s
I tried maxCpu, and afterwards maxPause as well, like this:
```js
const SecretStack = require('secret-stack')
const caps = require('ssb-caps')
const path = require('path')

const sbot = SecretStack({
  caps,
})
  .use(require('./'))
  .use(require('./migrate'))
  .call(null, {
    path: '/home/arj/.ssb',
    keys: require('ssb-keys').loadOrCreateSync(
      path.join('/home/arj/.ssb', 'secret')
    ),
    db2: {
      automigrate: true,
      maxCpu: 98,
      maxPause: 180,
    },
  })
```
Didn't seem to change overall performance. Still around 60s with indexes.
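As an aside, the rough idea behind a maxCpu/maxPause throttle can be sketched as a pure decision function. This is hypothetical (the name `pauseFor` and the proportional formula are illustrative, not the actual ssb-db2 drain-gently code); in reality the CPU estimate would come from sampling event-loop load:

```javascript
// Hypothetical sketch of a maxCpu-style throttle decision. Not the real
// ssb-db2 code: cpuPercent would come from sampling event-loop load.
// Returns how many milliseconds to pause before draining more items.
function pauseFor(cpuPercent, maxCpu, maxPause) {
  if (cpuPercent <= maxCpu) return 0 // cool enough, keep draining
  // Pause proportionally to how far over the limit we are, capped at maxPause.
  const over = (cpuPercent - maxCpu) / (100 - maxCpu)
  return Math.min(maxPause, Math.ceil(over * maxPause))
}

console.log(pauseFor(50, 98, 180)) // → 0 (under the limit: no pause)
console.log(pauseFor(99, 98, 180)) // → 90 (slightly over: short pause)
console.log(pauseFor(100, 98, 180)) // → 180 (saturated: full maxPause)
```

With maxCpu set as high as 98, the drain would only ever pause when the loop is almost fully saturated, which may explain why it barely changes wall-clock time on a fast laptop.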
Yeah, it might be you can only reproduce this on phones.
Strangely, migrate got even slower with the latest changes (version 1.17.1).
- Without maxCpu: 3h
- With maxCpu=98: 21h
So on mobile it is 180 minutes, compared to 1 minute on my laptop. I could understand maybe 5x or 10x, but 180x?
Yeah, I don't understand it
I could try other phones I have. It could be there's something weird about the Fairphone 3's disk.
Digging deeper, I put some performance measurements in the hot path of the migrate plugin:
```js
pull(
  oldLog.getStream({ gt: migratedOffset }),
  // pull.through to measure "get from old log" duration
  pull.map(updateMigratedSizeAndPluck),
  // pull.through to measure "update migr size" duration
  pull.map(toBIPF),
  // pull.through to measure "convert to bipf" duration
  pull.asyncMap(writeToNewLog),
  // pull.through to measure "write to new log" duration
  (drainAborter = config.db2.maxCpu
    ? drainGently(tooHotOpts(config), op, opDone)
    : pull.drain(op, opDone))
)
```
And these are the logs that I got. Basically, writing to disk is the bottleneck. It would be better to have nanosecond precision on those measurements, to validate the 180x assumption. PS: in both cases there was no maxCpu set.
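For nanosecond precision, Node's `process.hrtime.bigint()` could be used inside those probes. A minimal sketch of the idea, using a plain function wrapper instead of actual `pull.through` operators (`timedStage` and the stand-in stage are illustrative, not the real migrate code):

```javascript
// Sketch: nanosecond-precision per-stage timing with process.hrtime.bigint().
// Wraps a synchronous stage function and accumulates its total time under a label.
const totals = new Map()

function timedStage(label, fn) {
  return (x) => {
    const t0 = process.hrtime.bigint()
    const result = fn(x)
    const dt = process.hrtime.bigint() - t0 // duration in nanoseconds, as BigInt
    totals.set(label, (totals.get(label) || 0n) + dt)
    return result
  }
}

// Example: time a stand-in "convert to bipf" stage over a few items.
const toBIPF = timedStage('convert to bipf', (x) => x * 2)
const out = [1, 2, 3].map(toBIPF)
console.log(out) // → [2, 4, 6]
console.log('convert to bipf took', totals.get('convert to bipf'), 'ns')
```

The same wrapper shape fits naturally inside `pull.map`, and BigInt arithmetic avoids losing precision when summing many sub-millisecond samples.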
Can you test these 3 lines individually to see which one is slow?
```js
newLog.append(data, () => {})
emitProgressEvent()
if (dataTransferred % BLOCK_SIZE === 0) newLog.onDrain(cb)
```
Wow :sweat_smile: Something wrong with emitProgressEvent
Nice find! Glad it was that one :smile:
By the way, it's building all indexes now in 5min, which is a massive improvement over the 25min I got earlier. And this is concurrent with migrate running too (in "inefficient mode", even).
I'll report later what it looks like with migrate not running.
Wup wup, really glad we found the bugger. It couldn't hide forever :-)
I still need some way of reporting migration progress, but I'll take a performance-oriented approach: run these benchmarks on the phone, make sure performance is good, and only then submit a PR that passes tests.
Yeah, you could try just emitting progress every 1000 instead
Technically, that's already the case, but the every-1000 check is deep inside emitProgressEvent(). Instead, I could move the check outside of it.
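The hoisted check could look roughly like this (a sketch with illustrative names, not the actual migrate code). The hot path then pays only a counter increment and a modulo, and the expensive emission runs once every N records:

```javascript
// Sketch: cheap progress throttling in the hot path. The expensive event
// emission only runs once every EMIT_EVERY records; everything else pays
// a counter increment and a modulo check.
const EMIT_EVERY = 1000
let recordCount = 0
let emitted = 0

function emitProgressEvent(count) {
  emitted++ // stand-in for the real (comparatively expensive) event emission
}

function onRecordMigrated() {
  recordCount++
  if (recordCount % EMIT_EVERY === 0) emitProgressEvent(recordCount)
}

for (let i = 0; i < 2500; i++) onRecordMigrated()
console.log(emitted) // → 2 (fired at records 1000 and 2000)
```

The key point is that even a cheap-looking function call in a per-record hot path is paid millions of times during a migration, so moving the guard outside keeps the common case as close to free as possible.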
Initial indexing (everything) took 6m35s without migrate running. I think that's a pretty awesome achievement compared to the many hours Manyverse with flumedb needed previously.
Awesome, a full rebuild of the db2 folder (migrate and indexes) took 9m34s, with no maxCpu. And strangely, UI responsiveness was good, the first time I've had that with maxCpu turned off. If that's consistently true, I could remove maxCpu support from the codebase, but not yet; let it stay there just in case.

I'm very happy that we have this result and not the 3h or 22h, but I'm spooked that such a simple function, emitProgressEvent(), was the culprit. Is it the event emitter logic that was the problem?
OMG this is closed.
Same codebase running on desktop and on mobile (except for the third column in the table below).
Animated GIF of desktop logs:

![desktop](https://user-images.githubusercontent.com/90512/108351801-aa678c00-71ee-11eb-9c0c-e3bec97b1501.gif)

Animated GIF of mobile logs:

![mobile](https://user-images.githubusercontent.com/90512/108351815-ad627c80-71ee-11eb-8177-7b1479d72360.gif)

I'm running out of options on how to approach this. I think a reasonable choice is to do "mobile with maxCpu=98", make a release of Manyverse with db1 + db2, and give users a week to let the migration complete. Then a subsequent version uses only db2 and deletes the old log when the migration is done.
Thoughts @arj03 ?