Open fdocr opened 1 year ago
I find the debug logger gives a considerable performance hit. First, try turning it down to warning or error level and re-run.
The biggest performance hits in cases like this usually come from generating and writing log output to a console. The output is handy, but it comes at a performance cost similar to what you're seeing.
I just ran a test with Jennifer and Kemal where I created 2,000,000 DB entries, exported them to a file, and saved it to disk; that took under 36 seconds on my M1 Pro.
Confirmed the log output wasn't causing the issue. Some more of my tracing also confirms that when separated (`Turn.create` vs `.build` and then `.save`), it's the `.save` and not the `.build` that's taking a long time.
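For anyone wanting to reproduce that split, here's a rough sketch of timing the two phases separately with Crystal's stdlib `Benchmark` module (the `body` attribute is a placeholder — the real `Turn` fields aren't shown in this thread):

```crystal
require "benchmark"

# `body` is a placeholder attribute; substitute the real Turn fields.
build_time = Benchmark.measure { Turn.build({body: "example"}) }

turn = Turn.build({body: "example"})
save_time = Benchmark.measure { turn.save }

puts "build: #{build_time.real}s, save: #{save_time.real}s"
```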
I realized this is a connection pool size issue. I found this comment on `Jennifer::Adapter::Base` that could be what I'm experiencing. The tests I'm running are quick requests coming in from a CLI tool, and there are commonly 2 in parallel at a time. Adding `conf.pool_size = (ENV["DB_POOL"] ||= "5").to_i` to the database configuration solved my problem.
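For reference, a minimal sketch of what that configuration block can look like (`Jennifer::Config.configure` and the `pool_size=` setter come from Jennifer itself; the adapter and connection details below are placeholders):

```crystal
require "jennifer"
require "jennifer/adapter/postgres" # swap for your adapter

Jennifer::Config.configure do |conf|
  conf.adapter = "postgres"
  conf.host = ENV.fetch("DB_HOST", "localhost")
  conf.db = "app_development" # placeholder database name
  # The pool defaulted to 1, which serializes concurrent requests;
  # sizing it from an ENV var keeps it tunable per environment.
  conf.pool_size = (ENV["DB_POOL"] ||= "5").to_i
end
```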
Since I also had a Granite implementation, here's a performance comparison between the two on a similar test.
Jennifer
Granite
Turns out Jennifer performs better than Granite 😄
I wonder if it's worth changing the default connection pool size from 1 to something like ActiveRecord's default of 5 (I believe that's right, but it might be different). Regardless, I guess it's important to always configure it via an ENV var for web server configurability in prod environments anyway.
I'll go ahead and make some updates to the docs to clarify the connection pool. I've also run into some challenges with the config initializer and ensuring it uses the test DB when running `crystal spec`, so I think I'll add in the niceties I've come up with for that.
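As a sketch of that kind of initializer (assuming a per-environment `config/database.yml`; `Jennifer::Config.read` is Jennifer's documented way to load settings for a given environment, though the `APP_ENV` variable name here is an assumption):

```crystal
# Select the environment at boot; specs can export APP_ENV=test
# (or set it in spec_helper.cr) so `crystal spec` hits the test DB.
APP_ENV = ENV.fetch("APP_ENV", "development")

Jennifer::Config.read("config/database.yml", APP_ENV)
```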
@fdocr I made some changes to the docs to help others avoid the performance bottleneck you experienced. Check out #430
That's awesome. Thank you @crimson-knight
Ok, it sounds like technically you don't have the performance issue anymore, but the docs should be updated to make it clear you need to configure the connection pool size (@crimson-knight, thank you for this). Initially the connection pool size was changed to 1 to make it easier to debug issues people reported and to make it more reliable (previously I had some issues with concurrent requests). Maybe the time has come and it's worth increasing it to something bigger (like 5). What do you think @crimson-knight @cyangle @fdocr?
Let's change the default pool size to 5, that makes sense to me.
Let's also add a note that if they need to debug potential concurrent request problems, they can adjust the pool size.
I've been running some tests where I insert new records of a `Turn` model on an endpoint, and I'm seeing degraded performance. I'm opening this issue in the hope that there's a way to work around this, or perhaps an opportunity to identify and fix the problem in the shard. I implemented the same DB logic with amberframework/granite to compare, and the performance is much better there. I prefer jennifer overall for a couple of reasons, so it would be great to find a solution for this issue.

I added OpenTelemetry tracing for the slow endpoint and it looks like this:
The records are being persisted correctly and I'm seeing this DEBUG log from the database:
This tells me the DB round trip is quick (within `< 1 ms`), but response times are coming back at around `1,000 ms`. With tracing I can see these results:

A simple endpoint with a straightforward query like `Turn.all.to_a` is much quicker (`~6 ms` responses with `~2.5 ms` DB log times). Based on all of this, my assumption is that there is some overhead in how the model is created. I'd love to help identify this further, but I'd need help understanding how I can go about it (open to pairing or running more/different tests if needed).
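For context, the shape of the endpoints being measured is roughly this (a Kemal sketch; the routes and the `Turn` attribute are placeholders, not the actual app code):

```crystal
require "kemal"

# Slow path under investigation: create a record per request.
post "/turns" do |env|
  env.response.content_type = "application/json"
  Turn.create({body: "example"}).to_json # placeholder attribute
end

# Fast comparison path: a straightforward read.
get "/turns" do |env|
  env.response.content_type = "application/json"
  Turn.all.to_a.to_json
end

Kemal.run
```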