dyoung-work opened 2 years ago
Are you able to export the schemas in each to see if there are any obvious differences?
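One way to do that is a schema-only `pg_dump` of each database followed by a diff. Here's a minimal sketch (Python standard library, assumes `pg_dump` is on PATH; the connection URLs at the bottom are placeholders, not the real ones):

```python
import difflib
import subprocess

def dump_schema(db_url: str) -> str:
    """Export schema-only DDL with pg_dump (no data, no ownership info)."""
    result = subprocess.run(
        ["pg_dump", "--schema-only", "--no-owner", db_url],
        check=True, capture_output=True, text=True,
    )
    return result.stdout

def diff_texts(a: str, b: str) -> str:
    """Unified diff of two schema dumps."""
    return "".join(difflib.unified_diff(
        a.splitlines(keepends=True), b.splitlines(keepends=True),
        fromfile="migrated", tofile="fresh",
    ))

# Hypothetical connection URLs; substitute your own:
# print(diff_texts(dump_schema("postgresql://host/migrated_db"),
#                  dump_schema("postgresql://host/fresh_db")))
```

An empty diff would rule out structural drift (missing indexes, changed column types) between the migrated and fresh schemas.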
EDIT: Removed exported schema files, as this turns out to have been a coincidence and I don't want people to waste time digging through those. Also updated the title.
It's starting to look like the actual culprit (as confirmed by wiping my test DB and starting fresh on 6.0.1) is the number of resources in the DB. I started to see the sawtooth CPU consumption again, and its intensity has been increasing from run to run over the last several days. The row count in hfj_resources climbing into the millions correlates very strongly with this behaviour.
Note, however, that this resource count is across multiple tenants, and each test is only interacting with a single new tenant. Is it possible that there's a missing index or missing filter when resources are added?
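To put a number on "correlates very strongly," one option is to log the hfj_resources row count and the run duration after each run, then compute a Pearson correlation. A self-contained sketch; the measurement values below are made up purely for illustration:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical measurements taken after each ingest run:
resource_counts = [1.1e6, 1.6e6, 2.2e6, 2.9e6, 3.5e6]  # rows in hfj_resources
run_minutes = [44, 52, 63, 78, 88]                      # ingest duration

print(f"r = {pearson(resource_counts, run_minutes):.3f}")
```

An r close to 1.0 across enough runs would support the total-resource-count hypothesis over, say, per-tenant effects.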
Describe the bug
Hi maintainers! I'm having some throughput issues with HAPI after updating from 5.7.0 to 6.0.1, and I was hoping someone might be able to shed some light on what's going on.
We're trying to ingest 100,000 bundles, each containing only 4-5 resources. With the configuration changes we've made to improve ingest throughput, the run was taking ~40 mins. After updating to 6.0.1, however, it's taking closer to 70-90 mins. The interesting thing is that this performance hit only applies to a Postgres schema that was also used under 5.7.0 (and therefore had the migrations applied to it). I did a test with a fresh schema on 6.0.1 and the throughput was back to our norm. The two schemas are running on the same Postgres host, and the FHIR server's configuration is unchanged except for the name of the schema it points to. I should also note that these tests were against a single FHIR server (no clustering).
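For concreteness, those figures work out to roughly the following ingest rates (a quick back-of-the-envelope calculation; exact timings vary from run to run):

```python
BUNDLES = 100_000

def bundles_per_second(minutes: float) -> float:
    """Average ingest rate over the full run."""
    return BUNDLES / (minutes * 60)

print(f"5.7.0 baseline: ~{bundles_per_second(40):.0f} bundles/s")
print(f"6.0.1 migrated: ~{bundles_per_second(90):.0f}-{bundles_per_second(70):.0f} bundles/s")
```

In other words, throughput roughly halves on the migrated schema.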
I'm attaching a couple of screenshots of the CPU consumption graph from Grafana during the ingest. The relatively level CPU consumption is from the fresh schema, and the sawtooth pattern is for the migrated schema. Blue = load generator, green = FHIR server. I should also note that there aren't any obvious errors in the FHIR logs during the ingest with the migrated schema, and we get the correct number of results on the other end.
Fresh schema:
Migrated schema:
In case it's relevant, here are the changes we've made to hit our current throughput numbers (but feel free to share more; we'd love to learn a new tweak):
Is there a chance that there's some odd interaction happening with migrated schemas? I redid the migrations manually as a sanity check, but the behaviour described above didn't change. I'm hoping there's a way to salvage both the existing data and the better throughput.
To Reproduce
Expected behavior
Using a migrated schema instead of a fresh one shouldn't impact ingest performance.
Environment (please complete the following information):
Additional Info
I've done multiple runs, so I know it's not just one bad run. The DB host and the hosts everything else runs on are dedicated to my testing (and can easily handle the current workload).