Closed matiasgarciaisaia closed 3 months ago
We suspect the CSV encoder may be suboptimal.
We've also tested with 100k and 50k limits for the interactions files, but the performance was slightly worse (startup took a bit longer, and the overall speed didn't improve further).
CC: @ggiraldez in case you want to add anything else.
I know very little about Elixir or Ecto, but you may also want to explore streaming directly from the database. The MySQL driver supports it, although it requires a transaction, which may be a deal breaker. Anyway, see https://hexdocs.pm/ecto/Ecto.Repo.html#c:stream/2
In line with @ggiraldez's suggestion, I realized that the incentives file is built using Repo.stream instead of Stream.resource.
I think it's worth the effort to try changing the other three files to be built this way and see how the servers behave.
I tried changing the queries to use Ecto's Repo.stream (instead of manually doing Stream.resource and paginating from the app), but preloads are not supported on streams.
There may still be room for doing the CSV streaming straight from the database (i.e., make MySQL output CSV) as @ggiraldez suggested to me, but I'm not sure whether that'll work. I'll leave this as is; we might explore that optimization if we eventually need it. Given we're about to make the file generation async (in #2350), improving the times won't be that important, either.
Respondent files are usually large (Interactions files can grow up to 1M rows), and the "low" limit in queries made the DB work much more than needed (we've observed 99% CPU usage in the mysqld process when generating a 1M-rows interactions file with 1000 rows per query).
Increasing this limit makes the app issue fewer queries to the DB, effectively driving the CPU usage down to about 30% instead.
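To illustrate how the page size drives the query count, here is a minimal sketch of app-side pagination with Stream.resource. It is not the project's actual code: the `PagedExport` module, the `fetch_page` callback, and the in-memory rows are all hypothetical stand-ins for the real Ecto queries, and the pagination keys on the last seen `id`, which is an assumption about how pages are ordered.

```elixir
defmodule PagedExport do
  # Hypothetical helper: lazily emits rows by calling
  # fetch_page.(last_id, limit) until a page comes back empty.
  def stream(fetch_page, limit) do
    Stream.resource(
      # Start before the first id.
      fn -> 0 end,
      fn last_id ->
        case fetch_page.(last_id, limit) do
          [] -> {:halt, last_id}
          rows -> {rows, List.last(rows).id}
        end
      end,
      # Nothing to clean up in this sketch.
      fn _last_id -> :ok end
    )
  end

  # Runs a full export over `total` in-memory rows and returns
  # {rows_exported, queries_issued}. The in-memory list stands in
  # for the database; each call to `fetch` counts as one query.
  def count_queries(total, limit) do
    counter = :counters.new(1, [])
    rows = Enum.map(1..total, fn id -> %{id: id} end)

    fetch = fn last_id, page_limit ->
      :counters.add(counter, 1, 1)

      rows
      |> Enum.drop_while(fn row -> row.id <= last_id end)
      |> Enum.take(page_limit)
    end

    exported = stream(fetch, limit) |> Enum.count()
    {exported, :counters.get(counter, 1)}
  end
end

IO.inspect(PagedExport.count_queries(10_000, 1_000))
IO.inspect(PagedExport.count_queries(10_000, 5_000))
```

In this sketch, 10,000 rows take 11 fetches with a 1,000-row limit and 3 fetches with a 5,000-row limit (the last fetch in each case is the empty page that ends the stream). By the same arithmetic, a 1M-row file at 1000 rows per query needs on the order of 1,000 queries.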
There's probably more room for improvement (the generation of the file is still CPU-bound instead of network-bound), but that's on the app itself - we should profile the app's code to further improve the performance.
See #2350 See #2359