Closed — AyaAkl25 closed this issue 2 years ago
Hi, we also bumped into that issue; we are creating around 2,600 builds per day. Until now, we have used these approaches:
If you query the main build_steps table with something like select * from build_steps where name = 'ProjectX', that will be really slow. But if you query only the builds of a given day, it is really fast. For instance: select * from build_steps_20211203 where name = 'ProjectX'
You can see an example of that approach in this query. With this approach you can only query one day at a time, so if you need the builds for a week or a month, you need to run one query per day (7 queries for a week).
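A sketch of what querying a date range against the per-day tables could look like. This uses an in-memory SQLite database as a self-contained stand-in for Postgres, and the columns here are illustrative, not XCMetrics' actual schema:

```python
import sqlite3
from datetime import date, timedelta

# Stand-in for the Postgres database: one build_steps table per day,
# e.g. build_steps_20211203, as described in the thread.
conn = sqlite3.connect(":memory:")
for day in ["20211201", "20211202", "20211203"]:
    conn.execute(f"CREATE TABLE build_steps_{day} (name TEXT, duration REAL)")
conn.execute("INSERT INTO build_steps_20211203 VALUES ('ProjectX', 12.5)")

def steps_for_range(conn, start, end, name):
    """Run one query per daily table and collect the results."""
    rows = []
    day = start
    while day <= end:
        table = f"build_steps_{day.strftime('%Y%m%d')}"
        try:
            rows += conn.execute(
                f"SELECT * FROM {table} WHERE name = ?", (name,)
            ).fetchall()
        except sqlite3.OperationalError:
            pass  # no table exists for that day
        day += timedelta(days=1)
    return rows

rows = steps_for_range(conn, date(2021, 12, 1), date(2021, 12, 3), "ProjectX")
```

The loop is just the "7 queries for a week" idea made explicit: each iteration hits one small daily table instead of the big main one.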
You can create indexes on a field of a table to speed up queries where that field is used. For example, we created an index on the target_identifier field of the build_steps table to speed up queries that filter on it.
Creating an index on a large table takes several minutes and may block insertions while the index is being built, so run a statement like that when no new builds are coming into the database.
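As an illustration, here is what such an index could look like (the index name is made up, and SQLite again stands in for Postgres; on Postgres you could also use CREATE INDEX CONCURRENTLY to reduce blocking of insertions, at the cost of a slower build):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE build_steps (target_identifier TEXT, duration REAL)")
conn.executemany(
    "INSERT INTO build_steps VALUES (?, ?)",
    [(f"target_{i % 100}", float(i)) for i in range(10_000)],
)

# Plain CREATE INDEX, as described in the thread.
conn.execute(
    "CREATE INDEX idx_build_steps_target_identifier "
    "ON build_steps (target_identifier)"
)

# The query planner now resolves lookups on target_identifier via the
# index instead of scanning the whole table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM build_steps WHERE target_identifier = 'target_7'"
).fetchall()
```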
Recently, we added an endpoint that returns the number of builds and the number of builds with errors for every day. Running that query is expensive, so we created a recurring job that runs daily: it runs a SELECT SUM query over the previous day's data and inserts the result into a new table.
That way we only run a semi-expensive query once a day, and the result is a table with all the days precomputed. A simple query against that table is really fast.
We wrote that job in Swift, but there is nothing special about it. At our company we have other recurring jobs written in Python (using our Luigi tool) that do the same thing: run some queries on the data of a single day and insert the result into another table.
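A minimal sketch of such a daily aggregation job, with hypothetical table and column names and SQLite standing in for Postgres:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
# Hypothetical raw builds table and the pre-aggregated daily stats table.
conn.execute("CREATE TABLE builds (day TEXT, had_error INTEGER)")
conn.execute(
    "CREATE TABLE daily_build_stats (day TEXT, total INTEGER, errored INTEGER)"
)
conn.executemany(
    "INSERT INTO builds VALUES (?, ?)",
    [("2021-12-02", 0), ("2021-12-02", 1), ("2021-12-02", 0)],
)

def aggregate_previous_day(conn, today):
    """The recurring job: roll up yesterday's builds into the stats table."""
    yesterday = (today - timedelta(days=1)).isoformat()
    conn.execute(
        "INSERT INTO daily_build_stats "
        "SELECT day, COUNT(*), SUM(had_error) FROM builds "
        "WHERE day = ? GROUP BY day",
        (yesterday,),
    )

aggregate_previous_day(conn, date(2021, 12, 3))
stats = conn.execute("SELECT * FROM daily_build_stats").fetchall()
```

The endpoint then reads only from daily_build_stats, which stays tiny (one row per day) no matter how large the raw tables grow.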
Thank you so much for your answer.
Will close this issue since there's already a great set of advice in this thread on how to handle big data sets.
Hey guys, I have been working with XCMetrics for quite a long time now and it surfaces lots of great data that we need. The problem is that we are mapping the data into a Grafana board to monitor our build times, and the more data there is in the Postgres database, the slower those queries get.
The build_steps and build_targets tables in particular are super slow, even just opening the table and sorting by day ascending.
I am not sure if there is a way around that other than, for example, deleting old data?
Thank you