Closed OmarIthawi closed 1 month ago
Closing is it'll probably not address the speed of delivery much. Thanks @shadinaif for the feedback.
Thank you Omar, I understand your passion to do that for delivery sake. It's doable as you described in the Technical part. I can add a positive point:
- Optimising queries for better performance when ORM is not helping
But now I'll give my view for the downsides. I'm not trying to push back; I just want to give the idea a fair analysis, so we all don't give it more value than it worth:
- Experimenting the data from our side is way better using beaver
- Experimenting the data from the customer side needs UI: not a cheap option
- Queries are very complex, and not easy to maintain
- Discrepancies between results of queries will start to be notable. In ORM, we have functions that return reusable
QuerySet
s to make all results consistent with each other. And still we faced bugs with that!- No possible way to automate tests for that
- Custom reports per tenant is not a feasible option without a Query/Report Builder UI for the customers. This is a project by it's own
On the other hand, I don't understand the benefit of connecting ClickHouse to MySQL, it'll act like a proxy, what good can we get from that? I encourage building a data collector that copies some data from MySQL to ClickHouse in a regular bases (regular depending on the data nature)I beleive that we must consider the types of reports the dashboard can provide, rather than the types of data in the reports\ We have two type of reports:
Real-time Reports:
5 to 10 minutes delay is still considered real-time for managers
Handy for monitoring progress of tasks, goals, and achievements
We already built most of those
Batch Processing Reports:
Daily, Weekly, Monthly, etc.. let's assume we have them all daily generated/updated
A lot more valuable than real-time reports
BI is concerned about this type of reports, collecting aggregated data outside the source database and make it ready for reporting. Performance is extremely higher because the data is literally ready to be rendered in reports
Data redundancy is not an issue. The important thing is to have the data correctly collected and ready on time
So, my opinion is that:
We can do the free-query setup as a temporary survival strike, not as a helper tool
We should be thinking about BI instead. MySQL is slow and expensive. We should get the valuable data we need outside MySQL
We must totally forget about custom queries per tenant, and heavily push back on that if requested. It should be a separate project or sub-project: Custom Reports Builder:
The builder provides virtual table names and fields (the table structure of edx-platform is hideous for a non-edx professional
The builder provides way to display the data: tabular, charts, etc..
Many other features
--- this can be a product for edx-platform ---
An alternative option of Custom Reports Builder is a Data Collector for ClickHouse per Tenant. And work on providing tenant admins an access on specific tables to build their own queries
🎉 🎉 🎉 🎉 🎉 🎉
Bless you @shadinaif :) This is huge!!
Description
Provide support for raw MySQL queries in the
/api/fx/query
endpoint.Goals
Technical Notes
I recommmend againsts creating another API since the current API design is fairly backend non-specific, which is great.
We probably want to rename the
ClickhouseQuery
-->CustomQuery
.Adding another field to the
ClickhouseQuery
model as the following:Refactoring out the backend-specific code into MySQLQueryBackend and ClickhouseQueryBackend is probably needed.