nelc / futurex-openedx-extensions

Open edX APIs and reporting tools
Other
0 stars 0 forks source link

feat: Custom MySQL queries similar to Clickhosue queries #53

Closed OmarIthawi closed 1 month ago

OmarIthawi commented 1 month ago

Description

Provide support for raw MySQL queries in the /api/fx/query endpoint.

Goals

Technical Notes

I recommmend againsts creating another API since the current API design is fairly backend non-specific, which is great.

We probably want to rename the ClickhouseQuery --> CustomQuery.

Adding another field to the ClickhouseQuery model as the following:

class CustomQuery:
   backend = StringField(choices=['clickhouse', 'mysql']

Refactoring out the backend-specific code into MySQLQueryBackend and ClickhouseQueryBackend is probably needed.

OmarIthawi commented 1 month ago

Closing is it'll probably not address the speed of delivery much. Thanks @shadinaif for the feedback.

Thank you Omar, I understand your passion to do that for delivery sake. It's doable as you described in the Technical part. I can add a positive point:

  • Optimising queries for better performance when ORM is not helping

But now I'll give my view for the downsides. I'm not trying to push back; I just want to give the idea a fair analysis, so we all don't give it more value than it worth:

  • Experimenting the data from our side is way better using beaver
  • Experimenting the data from the customer side needs UI: not a cheap option
  • Queries are very complex, and not easy to maintain
  • Discrepancies between results of queries will start to be notable. In ORM, we have functions that return reusable QuerySet s to make all results consistent with each other. And still we faced bugs with that!
  • No possible way to automate tests for that
  • Custom reports per tenant is not a feasible option without a Query/Report Builder UI for the customers. This is a project by it's own

On the other hand, I don't understand the benefit of connecting ClickHouse to MySQL, it'll act like a proxy, what good can we get from that? I encourage building a data collector that copies some data from MySQL to ClickHouse in a regular bases (regular depending on the data nature)I beleive that we must consider the types of reports the dashboard can provide, rather than the types of data in the reports\ We have two type of reports:

  • Real-time Reports:

  • 5 to 10 minutes delay is still considered real-time for managers

  • Handy for monitoring progress of tasks, goals, and achievements

  • We already built most of those

  • Batch Processing Reports:

  • Daily, Weekly, Monthly, etc.. let's assume we have them all daily generated/updated

  • A lot more valuable than real-time reports

  • BI is concerned about this type of reports, collecting aggregated data outside the source database and make it ready for reporting. Performance is extremely higher because the data is literally ready to be rendered in reports

  • Data redundancy is not an issue. The important thing is to have the data correctly collected and ready on time

So, my opinion is that:

  • We can do the free-query setup as a temporary survival strike, not as a helper tool

  • We should be thinking about BI instead. MySQL is slow and expensive. We should get the valuable data we need outside MySQL

  • We must totally forget about custom queries per tenant, and heavily push back on that if requested. It should be a separate project or sub-project: Custom Reports Builder:

  • The builder provides virtual table names and fields (the table structure of edx-platform is hideous for a non-edx professional

  • The builder provides way to display the data: tabular, charts, etc..

  • Many other features

  • --- this can be a product for edx-platform ---

  • An alternative option of Custom Reports Builder is a Data Collector for ClickHouse per Tenant. And work on providing tenant admins an access on specific tables to build their own queries

OmarIthawi commented 1 month ago

🎉 🎉 🎉 🎉 🎉 🎉

Bless you @shadinaif :) This is huge!!