Allow report results to be retrieved with webservice calls

abias commented 3 years ago

Hi Tim,

at Ulm university, we are looking for a solution to fetch arbitrary data from the Moodle database which can't be retrieved easily (or performant) with existing Moodle core webservices.

Our particular current need is to retrieve course completion lists of whole, big courses without having to iterate over all students and calling core_completion_get_course_completion_status for each student. However, this issue is about general data retrieval needs as described above.

We are already using report_customsql and hold reports for the required data there. We are just missing the part that we can retrieve these report results with webservice calls instead of having to download a CSV from report_customsql and instead of having to process scheduled email reports from report_customsql.

To realize our goal, we were in touch with Moodle partner Catalyst IT and have learnt that a developer of theirs has built a plugin on https://github.com/safatshahin/moodle-local_configurable_api which extends block_configurable_reports to support webservices and which basically looks like what we would want to see in report_customsql as well.

That's why we are thinking about contracting Catalyst IT to write a similar feature for report_customsql and to contribute a pull request for report_customsql so that the functionality ends up upstream in your plugin.

For the report_customsql side, the rough specification would be:

Add a new webservice report_customsql_get_query_result
This webservice will have a parameter which expects the query ID as integer value
This webservice will also have a parameter which expects an array with key->value pairs for each parameter which is defined in the SQL query
When being called, this webservice first checks if the webservice user has the capability which is set in the query's 'Who can access this query' setting
When being called, this webservice processes the given query (with the given parameters), composes the resulting data table and returns it in the format which is defined by the chosen webservice protocol (REST, XML-RPC,...)

@timhunt , I would be grateful if you could have a look at this rough specification, highlight anything which would be important for you as maintainer and then give us your preliminary feedback if you would be willing to integrate this feature if we would contract Catalyst IT and provide a well-crafted pull request.

Thanks, Alex

timhunt commented 3 years ago

Last bullet point does not account for scheduled queries. Are you planning to only support on-demand queries initially?

How are you planning to return tabular data in the various formats?

abias commented 3 years ago

Thank you, Tim, for your quick reply.

Last bullet point does not account for scheduled queries. Are you planning to only support on-demand queries initially?

I am unsure if I get your question correctly, but aren't webservice calls always on-demand? So, the planned implementation should surely support on-demand and scheduled queries, but for scheduled queries, the caller won't be able to pass query parameters.

Please let me know if I misunderstood your question.

How are you planning to return tabular data in the various formats?

My personal preference would have been to return the tabular data row per row, similiar to how existing webservices like core_completion_get_activities_completion_status which return multiple datasets behave.

We just would have to discuss of the webservice should return the table headers (= the signature of the query) as well or not.

However, regarding these technical details, I will ask a Catalyst IT developer to add some more thoughts.

brendanheywood commented 3 years ago

Just something to throw into the brew, we recently did something very similar to this with another client but it was based on Totara report builder where the client wanted arbitrary control over the report combined with an api export. The big challenge we had is that we wanted to be able to stream results all the way from the database -> recordset -> api -> client and on demand. As far I can tell the moodle webservice api is just poorly suited to streaming data as you have to format the chunk of data in memory and then all the buffering and serialization and output is handled by core and it will blow out memory on anything of an interesting size.

This isn't a show stopper its just some scar tissue worth sharing and avoid a repeat of. I don't have a fantastic solution, but the cleanest solution I can think of is a custom endpoint which accepts that web service token and uses the DataFormat Api so it can stream any query in any format. eg

https://github.com/mdjnelson/moodle-mod_customcert/blob/MOODLE_311_STABLE/mobile/pluginfile.php#L44-L49

Or instead skip the WS tokens completely and instead use get_user_key() / require_user_key_login() directly the same way rss and calendars do.

timhunt commented 3 years ago

@abias - if yout think of the purpose of the API as 'fetching the results of a query', then it makes sense to be able to fetch the results of past runs of scheduled queries.

abias commented 3 years ago

@timhunt - I see. Thank you for this explanation. Indeed, we would mainly think of this contribution as a feature to fetch the results of a query in realtime and the results should be based on the current database values.

I understand that this plugin also supports scheduled queries. However, supporting scenarios like 'Dear webservice, please give me the results of the last run of the scheduled query x' would be out of scope for us.

brendanheywood commented 3 years ago

@timhunt can you comment on my proposal to use the DataFormat api directly to export any query in any format? Due to the deficiencies in the webservices api this can't be a webservice but in my mind is the correct way to do it with everything streamed and it removes a large amount of complexity from the implementation.

I'd second Alex's comment, the kinds of integration projects we've done which have been built on top of similar things (eg in totara report builder's api) these things need to be live data. I'm not opposed to the implementation also working for reports which happen to be scheduled and pre-saved, but it must work for the on demand ones as the primary use case.

timhunt commented 3 years ago

I think we should start by building a completely standard Moodle web service API.

Let's not invent unnecessary complications until we are sure we need them.

I never said "don't haven a web service for on-demand queries". I said "There is this whole feature of customsql which you have completely failed to consider in your proposal. Is it really too much to expect that you write a clear, complete proposal?

abias commented 3 years ago

Hi Tim,

I am sorry for my late reply.

I think we should start by building a completely standard Moodle web service API. Let's not invent unnecessary complications until we are sure we need them.

All right. This is fine for us. I think @brendanheywood made a valid point, but this shouldn't stop us from going the standard path first.

I never said "don't haven a web service for on-demand queries". I said "There is this whole feature of customsql which you have completely failed to consider in your proposal. Is it really too much to expect that you write a clear, complete proposal?

Well, I wanted to generally discuss the chance to see such a contribution integrated into the plugin before writing a detailled specification. As this is discussed now, let me give you our refined proposal:

A new webservice report_customsql_get_query_result is added to the plugin. This webservice will be implemented as standard Moodle webservice This webservice will be supporting all enabled webservice protocols
The purpose / description of the webservice will be: 'The webservice returns the results of pre-configured SQL queries'
When being called, the webservice will have a mandatory parameter queryid which expects the query ID as integer value the webservice will also have an optional parameter parameters which expects an array with key->value pairs for each parameter which is defined in the SQL query
If the query contains the %%USERID%% parameter (which would be filled with the logged-in Moodle user ID in the GUI), the webservice should accept an additional optional parameter 'userid' which expects a Moodle user ID und which will fill this parameter and should return an error message if the userid parameter is not given. contains a :parameter_name starting or ending with the characters date (which would trigger the output of a date selector in the GUI), the webservice should accept all absolute date formats which are handled by PHP itself for this parameter and should return an error message of no proper data format is given.
The report_customsql plugin supports ad-hoc and scheduled queries. The difference of both query types is that ad-hoc queries return live data directly from the database while scheduled queries are run at a pre-defined time and results will either be stored on disk or emailed out. In contrast to this, the webservice to be implemented will always return live data. When the webservice is being called with the queryid parameter set to a scheduled query, the query will be executed ad-hoc and live data will be returned as well. Though, if the scheduled query contains one of the %%STARTTIME%% or %%ENDTIME%% placeholders, returning live data is not possible. If the webservice is called to retriebe the result of such queries, an error message will be returned. Supporting scheduled queries in their original purpose and full scope, i.e. to return previously stored data instead of live data, is out of scope for this proposal and can be implemented by someone else at a later stage.
The report_customsql plugin controls the retrievability of a particular query result with the capabilities moodle/site:config, moodle/site:viewreports and report/customsql:view. The webservice implementation will not introduce any new capabilities. When being called, the webservice checks if the webservice user has the capability which is set in the query's 'Who can access this query' setting. If the webservice user has the capability, the result is returned. If the webservice user does not have the capability, an error message is returned.
When being called, this webservice processes the given query (with the given parameters) in the database, composes the resulting data table and returns it in the format which is defined by the chosen webservice protocol (REST, XML-RPC,...). Further details of the returned data table format will be given by a Catalyst IT developer soon.
In all cases where the requested query can't be processed correctly, a proper error message should be returned.
In all cases where the requested query will return an empty result, a proper but empty data table should be returned.

Tim, I hope this makes things clearer and would appreciate any feedback.

Cheers, Alex

brendanheywood commented 3 years ago

I think the format of the data export should be completely independent from the format of the webservice protocol, and it would likely be extremely difficult or ugly to try and do it any other way.

Ideally I'd go with an optional web service param dataformat and accepts the string name of any format, csv, excel, html etc which is enabled. If you don't select one then it defaults to the top on in the list.

The big hoop to jump through is we must encode the text / binary data. I'm struggling to find an example in core of any existing web-service that directly returns a file. There is no PARAM_BINARY. Maybe I'm missing something?

Option 1) We could use PARAM_BASE64 which would work but bloat the file size and process memory even more. Viable but I'm not a fan.

Option 2) The web service doesn't return a file, it just returns an absolute url to a second url containing a user token which points to the existing download.php endpoint which then just serves the file directly using the dataformat api as it already does. This mean the webservice will always respond basically instantly as it's not doing any real work. The download.php file will only need some very minor changes to optionally accept that token and validates it instead of require_login().

The download page will also need so more minor tweaks to accept params passed in as query params through from the webservice. This should be just a refactor of the same logic already in view.php

I think option 2 is superior and also easier to do as its reusing existing code. It also side steps all the discussions around adhoc vs scheduled and everything else, because it is the same code and behavior as now and the only thing which is changed is how the download process is authenticated.

contains the %%USERID%% parameter

Shouldn't this instead use the userid which is linked current user via the web service token? Otherwise anyone who has this capability will be able to run reports against other users which they would not be able to do in the UI.

abias commented 3 years ago

Hi Brendan,

thanks for adding your thoughts again.

I think the format of the data export should be completely independent from the format of the webservice protocol, and it would likely be extremely difficult or ugly to try and do it any other way.

Ideally I'd go with an optional web service param dataformat and accepts the string name of any format, csv, excel, html etc which is enabled. If you don't select one then it defaults to the top on in the list.

I think you are right and I was too shortsighted.

My basic understanding was that each of Moodle's webservice protocols have their own return format, i.e. a REST webservice function uses another format than XMLRPC and so on.

However, this webservice-defined format is just the envelope of the data. If we are talking about the datatable format itself, it would be nice if the caller could request the data in the format of his choice instead of, for example, having to parse huge XML structures just because he uses the REST webservice protocol.

So, from my point of view, it would be appreciated to have this additional dataformat parameter.

The big hoop to jump through is we must encode the text / binary data. I'm struggling to find an example in core of any existing web-service that directly returns a file. There is no PARAM_BINARY. Maybe I'm missing something?

I am somehow struggling to find the context for this. My initial hope was that the webservice would directly return the tabular data within the webservice response so that the caller can directly process it. This would be especially appreciated for queries which will return comparably small datasets and I would like to propose to go this way first.

However, I agree that for large returned datasets, this approach might become a burden in terms of processing and waiting time. Having an optional parameter to request a download URL instead of the data directly would be fine. But this should be optional and it should be up to the caller to decide which way to go.

What do you think, Brendan?

Shouldn't this instead use the userid which is linked current user via the web service token? Otherwise anyone who has this capability will be able to run reports against other users which they would not be able to do in the UI.

Ah, yes, you are right, Brendan. My proposal triggers a privacy leak.

My reasons for proposing this parameter was that I assumed that the webservice user who is calling the webservice would be a user account for a machine, but not an individual human. Having a userid parameter would enable this machine to run existing queries based on individual users, just like these users would run the queries in the GUI.

However, after your comment, I think we should drop this idea and the %%USERID%% parameter should indeed be filled with the user ID of the webservice caller.

If something like the scenario which I imagined above should be realized somewhere, a focused query with the necessary query parameters can be built by the admin.

Cheers, Alex

brendanheywood commented 3 years ago

If we want the table payload directly inside the webservice envelope then we must encode it in some way so it's valid string and then can be inside xml / rest or whatever protocol. base64 encoding is the simplest and will work but then the client has to decode it.

If we know for sure that the data is just a normal string, ie its html or json or csv, but not ods or excel or any other binary then we could put return that directly from the normal webservice code but it makes it slightly awkward as it wouldn't work with all dataformat's. But what we can do is reach into the dataformat that is requested and see if the mimetype it returns is 'text/*' which is true for csv, html, and false for ods, excel and pdf and we'd throw a ws exception. json is technically text but it currently returns a mimetype of application/json so we could either ignore it or make a special exception for it.

My initial hope was that the webservice would directly return the tabular data within the webservice response so that the caller can directly process it.

I still think leveraging download.php will give you what you want and be easier than you think to consume. The webservice I'm proposing is deterministic, all it does is return a url, so in some client code if you grab that second url you don't really need the webservice any more, you can just curl the download page without ever touching the ws again. And it is much simpler because it's not a webservice so you don't need any wrapper code or sdk to deal with it, it would be a curl one line to directly pipe the output to a file, or to internally process it. And you also get the headers with the right mime type and file name which might be useful. And it will work with all the binary file dataformats. The webservice call is effectively acting similar to an oauth2 refresh token endpoint.

Here is a current working example of curling a download using a session cookie:

$ curl 'http://moodle.local/report/customsql/download.php?sesskey=fMoA9ZneXB&dataformat=csv&id=1&timestamp=1636538472'\n
 -H 'Cookie: MoodleSession=ndq459sc5e0q0b5ufrv2598bf0;'  
"CURRENT TIME",?column?,"generate series"
21:01:12.053984+11,test,1
21:01:12.053984+11,test,2
21:01:12.053984+11,test,3
21:01:12.053984+11,test,4
21:01:12.053984+11,test,5

What I'm proposing is instead of a session cookie is uses a user login token, which is the same way you access a private rss feed or calendar export. The example token is a get but would also accept a post.

$ curl 'http://moodle.local/report/customsql/download.php?usertoken=xyz123456&dataformat=csv&id=1'  
"CURRENT TIME",?column?,"generate series"
21:01:12.053984+11,test,1
21:01:12.053984+11,test,2
21:01:12.053984+11,test,3
21:01:12.053984+11,test,4
21:01:12.053984+11,test,5

This is way easier to consume in client code than a webservice call, it could just be this and you're done:

$csvdata = str_getcsv(file_get_contents($url));

From a security perspective user login tokens are equivalent to webservice tokens. We could also make it honor the webservice token directly too and then we don't even need to call the first webservice. The webservices wrapper envelope doesn't really add any value here and it just gets in the way because it fundamentally wasn't designed to return files, and especially not big ones.

timhunt commented 3 years ago

Seriously! we now about about 1000x as much text here as the final patch will be. (I will freely admin I have not read it all.)

I like the idea of allowing download.php to be used with tokens.

I'm just wondering if we should follow the pattern of pluginfile.php and tokenpluginfile.php in core. So have download.php and tokendownload.php for the two cases - with refactoring to minimise duplicate code.

jwalits commented 3 years ago

Hi all,

Whilst doing the initial implementation on the download and tokendownload pattern, we came across a small issue where the download.php file needs a dataformat parameter (even though in code its optional, an error is thrown if not passed). We have made the webservice accept a dataformat parameter as in input. It will default to csv for now (open to suggestion if it should be moved settings or leave in code).

brendanheywood commented 3 years ago

I've created a clean new issue with clearer direction here https://github.com/moodleou/moodle-report_customsql/issues/105

abias commented 3 years ago

Thank you, @brendanheywood ! Closing this issue in favour of #105.

moodleou / moodle-report_customsql

Allow report results to be retrieved with webservice calls #99