grafana / sqlds

A package that assists writing SQL-driven datasources
Apache License 2.0

Long Running Queries #69

Closed kevinwcyu closed 1 year ago

CLAassistant commented 2 years ago

CLA assistant check
All committers have signed the CLA.

CLAassistant commented 2 years ago

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 3 committers have signed the CLA.

:white_check_mark: kevinwcyu
:x: iwysiu
:x: andresmgot

sunker commented 2 years ago

I'd like to provide some high-level feedback on the overall architecture before reviewing the code. This comment covers not only the changes in this PR, but also the draft PRs in grafana-aws-sdk, redshift-datasource, and grafana-aws-sdk-react. Hope that's ok!

First of all, I've tested the feature locally and it works smoothly. It solves the problem it set out to solve, and the user experience is great. Really amazing job here!

Regarding the design, there are a few things I'm a bit concerned about:

An alternative design would be to create a new redshiftAsyncDriver that implements the Go database/sql/driver interface. This would handle all the async logic. When a new query is executed with QueryContext, the async driver would:

  1. getQueryId
  2. if no existing query, start a new query and return one column (status) with one row ('ongoing')
  3. if an existing query, get the query status; if completed, return the result, and if ongoing, return one column (status) with one row ('ongoing')

It would be up to the request looper in the frontend to check the status column and decide whether to make a new backend request.

It's a bit unconventional, and it would be one extra API call to check query id per backend request, but...

Thoughts @kevinwcyu @iwysiu?

kevinwcyu commented 2 years ago

Thanks, @sunker

  1. getQueryId
  2. if no existing query start new query, return one column (status) with one row ('ongoing')
  3. if existing query
    get query status
    if completed, return the result. If ongoing, return one column (status) with one row ('ongoing')

It would be up to the request looper in the frontend to check the status column and decide whether to make a new backend request.

Would it not be possible to set the meta field instead of returning the metadata as data? If we go this route, I'd expect us to have to choose a very unique name for the column to prevent clashing with a genuine user query.

the generic sqlds code could remain unchanged, using only one execution flow

Is there a place we could share the asynchronous flow handling? Perhaps a different package within sqlds? If I'm understanding the above correctly, each datasource would be reimplementing the flow itself, it would be nice if a datasource could opt-in to using some shared handling for async flows.

I'm still digesting some of the other suggestions, will wait for @iwysiu's thoughts as well.

sunker commented 2 years ago

Would it not be possible to set the meta field instead of returning the metadata as data? If we go this route, I'd expect us to have to choose a very unique name for the column to prevent clashing with a genuine user query.

The notion of data frames doesn't exist in Go's database/sql, so only rows can be returned. Meta fields could be set in sqlds, but if possible I'd like to avoid those kinds of changes there.

Is there a place we could share the asynchronous flow handling? Perhaps a different package within sqlds? If I'm understanding the above correctly, each datasource would be reimplementing the flow itself, it would be nice if a datasource could opt-in to using some shared handling for async flows.

Since sqlds is built around Go's database/sql, and database/sql is not async by nature, we should avoid adding such code to this package. Also, the async execution flow will only be of interest to redshift and athena, not to other consumers of this package. For that reason, I'd like to see the async vs sync flow routing take place somewhere at a higher level than sqlds.

Another approach would be to create a wrapper around sqlds that provides this routing. This wrapper could, for example, be placed in grafana-aws-sdk. Some basic pseudocode:

// creating a new ds in main.go in redshift
s := redshift.New()
ds := awsds.NewHigherLevelDatasource(s)
ds.Completable = s
ds.CustomRoutes = routes.New(s).Routes()
...

// in grafana-aws-sdk
type HigherLevelDatasource struct {
    *sqlds.SQLdatasource
    sqldsQueryDataHandler backend.QueryDataHandlerFunc
}

func NewHigherLevelDatasource(driver sqlds.Driver) *HigherLevelDatasource {
    sqlds := sqlds.NewDatasource(driver)
    return &HigherLevelDatasource{
        SQLdatasource:         sqlds,
        sqldsQueryDataHandler: sqlds.QueryData,
    }
}

func (ds *HigherLevelDatasource) QueryData(ctx context.Context, req *backend.QueryDataRequest) (*backend.QueryDataResponse, error) {
    if syncExecutionEnabled {
        // sync flow: delegate to the wrapped sqlds handler
        return ds.sqldsQueryDataHandler.QueryData(ctx, req)
    }
    // async flow goes here
}

Code not tested!

My main point here is that I'd like to keep sqlds untouched if possible.

andresmgot commented 2 years ago

My two cents here: I agree with making a clear differentiation between the existing sync code and the new async code. Creating a wrapper around the current package makes sense, since, in theory, the async flow is just an extension of the sync one, and datasources should support both.

grafana-aws-sdk is maybe not the best place for it, though. We'll probably want to use this flow for non-AWS data sources (like bigquery), and placing the code there may bake in some AWS assumptions. Is a sub Go package here enough isolation? IMO it is, but if not we may need to create a new repo for this (which I would avoid unless there's a major reason for it).

sunker commented 2 years ago

grafana-aws-sdk is maybe not the best place for it, though. We'll probably want to use this flow for non-AWS data sources (like bigquery), and placing the code there may bake in some AWS assumptions. Is a sub Go package here enough isolation? IMO it is, but if not we may need to create a new repo for this (which I would avoid unless there's a major reason for it).

I agree - the end goal here would be to have it in its own package, or maybe even in grafana-plugin-sdk-go. But to get a quick start, you can use grafana-aws-sdk and move it elsewhere once the API is complete.

iwysiu commented 1 year ago

Closing in favor of https://github.com/grafana/sqlds/pull/70, which doesn't have the async code in sqlds.