influxdata / ui

UI for InfluxDB

Spike: getting flux and UI teams on same page, regarding APIs to manipulate & validate flux. #3926

Closed drdelambre closed 1 year ago

drdelambre commented 2 years ago

A vast majority of our product and engineering strategies for the past year and near future have been built leaning on the fact that the flux team publishes a source of truth about what valid flux means through the wasm. We use this to validate queries, to generate queries, and as a core mechanism for updating queries throughout the product. It was recently brought to our attention, while exploring this PR: https://github.com/influxdata/ui/pull/3916, that there is syntax validation being missed when using the wasm: the query was generated, parsed, and reformatted through the wasm, and still failed when it hit the API. This means there is some drift between how queryd validates queries and how the wasm does. Learning these differences will inform what product affordances we're allowed to offer in the future as we continue to rely more on query manipulation.
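Roughly, the flow looks like this (a sketch only; the module path below is an assumption, and the only thing this thread tells us is that the wasm exposes `parse` and `format_from_js_file`):

```ts
// Illustrative sketch only. The module path is an assumption; substitute wherever
// the UI actually imports the flux wasm bindings from.
import {parse, format_from_js_file} from '@influxdata/flux-lsp-browser'

// A stand-in for a query the UI has generated.
const generatedQuery = `from(bucket: "my-bucket") |> range(start: -1h)`

// Round-trip through the wasm: if neither call complains, the UI currently
// treats the string as valid flux...
const ast = parse(generatedQuery)
const roundTripped = format_from_js_file(ast)

// ...yet the same string can still be rejected when POSTed to /api/v2/query,
// because queryd runs checks that this wasm pass does not.
console.log(roundTripped)
```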

wiedld commented 2 years ago

💯 what Boatie said.

Concrete example:

The specific use case in this PR #3916 is a "badURL" string provided to the flux http.post(). But we want to extend this principle beyond this specific example.
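A hedged illustration of that failure mode (the flux below is a minimal stand-in, not the exact query from the PR): "badUrl" is a perfectly valid string literal, so no amount of parsing can object to it; only the runtime, when the POST is attempted, discovers the URL is unusable.

```ts
// Stand-in flux for the #3916 case: syntactically valid, only fails at runtime.
const fluxWithBadUrl = `
import "http"

http.post(url: "badUrl", data: bytes(v: "hello"))
`
```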


Whatever validation occurs (e.g. semantic analysis, etc.) should be uniform. Specific needs:

wiedld commented 2 years ago

Another concrete example:

  1. If we don't select a field:

    (screenshot: Screen Shot 2022-02-18 at 5 51 31 PM — no field selected)
  2. the output flux includes the expression trigger = (r) => r["undefined"] > 0

  3. ~this expression passes the LSP~ see next comment in this ticket.

  4. this expression fails the query endpoint /api/v2/query (see the sketch after this list) with the error message:

      runtime error @35:8-35:74: check: failed to evaluate map function: unsupported binary expression invalid > int
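A sketch of that generated check flux (a stand-in for the real task flux, with the module path again assumed): everything here is syntactically valid, so the wasm parse accepts it, but /api/v2/query fails at runtime because r["undefined"] never has a value, so the comparison becomes `invalid > int`.

```ts
import {parse} from '@influxdata/flux-lsp-browser' // assumed module path; illustration only

// Stand-in for the generated check/trigger flux from this example.
const generatedCheck = `
trigger = (r) => r["undefined"] > 0

from(bucket: "my-bucket")
  |> range(start: -1h)
  |> map(fn: (r) => ({r with _level: if trigger(r: r) then "crit" else "ok"}))
`

const ast = parse(generatedCheck) // parses fine; the error only shows up at the endpoint
```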
wiedld commented 2 years ago

Okay. The 2nd use case shown above is something we need to fix in our own UI code. It turns out we are only running the lsp-flux check over part of the generated flux, not all of it.

2nd use case:


The first use case is still a problem. We have a generated task query which is run through lsp-flux, returns fine, but then still errors on the endpoint.

lsp-flux already does some semantic analysis, but we are only using the parse() method...which does not include the semantic analysis.
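A minimal sketch of the in-UI fix for the second use case (query strings and module path are stand-ins): the point is simply that the lsp-flux check has to run over the full generated flux, not just the subquery the UI builds first.

```ts
import {parse} from '@influxdata/flux-lsp-browser' // assumed module path; illustration only

// The piece we were checking (the subquery the UI builds)...
const subquery = `from(bucket: "my-bucket") |> range(start: -1h)`

// ...versus the full flux actually sent to the API, which also carries the
// generated task options and the trigger expression from the previous comment.
const fullTaskFlux = `
option task = {name: "my-check", every: 1m}

trigger = (r) => r["undefined"] > 0

${subquery}
  |> map(fn: (r) => ({r with _level: if trigger(r: r) then "crit" else "ok"}))
`

// Before: parse(subquery) — misses anything wrong in the surrounding flux.
// After: validate everything we are about to send.
const ast = parse(fullTaskFlux)
```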

wiedld commented 2 years ago

Um...I think we can fix this ourselves (with permissions from the flux team ofc).

wiedld commented 2 years ago

@drdelambre -- in summary, I think these are the epic tickets:

  1. Expose the semantic analyzer in lsp-flux.
  2. Decide where to use semantic analysis (versus just parsing) in UI. Create tickets for each area.
    • note: this is not just a direct swap from parse --> parse&analyze; it potentially extends the amount of flux code which we validate.
    • For the example where we parse() the task subquery, and not the rest of the query...we can now validate the entire query using parse&analyze().
    • Note that parse&analyze may be a preferred validation approach over pinging the /api/v2/query endpoint, since we won't be spamming our server, nor potentially writing data (changing DB state, as the task would), just to validate flux (see the sketch after this list).
  3. Implement.
    • also need to do some QA testing, wherever we do the swap from parse --> parse&analyze
    • maybe surface the meaningful errors returned from the semantic analysis?
  4. Make sure the predicate conditions (see use case 2 from above comments) are being validated.
    • Either as part of parse&analyze effort
    • Or we do a manual check in js.
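A sketch of what item 2 could look like from the UI side. Caveat: `analyze` below is entirely hypothetical (it is the semantic pass this ticket asks the flux team to expose), and the module path is an assumption.

```ts
import {parse} from '@influxdata/flux-lsp-browser' // assumed module path; illustration only

// Hypothetical: the not-yet-exposed semantic pass discussed in this ticket.
declare function analyze(ast: unknown): {valid: boolean; errors: string[]}

const flux = `from(bucket: "my-bucket") |> range(start: -1h)`

const ast = parse(flux)     // today: syntax only
const result = analyze(ast) // proposed: semantic checks, no round trip to /api/v2/query

if (!result.valid) {
  // surface result.errors to the user instead of waiting for the endpoint to fail
  console.error(result.errors)
}
```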

If you agree, then I can write up the tickets.

drdelambre commented 2 years ago

I think you're mixing up some of the data flows from that package, but all of these comments later, it seems like the original issue still remains: the library is telling us that we are generating valid flux by not throwing an error during the parse and format_from_js_file calls, but something is throwing a syntax error at the API when we send the query. The next step would be to talk to @rockstar about it, as he has the most context on the mental model and caveats of that package vs what is running in the API, and to make sure we're understanding the problem correctly and know what options are available to us before trying to deliver a solution.

wiedld commented 2 years ago

Spoke with @rockstar. Here is the summary:

rockstar commented 2 years ago

Here's my 2.019392622 cents:

It was recently brought to our attention that there is syntax validation that is being missed when using the wasm while exploring this PR: https://github.com/influxdata/ui/pull/3916.. that the query was being generated, parsed, and reformatted through the wasm, and still failing when it hit the api.

I looked at that PR, and the subsequent comments here about what the issue with that PR is, but it's still not entirely clear: what specific errors were you encountering? If there are legitimate syntax errors occurring, yes, that's a bug and we should actively fix that. There is a class of error that we will never be able to catch without executing the query against queryd/storage, which amounts to a runtime error: the data isn't shaped the way it's expected to be handled by the flux query. Could I request some specific errors?

It sounds like a desire from the team is to be able to do the semantic analysis pass on generated flux. We don't expose the semantic graph because it's not a public API and is constantly changing. Consuming that in a way that could be transformed/manipulated would be a huge headache for y'all, as you'd have to keep up with flux internals. The expectation is that the AST is sufficient for this. If that's not the case, that's a bug we should address. We can, however, expose a way to perform semantic checks on some flux, returning a bool or an array of errors or something.

We're actively working on creating a library specifically for some of this work, which should make it easier to generate, consume, and transform flux with javascript. This should reduce some of the code burden on the UI team, but it can also be used as a jumping-off point for how we can better cater our APIs to help the UI team. I've asked @wiedld to take a look at the PRs as I push them over the next sprint or two, with the expectation that they can start being consumed by the UI team in a few sprints.

wiedld commented 2 years ago

Our use cases, and one possible/suggested solution.

Use Case 1:

Use Case 2:

  1. We get the syntactically parsed AST.
  2. We make our own changes to the AST.
  3. We perform the semantic validation (outside the storage layer), either on the AST or when we convert the AST back to stringified flux.
  4. The wasm returns a success, or a useful error (which we can surface).
    • note that this success/fail validation is in addition to getting back the re-stringified flux
  5. We get back the stringified flux.


My 2.00000001 cents is to suggest that we perhaps have three methods exposed (which in combination would cover all of our use cases; a sketch follows the list):

  1. parse string to syntax AST
  2. semantically validate the AST (accepting the syntax AST, returning success/fail bool + the error)
  3. stringify syntax AST to flux
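A rough sketch of what that contract might look like from the UI side. Every signature here is hypothetical; it only illustrates the shape being proposed, not an existing API.

```ts
// The syntax AST stays opaque to the UI, so flux internals never enter the contract.
type SyntaxAST = unknown

// Hypothetical signatures for the three proposed methods.
declare function parseFlux(flux: string): SyntaxAST
declare function analyzeFlux(ast: SyntaxAST): {ok: boolean; errors: string[]}
declare function formatFlux(ast: SyntaxAST): string

// Use case 2 then reads:
const ast = parseFlux(`from(bucket: "my-bucket") |> range(start: -1h)`)
// ...the UI mutates `ast` here (swap a bucket, add a filter, etc.)...
const check = analyzeFlux(ast)                 // success/fail + surfaceable errors
const flux = check.ok ? formatFlux(ast) : null // re-stringified flux when valid
```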

We would never need the actual semantically evaluated/transformed AST or any other representations, so we don't care about any internal changes. We only need a validation method which consumes the syntax AST.


We can start trying to pick up specific classes of errors, but I think exposing a semantic check result (success/fail, plus error message) is the best way to future-proof ourselves. Basically, give us the ability to run any check we can (as much as is possible outside of the storage layer), and leave the semantic transformations/AST as something that does not need to be under contract.

From what I understand of the PRs that @rockstar is about to work on (the direction he is heading), that should align with our needs. What do you think @rockstar @drdelambre?

wiedld commented 2 years ago

Also, another question just for my education.

The type checking requires the recordType information (storage layer information) and occurs during query planning. Do the function signatures (a.k.a. function schemas) exist outside of the storage layer? Meaning, can we type check flux function signatures outside the storage layer?


Note that I am not making a feature request. Just trying to understand:

rockstar commented 2 years ago

Do the function signatures (a.k.a. function schemas) exist outside of the storage layer? Meaning, can we type check flux function signatures outside the storage layer?

Yes, and I suspect that what we're really after here is a way to validate that we're invoking these functions, mostly from the stdlib, correctly.

wiedld commented 2 years ago

Agreed. We first noticed this problem with the http.endpoint(url: "badUrl") example, but we don't want to solve for just this use case.

And in follow-up after the convo: the "badUrl" example will only be caught at runtime, since the function signature only asks for a string (there is no url type). So the semantic validation will not fix this use case, but it will catch other errors, such as providing a float where the function signature expects a different type.
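To make the distinction concrete, here are two stand-in queries (both syntactically valid); which of them a semantic pass could flag is exactly the point above. The http.post arguments are illustrative, not taken from the PR.

```ts
// A float where the signature wants a string: the kind of error a semantic/type
// check outside the storage layer could catch.
const wrongArgType = `
import "http"

http.post(url: 123.0, data: bytes(v: "hello"))
`

// A perfectly typed string that just happens not to be a usable URL: only the
// runtime can reject this one.
const badUrlValue = `
import "http"

http.post(url: "badUrl", data: bytes(v: "hello"))
`
```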

wiedld commented 2 years ago

How this particular ticket fits into the larger story.

We are right now having to ping the query endpoint (which actually runs the query, potentially changing the DB state) in order to validate a given flux query. Some of this is unavoidable, because there are validations in the storage layer which we won't get access to until runtime. But some of it CAN be caught at the UI level, if we have the proper methods exposed.

Current scope of work (agreed in convo with @rockstar -- please feel free to correct 😆 ):

wiedld commented 2 years ago

The proposed solution remains in progress. We can either close this spike and open a ticket for the proposed solution, or wait until the solution (a new wasm lib) is available to us.

wiedld commented 1 year ago

Wasm solution was already done. Closing.