Closed: drdelambre closed this issue 1 year ago
💯 what Boatie said.
Concrete example: send the generated flux to /api/v2/query --> get an error. The specific use case in PR #3916 is a "badURL" string provided to the flux http.post(). But we want to extend this principle beyond this specific example.
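As a stopgap for this specific class of error, the UI could pre-validate URL-typed arguments before they are interpolated into generated flux. This is only a sketch (not part of lsp-flux, and `isValidHttpUrl` is a hypothetical helper): since flux has no url type, a malformed URL string is valid flux and only fails when http.post() actually runs, but the WHATWG URL parser can reject it client-side:

```typescript
// Sketch only: a UI-side guard for the "badURL" class of error.
// Flux has no url type, so a malformed URL string is valid flux and
// only fails once http.post() actually runs. The WHATWG URL parser
// can reject it client-side before the query is ever generated.
function isValidHttpUrl(candidate: string): boolean {
  try {
    const parsed = new URL(candidate); // throws TypeError on malformed input
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch {
    return false;
  }
}

console.log(isValidHttpUrl("badURL"));              // false: no scheme, cannot parse
console.log(isValidHttpUrl("https://example.com")); // true
```

This does not replace uniform validation; it only front-runs one known failure mode.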
It should be that whatever validation occurs (e.g. semantic analysis, etc.) should be uniform with the /api/v2/query endpoint. Specifically, all the checks occurring up to, but not including, the actual running of the query at the /api/v2/query endpoint.
Another concrete example. If we don't select a field:
- the output flux includes the expression trigger = (r) => r["undefined"] > 0
- ~this expression passes the LSP~ see next comment in this ticket.
- this expression fails the query endpoint /api/v2/query with the error message:
runtime error @35:8-35:74: check: failed to evaluate map function: unsupported binary expression invalid > int
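That r["undefined"] reference comes from generating a threshold condition for a field that was never selected. A hedged sketch of a UI-side guard (`unknownFieldRefs` is a hypothetical helper, not existing UI code): extract the field references from a generated condition and confirm each one was actually selected before submitting the query:

```typescript
// Hypothetical helper (not existing UI code): scan a generated flux
// condition for r["<field>"] references and report any field that the
// user never selected. This would flag r["undefined"] > 0 before the
// query is sent to /api/v2/query.
function unknownFieldRefs(fluxCondition: string, selectedFields: string[]): string[] {
  const refPattern = /r\["([^"]+)"\]/g;
  const unknown: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = refPattern.exec(fluxCondition)) !== null) {
    if (selectedFields.indexOf(match[1]) === -1) {
      unknown.push(match[1]);
    }
  }
  return unknown;
}

const condition = 'trigger = (r) => r["undefined"] > 0';
console.log(unknownFieldRefs(condition, ["usage_user"])); // [ 'undefined' ]
```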
Okay. The 2nd use case shown above is something we need to fix in our own UI code. Turns out we are only running the lsp-flux check over part of the generated flux, not all of it.
2nd use case: trigger = (r) => r["undefined"] > 0 is there because we are not using the LSP to check the conditions. Instead, we are manually creating them ourselves.
The first use case is still a problem. We have a generated task query which is run through lsp-flux, returns fine, but then still errors on the /api/v2/query endpoint:
endpoint: http.endpoint(url: "badUrl")
lsp-flux already does some semantic analysis, but we are only using the parse() method...which does not include the semantic analysis.
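The parse()-only gap is the classic split between a syntax pass and a semantic pass: well-formed input can still be semantically wrong. A toy analogy in TypeScript (this is not the lsp-flux API; JSON stands in for flux here) shows how text can survive the first check and fail the second:

```typescript
// Analogy only (not the lsp-flux API): a syntax pass and a semantic pass
// are separate checks. JSON.parse accepts any well-formed document, the
// way parse() accepts any well-formed flux; a semantic check on top of it
// can still fail, the way /api/v2/query's checks can.
function syntaxCheck(text: string): unknown | null {
  try {
    return JSON.parse(text); // syntax pass only
  } catch {
    return null;
  }
}

function semanticCheck(doc: any): string[] {
  const errors: string[] = [];
  if (typeof doc.threshold !== "number") {
    errors.push("threshold must be a number");
  }
  return errors;
}

const doc = syntaxCheck('{"threshold": "not-a-number"}');
console.log(doc !== null);       // true: syntactically valid
console.log(semanticCheck(doc)); // [ 'threshold must be a number' ]
```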
Um...I think we can fix this ourselves (with permissions from the flux team ofc).
@drdelambre -- in summary, I think these are the epic tickets: validate flux without hitting the /api/v2/query endpoint, since then we won't be spamming our server, nor potentially writing data (changing DB state, like the task would) in order to validate flux. If you agree, then I can write up the tickets.
i think you're mixing up some of the data flows from that package, but all of these comments later, it seems like the original issue still remains: the library is telling us that we are generating valid flux by not throwing an error during the parse and format_from_js_file calls, but something is throwing a syntax error at the api when we send the query. the next step would be to talk to @rockstar about it, as he would have the most context on the mental model and caveats of that package vs what is running in the api, and make sure we're understanding the problem correctly and know what options are available to us before trying to deliver a solution.
spoke with @rockstar. here is the summary:
Here's my 2.019392622 cents:
It was recently brought to our attention, while exploring this PR: https://github.com/influxdata/ui/pull/3916, that there is syntax validation that is being missed when using the wasm: the query was being generated, parsed, and reformatted through the wasm, and still failing when it hit the api.
I looked at that PR, and the subsequent comments here about what the issue with that PR is, but it's still not entirely clear: what specific errors were you encountering? If there are legitimate syntax errors that are occurring, yes, that's a bug and we should actively fix that. There is a class of error that we will never be able to catch without executing the query against queryd/storage, which amounts to a runtime error: the data isn't shaped the way it's expected to be handled by the flux query. Could I request some specific errors?
It sounds like a desire from the team is to be able to do the semantic analysis pass on generated flux. We don't expose the semantic graph because it's not a public API and is constantly changing. Consuming that in a way that could be transformed/manipulated would be a huge headache for y'all, as you'd have to keep up with flux internals. The expectation is that the AST is sufficient for this. If that's not the case, that's a bug we should address. We can, however, expose a way to perform semantic checks on some flux, returning a bool or an array of errors or something.
We're actively working on creating a library specifically for some of this work, which should make it easier to generate, consume, and transform flux with javascript. This should reduce some of the code burden on the UI team, but also can be used as a jumping off point for how we can better cater our APIs to help the UI team. I've asked @wiedld to take a look at the PRs as I push them in the next sprint or two, with the expectation they can start being consumed by the UI team in a few sprints.
Our use cases, and one possible/suggested solution.
Use Case 1: the first use case above (http.endpoint(url: "badUrl") passes lsp-flux but fails on /api/v2/query).
Use Case 2: the second use case above (the generated trigger = (r) => r["undefined"] > 0 condition).
my 2.00000001 cents is to suggest that we perhaps have three methods exposed (which in combination would cover all of our use cases). We never would need the actual semantically evaluated/transformed AST or any other internal representation, so we don't care about any changes internally. We only need a validation method which consumes the syntax AST.
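A hedged sketch of what such a validation method's contract might look like from the UI side (all names here are hypothetical, not an agreed API): consume flux source or its syntax AST, return only a verdict plus messages, and keep the semantic graph itself private:

```typescript
// Hypothetical contract (names are not an agreed API): the wasm keeps the
// semantic graph internal and exposes only a verdict, so the UI never
// depends on flux internals.
interface FluxCheckResult {
  ok: boolean;
  errors: string[]; // empty when ok
}

// Stand-in implementation for illustration only; the real check would run
// inside the flux wasm, not in UI javascript.
function semanticCheckStub(fluxSource: string): FluxCheckResult {
  const errors: string[] = [];
  if (fluxSource.indexOf('r["undefined"]') !== -1) {
    errors.push("reference to an unselected field");
  }
  return { ok: errors.length === 0, errors };
}

console.log(semanticCheckStub('trigger = (r) => r["undefined"] > 0')); // ok: false
```

The point of the shape is the contract, not the stub: a bool plus error strings is stable even while the semantic graph churns.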
We can start trying to pick up specific classes of errors. But I think exposing a semantic check result (success/fail, and error message) is the best way to future proof ourselves. Basically, give us the ability to run any check we can (as much as is possible outside of the storage layer). And leave the semantic transformations/AST as something that does not need to be under contract.
From what I understand of the PRs that @rockstar is about to work on (the direction he is heading), that should align with our needs. What do you think @rockstar @drdelambre ?
Also, another question, just for my education.
The type checking requires the recordType information (storage layer information) and occurs during query planning. Do the function signatures (a.k.a. function schemas) exist outside of the storage layer? Meaning, can we type check flux function signatures outside the storage layer?
Note that I am not making a feature request. Just trying to understand: do the function signatures (a.k.a. function schemas) exist outside of the storage layer? Meaning, can we type check flux function signatures outside the storage layer?
Yes, and I suspect that what we're really after here is a way to validate that we're invoking these functions, mostly from the stdlib, correctly.
Agreed. We first noticed this problem with the http.endpoint(url: "badUrl") example. But we don't want to solve for just this one use case.
And in followup after the convo: the "badUrl" example will only be caught at runtime, since the function signature only says the argument is a string (there is no url type). So the semantic validation will not fix this use case, but it will catch other errors, such as providing a float etc. in the function signature.
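To make that distinction concrete, here is a toy signature checker (hypothetical, mirroring the point above, not flux's actual type checker): with url typed only as a string, `url: 1.5` is a catchable type error, while `url: "badUrl"` type-checks fine and can only fail at runtime:

```typescript
// Toy illustration (hypothetical checker): the signature says only that
// url is a string, so `url: 1.5` is a catchable type error, while
// `url: "badUrl"` type-checks and can only fail at runtime.
type FluxType = "string" | "float" | "int";

function typeErrors(args: { [name: string]: unknown }, signature: { [name: string]: FluxType }): string[] {
  const errors: string[] = [];
  for (const name of Object.keys(args)) {
    if (signature[name] === "string" && typeof args[name] !== "string") {
      errors.push(name + ": expected string, got " + typeof args[name]);
    }
  }
  return errors;
}

const endpointSignature: { [name: string]: FluxType } = { url: "string" };
console.log(typeErrors({ url: 1.5 }, endpointSignature));      // caught: [ 'url: expected string, got number' ]
console.log(typeErrors({ url: "badUrl" }, endpointSignature)); // not caught: []
```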
How this particular ticket fits into the larger story.
We are right now having to ping the query endpoint (which actually runs the query, potentially changing the DB state) in order to validate a given flux query. Some of this is unavoidable, because there are validations in the storage layer which we'll not get access to until runtime. But some of it CAN be caught at the UI level, if we have the proper methods exposed.
Current scope of work (agreed in convo with @rockstar -- please feel free to correct 😆 ):
The proposed solution remains in progress. We can either close this as a spike and open a ticket for the proposed solution...or wait until the solution (a new wasm lib) is available for us.
Wasm solution was already done. Closing.
A vast majority of our product and engineering strategies for the past year and near future have been built leaning into the fact that the flux team is publishing a source of truth about what valid flux means through the wasm. We use this to validate queries, to generate queries, and as a core mechanism for updating queries throughout the product.

It was recently brought to our attention, while exploring this PR: https://github.com/influxdata/ui/pull/3916, that there is syntax validation that is being missed when using the wasm: the query was being generated, parsed, and reformatted through the wasm, and still failing when it hit the api. This means that there is some drift between how queryd is validating queries and how the wasm is. Learning these differences will inform what product affordances we're allowed to offer in the future as we continue to rely more on query manipulation.