googleapis / google-cloud-go

Google Cloud Client Libraries for Go.
https://cloud.google.com/go/docs/reference
Apache License 2.0
3.68k stars 1.26k forks source link

spanner: allow to bind spanner.Statement to a placeholder #9432

Open CAFxX opened 6 months ago

CAFxX commented 6 months ago

Is your feature request related to a problem? Please describe.

Application modularization encourages applications to keep data access logic for tables isolated from each other. This is normally good for maintainability, but has the negative consequence of encouraging to fetch resultsets just to be then sent as arguments to other queries, forcing needless roundtrips that lead to high latencies and resource consumption.

It would instead be ideal if users had ways to transparently compose complex queries out of building blocks managed by each data access module.

Describe the solution you'd like

A potential solution could be to allow spanner.Statements parameters to be spanner.Statements, e.g.

    spanner.Statement{
        SQL: "DELETE FROM table1 WHERE id IN @ids",
        Params: map[string]any{
            "ids": spanner.Statement{
                SQL: "SELECT id FROM table2 WHERE parent_id IN @ids",
                Params: map[string]any{
                    "ids": []int{ /* ... */ },
                },
            },
        },
    }

that would thus execute as

    spanner.Statement{
        SQL: `DELETE FROM table1 WHERE id IN (
            SELECT id FROM table2 WHERE parent_id IN @ids
        )`,
        Params: map[string]any{
            "ids": []int{ /* ... */ },
        },
    }

This will allow the data access logic for a table to return a resultset as a query, instead of as an actual material resultset, thereby allowing modularization to be maintained without forcing the needless materialization of datasets. (in the case above, the statement bound to placeholder ids would be returned by the data access module for table2, while the outer statement would be returned by the access module for table table1)

The only tricky bit, AFAICT, would be handling the case of duplicated placeholder names in the bound statements. In addition to the case above there could also be something like:

    spanner.Statement{
        SQL: "DELETE FROM table1 WHERE foo IN @foos AND id IN @ids",
        Params: map[string]any{
            "ids": []string{ /* ... */ },
            "foos": spanner.Statement{
                SQL: "SELECT id FROM table2 WHERE parent_id IN @ids",
                Params: map[string]any{
                    "ids": []int{ /* ... */ },
                },
            },
        },
    }

that would need to be executed as something like

    spanner.Statement{
        SQL: `DELETE FROM table1 WHERE foo in (
            SELECT id FROM table2 WHERE parent_id IN @ids$1
        ) AND id IN @ids`,
        Params: map[string]any{
            "ids": []string{ /* ... */ },
            "ids$1": []int{ /* ... */ }, // automatically renamed to handle the conflict
        },
    }

In a sense, this would allow to cleanly compose complex queries out of simpler ones, similar to CTEs.

The above could AFAICT be implemented directly in the clients, without real support needed on the spanner side in the form of new APIs or capabilities. The only potential effect on spanner AFAICT may be that of stressing the query planner, as this may lead applications to send in more complex queries.

An alternative, that would cover both this requirement as well as cover many more out of scope in this request, would be to support some form of shareable cursor (or temporary table), and allow to pass a handle to that cursor/table in a placeholder. This approach would require significant new functionality on the spanner side.

noahdietz commented 6 months ago

triaged labels, feel free to change

rahul2393 commented 6 months ago

We are discussing on this internally whether this need to be supported in client side(all 8 languages) or spanner backend. Will update here by next week about the decision.

rahul2393 commented 6 months ago

@CAFxX We feel like this would add too much complexity to our client library. The client library is intended to give users an idiomatic API for Cloud Spanner. This feature feels more like a feature that would belong in a query builder library. You could consider looking into whether go-gorm-spanner would be a better fit for (some of) your application?

Reference: https://gorm.io/docs/advanced_query.html#From-SubQuery

CAFxX commented 6 months ago

We're trying to move away and avoid ORMs as, at least in go, they seem to lead teams to write code that is either hard to read/maintain or that does not execute efficiently (and that's assuming the ORM of choice is not buggy, something that is surprisingly often not the case)