googleapis / google-cloud-cpp-spanner

C++ client library for Google Cloud Spanner
https://cloud.google.com/spanner/
Apache License 2.0
29 stars 14 forks source link

Implement Transaction::PartitionedDmlOptions #287

Closed devjgm closed 5 years ago

devjgm commented 5 years ago

We also need to add an overload to Client::ExecuteSql to accept an argument of type Transaction::PartitionedDmlOptions (just like how SingleUseOptions works today).

NOTE: there are no actual options today, but there might be some someday.

mr-salty commented 5 years ago

I'm not sure about this change: "Each client library provides separate methods for DML execution and Partitioned DML execution." here.

It's a strange enough beast in terms of semantics and restrictions, and has a different return type; I'd argue adding an ExecuteSql overload is against the style guide rule "You may overload a function when there are no semantic differences between variants."

devjgm commented 5 years ago

Thanks for bringing this up. You might be right.

I'm not convinced of the style guide violation because the only difference between a DML UPDATE ... and a "Partitioned" DML UPDATE ... is which transaction type is used. The return type is the same. I think that's why @devbww and I came to this other conclusion yesterday.

But I will say that I didn't realize that all the other libraries treat this one differently (even though they don't need to). That point alone may be enough to convince me to revert #467 and proceed with the advice you're suggesting.

Since I'm not in today, how about @mr-salty and @devbww have a chat, and update this issue w/ what you decide. I'm happy w/ whatever you two think is the best approach.

mr-salty commented 5 years ago

I wouldn't say that's the only difference - there's a whole lot of verbiage describing how it's not normal, including "Partitioned DML is not a drop-in replacement for standard DML used in ReadWrite transactions"

The return type could be the same but since we know there is never any data in the response (you can't read), we could return something simpler. The StatusOr<int64> came from the initial design doc, and I don't know it's origin. It seems like we might really want ResultSetStats but this says "Query plans and profiling: The gcloud command-line tool and the client libraries do not support query plans and profiling.". So, I think if we follow that we're left with just the lower bound on rows modified to return (or an error).

FYI, I'm a bit unclear on how you specify a Partitioned DML transaction. This makes it sound like it should always be put in single_use: "Partitioned DML transactions may only contain the execution of a single DML statement via ExecuteSql or ExecuteStreamingSql." although this statement "Partitioned DML requires an existing PartitionedDml transaction ID." almost makes it sound like you have to call BeginTransaction explicitly first.

devbww commented 5 years ago

I vacillated on this, but came down on the side of ...

StatusOr<ResultSet> ExecuteSql(
    Transaction::PartitionedDmlOptions transaction_options,
    SqlStatement statement);

because Execute[Streaming]Sql() is exactly what RPC we're going to send anyway. If the ResultSet is empty, so be it. The same is true for other ExecuteSql() calls. [And then we avoid an extra Connection interface.]

ResultSetStats certainly has to be able to represent the row_count_lower_bound part, but that's #217.

That said, I already need to merge my PR because of the conflicting deletion of Client::ExecutePartitionedDml(), so I'm happy to just dump it and start again when an API is agreed upon.

As for the transaction, I'm working on the theory that "requires an existing PartitionedDml transaction ID" is just a documentation simplification. That is, that there is no difference between "BeginTransaction(begin); Operation(tx-from-begin)" and "Operation(begin)" for any operation. I agree it is confusing though, given that the previous sentence reads "an existing transaction ID or begin a new transaction". I guess we should get clarification. In any case, we could make a BeginTransaction() call first in our ExecuteSql() implementation if s.begin().has_partitioned_dml(), so it wouldn't change any user API.

devbww commented 5 years ago

In the name of "actively working on", I'm going to unassign this. I'll keep the code around should thinking return to the current wording of the issue.

For the record, this is a prerequisite for #432.

coryan commented 5 years ago

Well, I think we are going to need to call BeginTransaction() API explicitly, whether we want it or not. I just wrote a test and I got this:

[ RUN      ] ClientIntegrationTest.ExecuteSqlPartitionDml
google/cloud/spanner/integration_tests/client_integration_test.cc:363: Failure
Value of: result
Expected: is OK
Actual: PartitionedDml Transactions must be created with the BeginTransaction API [INVALID_ARGUMENT]
devbww commented 5 years ago

Thanks for clarifying that.

we could make a BeginTransaction() call first in our ExecuteSql() implementation if s.begin().has_partitioned_dml(), so it wouldn't change any user API.

I'll add a marker for that in my pending PR, which I can then fill in if we decide ExecuteSql() is the right user interface.

devjgm commented 5 years ago

FTR, after talking about this I believe we decided on the following:

Opinion: Also, I'll add that due to the odd nature of PartitionedDml transactions (i.e., they must be created with BeginTransaction, and they cannot be committed), I do not think we need to or should expose anything about Partitioned DML transactions in any of the transaction files or classes. I think we can make all the Partitioned DML transaction stuff be a complete implementation detail inside connection_impl.cc.