Open kbr- opened 3 years ago
What is the use case that you have in mind? Can it be solved by having two RPCs? One that passes rpc::source and another that does not. The call that passes rpc::source and returns rpc::sink was intended to allow opening a stream session and associating some data with it.
-- Gleb.
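The optional would be used to implement the following algorithm:
- Node A sends node B some metadata
- Node B checks some condition based on the received metadata:
  - if the condition is true, B sends a sink to A (which A receives as a source) and streams data to A
  - otherwise B sends nullopt
- If A receives a sink, it consumes the streamed data; otherwise it knows that the condition did not pass and we abandon the operation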
I do this in https://github.com/scylladb/scylla/pull/8169. I worked around the lack of optional by always sending the sink and additional info so node A can see that the condition is false as well - in that case it simply won't use the source.
It could be solved by two RPCs too, one for sending the metadata and checking the condition, the other for sending the sink.
For sending a sink inside a struct, imagine a use case like this: we want to send some "large" data structure that is made of a sequence of "small" parts, each part has the same type. Additionally, we want to send some "small" metadata that is of different type. We could do it by sending a struct that contains the small metadata and a sink, and then send the large data structure through the sink by sending these small parts. But I don't have anything specific in mind at this point.
Can obviously be solved by two RPCs - one for sending the small metadata and one for sending the large data structure - but if the two things are somehow logically bundled why not bundle them "physically" as well. Currently this use case is covered by rpc::tuple at least, but I don't see a reason why structs couldn't work as well (except that implementation could be tricky).
Alternatively, we could modify the type of the sink, be it sink<variant<small metadata type, large struct fragment>>. Then the sender would first send the metadata and then large struct fragments. But that's ugly because we know that metadata will only come once as the first thing and then everything else are large struct fragments. We lose type safety etc.
On Mon, Mar 01, 2021 at 02:30:07AM -0800, Kamil Braun wrote:
> The optional would be used to implement the following algorithm:
> - Node A sends node B some metadata
> - Node B checks some condition based on the received metadata:
>   - if the condition is true, B sends a sink to A (which A receives as a source) and streams data to A
>   - otherwise B sends nullopt

Just send a status back, or send an error through the stream. The stream was already created - you gain nothing by trying to hide the flag in an optional. One of the rpc stream features that has not been implemented yet is a "one way" stream, where the receiver of a stream does not send its part of the stream back, which indicates that the stream will be one-sided and the other side will be closed automatically. Trying to play games like you want to do here will just make it harder.

> - If A receives a sink, it consumes the streamed data; otherwise it knows that the condition did not pass and we abandon the operation
>
> I do this in https://github.com/scylladb/scylla/pull/8169. I worked around the lack of optional by always sending the sink and additional info so node A can see that the condition is false as well - in that case it simply won't use the source.
>
> It could be solved by two RPCs too, one for sending the metadata and checking the condition, the other for sending the sink.
>
> For sending a sink inside a struct, imagine a use case like this: we want to send some "large" data structure that is made of a sequence of "small" parts, each part has the same type. Additionally, we want to send some "small" metadata that is of different type. We could do it by sending a struct that contains the small metadata and a sink, and then send the large data structure through the sink by sending these small parts. But I don't have anything specific in mind at this point.

The rpc stream was literally added for your "large" data structure example.

> Can obviously be solved by two RPCs - one for sending the small metadata and one for sending the large data structure - but if the two things are somehow logically bundled why not bundle them "physically" as well. Currently this use case is covered by rpc::tuple at least, but I don't see a reason why structs couldn't work as well (except that implementation could be tricky).

rpc::tuple is making return values of RPC calls extensible while maintaining backwards compatibility. It was needed as a way to move from variadic futures that supported it naturally. It is not to make some return values optional.

> Alternatively, we could modify the type of the sink, be it sink<variant<small metadata type, large struct fragment>>. Then the sender would first send the metadata and then large struct fragments. But that's ugly because we know that metadata will only come once as the first thing and then everything else are large struct fragments. We lose type safety etc.

If you want to send completely different things use different rpc calls. You only obfuscate things by trying to bundle the unbundleable.

Theoretically it would be nice to be able to put sink/source anywhere deep in a struct hierarchy just for academic purposes, but then you need to expose rpc internals to user-provided serializers, and any example usage will be as contrived as your example here.
-- Gleb.
> The rpc stream was literally added for your "large" data structure example.

The problem here is the "small" metadata that I want to send together with the "large" data struct. It would be nice to do something like (on the sending end):
struct all_data {
    small_metadata_t small_metadata;
    rpc::sink<large_structure_fragment_t> large_structure_sink;
};
and on the receiving end:
struct all_data {
    small_metadata_t small_metadata;
    rpc::source<large_structure_fragment_t> large_structure_sink;
};

> rpc::tuple is making return values of RPC calls extensible while maintaining backwards compatibility. It was needed as a way to move from variadic futures that supported it naturally. It is not to make some return values optional.

I can also use it to solve the problem above, by sending rpc::tuple<small_metadata_t, rpc::sink<large_structure_fragment_t>>. What I'm asking is whether we could generalize it: there's nothing special about tuples compared to usual structs.
On Mon, Mar 01, 2021 at 03:06:31AM -0800, Kamil Braun wrote:
> > The rpc stream was literally added for your "large" data structure example.
>
> The problem here is the "small" metadata that I want to send together with the "large" data struct. It would be nice to do something like (on the sending end):
> struct all_data {
>     small_metadata_t small_metadata;
>     rpc::sink<large_structure_fragment_t> large_structure_sink;
> };
> and on the receiving end:
> struct all_data {
>     small_metadata_t small_metadata;
>     rpc::source<large_structure_fragment_t> large_structure_sink;
> };

What is the problem in sending metadata and sink as two different rpc parameters? This is exactly how opening an rpc connection was intended to be done. Why do you think putting a sink/source into a struct has such large value that rpc internals need to be exposed to user code and the idl compiler needs special handling for rpc streams?

> > rpc::tuple is making return values of RPC calls extensible while maintaining backwards compatibility. It was needed as a way to move from variadic futures that supported it naturally. It is not to make some return values optional.
>
> I can also use it to solve the problem above, by sending rpc::tuple<small_metadata_t, rpc::sink<large_structure_fragment_t>>. What I'm asking is whether we could generalize it: there's nothing special about tuples compared to usual structs.

Don't. It will break horribly when you will want to add one more member to that tuple.
-- Gleb.
> What is the problem in sending metadata and sink as two different rpc parameters?

The problem is to send metadata and sink to the client. To send it to the server, I can use two parameters as you said. In other words, the problem is to put both the metadata and sink in the return type of the handler.

> Don't. It will break horribly when you will want to add one more member to that tuple.

Then I won't add more members. But I don't understand why it would break?
Example:
env.register_handler(1, [] (rpc::source<> source, client_metadata_t client_metadata) -> future<rpc::tuple<rpc::sink<large_struct_fragment_t>, server_metadata_t>> {
    // use client metadata...
    auto sink = source.make_sink<...>(...);
    server_metadata_t server_metadata {...};
    return make_ready_future<...>(rpc::tuple{std::move(sink), std::move(server_metadata)});
});
As you can see, to return both a sink and the metadata, I had to put them in rpc::tuple.
The problem is to generalize this so I could send e.g. optional<sink> or whatever struct containing a sink.
On Mon, Mar 01, 2021 at 03:21:55AM -0800, Kamil Braun wrote:
> > What is the problem in sending metadata and sink as two different rpc parameters?
>
> The problem is to send metadata and sink to the client. To send it to the server, I can use two parameters as you said. In other words, the problem is to put both the metadata and sink in the return type of the handler.

Ah, I see what you mean now. You want the return value to contain a source + something else.

> > Don't. It will break horribly when you will want to add one more member to that tuple.
>
> Then I won't add more members. But I don't understand why it would break?

The assumption is that you will :) I probably misunderstood what you want to use rpc::tuple for. If you want to use it to return multiple values there is no problem doing so and this is what it is intended for. If you want to use it to make returning rpc::source optional then you have the problem of adding new members.
-- Gleb.
Now I'm doing something like:
// server:
env.register_handler(1, [] (rpc::source<> source, client_metadata_t client_metadata) -> future<rpc::tuple<rpc::sink<large_struct_fragment_t>, bool>> {
    // use client metadata...
    auto sink = source.make_sink<...>(...);
    bool is_sink_valid = ...;
    return make_ready_future<...>(rpc::tuple{std::move(sink), is_sink_valid});
});
// client:
auto handler = make_client(...);
auto [source, is_source_valid] = co_await handler(...);
if (is_source_valid) {
    // use source...
} else {
    // do something else...
}
so: because I can't return optional<sink>, I return both sink and bool, and the bool indicates if the sink is "really there". It seems like an ugly hack to me: because I can't express the optionality of the sink in the type system, I need to use a separate flag to denote whether the sink is valid or phony.
Also, the server must create the sink unconditionally, even if it sends is_sink_valid = false.
It would be nice not to have to do hacks like this.
On Mon, Mar 01, 2021 at 03:35:47AM -0800, Kamil Braun wrote:
> Now I'm doing something like:
> // server:
> env.register_handler(1, [] (rpc::source<> source, client_metadata_t client_metadata) -> future<rpc::tuple<rpc::sink<large_struct_fragment_t>, bool>> {
>     // use client metadata...
>     auto sink = source.make_sink<...>(...);
>     bool is_sink_valid = ...;
>     return make_ready_future<...>(rpc::tuple{std::move(sink), is_sink_valid});
> });
> // client:
> auto handler = make_client(...);
> auto [source, is_source_valid] = co_await handler(...);
> if (is_source_valid) {
>     // use source...
> } else {
>     // do something else...
> }
> so: because I can't return optional<sink>, I return both sink and bool, and the bool indicates if the sink is "really there". It seems like an ugly hack to me: because I can't express the optionality of the sink in the type system, I need to use a separate flag to denote whether the sink is valid or phony.

You do not have to do it like that. You can have an RPC that checks the validity and then another one that creates the stream. This also saves you the stream creation in the first place.

> Also, the server must create the sink unconditionally, even if it sends is_sink_valid = false.

The stream is already created, creating a sink is just a formality. With support for one way stream it will not have to be.
The way you want to do things is to create the stream (additional tcp connection and all that) before you do the check and then try to save on insignificant operation that creates sink object in memory. If you had open coded it instead of using RPC you would never do it like that.
-- Gleb.
> You do not have to do it like that. You can have an RPC that checks the validity and then another one that creates the stream. This also saves you the stream creation in the first place.
I'd much prefer to have only 1 RPC verb instead of 2. No particular reason, just personal preference. I think it's more elegant to do it in one call. Well, it also saves a round-trip (but that's perhaps not so important here).
> The stream is already created, creating a sink is just a formality. With support for one way stream it will not have to be.
> The way you want to do things is to create the stream (additional tcp connection and all that) before you do the check and then try to save on insignificant operation that creates sink object in memory. If you had open coded it instead of using RPC you would never do it like that.
Yeah, it would be nice to somehow prevent the stream creation as well. But even if we already created the stream there is value in not sending the sink - type safety. If we don't send the sink, the other side won't use it by accident.
On Mon, Mar 01, 2021 at 05:50:18AM -0800, Kamil Braun wrote:
> Yeah, it would be nice to somehow prevent the stream creation as well.

It is very easy - do not create one until you know you need it.
-- Gleb.
Could you provide a snippet of code that sketches the solution? In particular, I'm interested in the type(s) of the handler(s).
On Mon, Mar 01, 2021 at 06:11:46AM -0800, Kamil Braun wrote:
> Could you provide a snippet of code that sketches the solution? In particular, I'm interested in the type(s) of the handler(s).

Do not create an rpc stream until you verified that you will use it. What kind of sketch do you need for that?
if (need_stream) {
    sink = create one
    source = rpc_send(sink);
}
?
-- Gleb.
How does the client (the one executing the snippet you provided) know the value of need_stream?
In my previous snippet, it was the server who can calculate need_stream (it was called is_sink_valid back then).
Could you include a snippet with the full (high-level) algorithm of the client? Including how it obtains need_stream.
Note: we can assume that the server knows how to calculate need_stream.
On Mon, Mar 01, 2021 at 06:18:00AM -0800, Kamil Braun wrote:
> How does the client (the one executing the snippet you provided) know the value of need_stream?

In whatever way you want it to. You know the logic, I have no idea. The point is it is done before you open a stream, not after.
-- Gleb.
I already explained the logic: the client doesn't know need_stream. Only the server does. So the server knows whether or not to open a stream.
So we have a contradiction:
The only way I see to resolve this is to have the client contact the server twice:
1. once to obtain need_stream,
2. the second time to obtain the source (if need_stream == true).
Unfortunately, there's a problem here: between 1 and 2 need_stream may change (on the server side).
On Mon, Mar 01, 2021 at 07:37:50AM -0800, Kamil Braun wrote:
> The only way I see to resolve this is to have the client contact the server twice:
> - once to obtain need_stream,
> - the second time to obtain the source (if need_stream == true).
>
> Unfortunately, there's a problem here: between 1 and 2 need_stream may change.
Good point. You can make server create the stream instead (call back to the client after it connects), or make version check as part of data that goes through the stream. Anyway you cannot make returning sink optional anyway. It has to be closed explicitly by the stream initiator.
-- Gleb.
> (call back to the client after it connects)
That's interesting, but meh. IMO too complicated, I'd rather create the throw-away stream: the operation is not so common anyway, so the additional wasted connection doesn't make a difference; also the version mismatch is actually an exceptional situation.
> or make version check as part of data that goes through the stream

This could be done by having something like rpc::sink<std::variant<version, large_struct_fragment>>, then sending first the version, then large struct fragments. But that's ugly :( We know during compile time that version comes only once, then everything else is fragments. But because of the APIs we need to dispatch on the variant on each received element.
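To make the per-element dispatch concrete, here is a minimal sketch in plain C++ that models the received stream as a vector of variants. It does not use Seastar; `stream_element`, `received` and `consume` are hypothetical names, and the fields of `version` and `large_struct_fragment` are assumptions made only for illustration.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the types named in the thread.
struct version { int value; };
struct large_struct_fragment { std::string bytes; };

using stream_element = std::variant<version, large_struct_fragment>;

struct received {
    int version_value = -1;
    std::string data;
    bool protocol_error = false;
};

// Consume elements the way a receiver of such a stream would have to:
// the type system cannot say "one version, then only fragments",
// so every element is checked at run time.
received consume(const std::vector<stream_element>& elements) {
    received out;
    bool first = true;
    for (const auto& e : elements) {
        if (first) {
            first = false;
            if (const auto* v = std::get_if<version>(&e)) {
                out.version_value = v->value;
                continue;
            }
            out.protocol_error = true;  // first element must be the version
            return out;
        }
        if (const auto* f = std::get_if<large_struct_fragment>(&e)) {
            out.data += f->bytes;
        } else {
            out.protocol_error = true;  // a second version is not allowed
            return out;
        }
    }
    return out;
}
```

The two error branches are exactly the cases that a dedicated metadata field next to a fragment-only sink would rule out at compile time.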
> Anyway you cannot make returning sink optional anyway.
:(
I modified tests/unit/rpc_test.cc and implemented a simple std::optional (de)serializer:
The following test demonstrates its use:
However, I'd like to do something more complex: I'd like to send optional<rpc::sink> / receive optional<rpc::source> through an RPC call. Unfortunately, the code doesn't compile. For example:
gives:
and
Indeed. I don't know how to (de)serialize sinks. Only the internals of the framework know how to do it. In include/seastar/rpc/rpc_impl.hh, unmarshal_one::helper<sink<T...>>:
and marshall_one::helper<sink<T...>>:
I could copy-paste the serializing code and implement write for sinks, but I can't copy-paste the deserializing code because it needs to access the connection in order to obtain the stream using the deserialized connection ID.
Now, these internal sink (de)serializing functions are called only if the return type of the handler is directly a sink, or it's wrapped in an rpc::tuple; if it's wrapped in rpc::tuple, then the compiler will dispatch from marshall_one::helper<tuple<sink<T...>, ...>> to marshall_one::helper<sink<T...>> and so on.
But if the sink is wrapped in anything else, say std::optional or some user-defined struct, there is no way to call back the internal (de)serializing functions after the compiler dispatches into calling the user's serializer.
What I need is sending optional<sink>. I tried to use rpc::optional<sink>, but there is no serializer written for rpc::optional inside marshall_one, and for a good reason: it cannot be done correctly. Indeed, the deserializer for rpc::optional does the following (unmarshal_one::helper<optional<T>>):
I.e. it checks whether it's not nullopt by checking in.size() > 0. That is, it assumes that there is no trailing data after the optional. It would be then incorrect to send e.g. rpc::tuple<rpc::optional<int32_t>, int32_t> where the first element is nullopt, because in that case in.size() > 0 would be true.
One potential solution would be to create a new optional type inside the framework which (de)serializes by using a boolean that indicates whether it's a nullopt, as shown at the beginning of the post. Then unmarshal_one::helper<optional_v2<sink>> could dispatch to unmarshal_one::helper<sink>.
Another solution, much more general, would be to somehow allow calling back the sink (de)serialization code. This would allow e.g. putting sinks into user-defined structures and sending these. But I don't know if it's possible without some huge refactors.
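The difference between the two optional encodings discussed above can be sketched with a toy byte buffer. This is plain C++, not Seastar's serializer: `buffer`, `reader` and the four functions are hypothetical names, and single bytes are used instead of int32_t to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using buffer = std::vector<std::uint8_t>;

// Toy read cursor over a byte buffer (hypothetical, not Seastar's input stream).
struct reader {
    const buffer& buf;
    std::size_t pos = 0;
    std::size_t remaining() const { return buf.size() - pos; }
    std::uint8_t read_u8() { return buf.at(pos++); }
};

// Scheme 1: presence is inferred from "is there any input left?",
// mirroring the in.size() > 0 check described above.
void write_trailing(buffer& out, const std::optional<std::uint8_t>& v) {
    if (v) {
        out.push_back(*v);
    }
}
std::optional<std::uint8_t> read_trailing(reader& in) {
    if (in.remaining() > 0) {
        return in.read_u8();
    }
    return std::nullopt;
}

// Scheme 2: an explicit one-byte flag precedes the value.
void write_flagged(buffer& out, const std::optional<std::uint8_t>& v) {
    out.push_back(v ? 1 : 0);
    if (v) {
        out.push_back(*v);
    }
}
std::optional<std::uint8_t> read_flagged(reader& in) {
    if (in.read_u8() == 0) {
        return std::nullopt;
    }
    return in.read_u8();
}
```

Writing an empty optional followed by the byte 42 and reading it back with `read_trailing` yields an engaged optional holding 42, i.e. the following field is consumed by mistake. With `write_flagged` / `read_flagged` the optional reads back as empty and 42 is still available for the next field.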
cc @gleb-cloudius perhaps you have some idea?