Open trgill opened 3 years ago
@prajnoha At first glance I was finding this a bit confusing. I get it now I think. When I was looking at:
data_spec.data = NULL;
data_spec.data_size = 0;
data_spec.ext.used = true;
It had me looking around to figure out what was being sent.
Do you think adding _chan_buf_send_socket() would be an improvement? (I'm not sure)
Would you please elaborate on the comment at:
Thing here is that "main process" <--> "worker" channels have buffers to send/receive data as a whole (and we're using our buffer
code for this). Since buffer
code is generic, it uses pure read
and write
calls, because it doesn't know anything (yet) about what type of fd
it is working with when buffer_read
and buffer_write
is called.
Now, unix domain sockets are a bit specific in way that besides the usual data (which can be normally handled by those read
and write
calls), they can also transport the ancillary data (like the file descriptors). And this is something that the generic read
/write
simply can't handle, we have to use the more specific recvmsg
/sendmsg
here.
What we do now in this case where we need to send data+ancillary data is that we read/write the buffer for data as usual. For the ancillary data (the file descriptors passed through unix domain socket), we need to do call the extra recvmsg
/sendmsg
.
To avoid reading the data and ancillary data part separately, we'd need to enhance the buffer interface so it could accept the ancillary data too (...right now, we make use of FD passing capability only). Practically, this means the functions like buffer_add
and buffer_write
/buffer_read
would need to accept extra arg for passing the ancillary data and if this is detected inside the buffer code, it would switch from using read/
writecalls inside to more specific
reacvmsg/
sendmsg` calls (or any other specific functions for other types than unix domain socket which could possibly accept such extra data...).
The change would need to happen inside buffer
code, not the _chan_buf_send
. The _chan_buf_send
is a wrapper/abstraction on top already (using buffers underneath and other related handling) and it already accepts struct worker_data_spec *data_spec
which does have the fields to pass the extra/ancillary data (e.g. the data_spec->ext.socked.fd_pass
). So this already works fine up to this point. Then, however, since the buffer doesn't have the interface/capability to accept ancillary data, we need to do 2 separate calls - one for data and one for the extra bit.
However, such change (adding support for ancillary data in buffer code) is questionable, because right now it's just that FD passing through unix domain sockets where we need this 'extra data passing` functionality. So it's questionable whether it would be worth the change:
we'd gain possibility to send the data and extra/ancillary data in one go through buffers (this is probably the only advantage I can think of)
BUT we'd make the buffer interface a bit more complicated and prone to errors (...what if someone tries to attach an FD to pass through with the buffer data but we're not writing the buffer through the unix domain socket? So there are extra cases to check for and return proper errors etc.).
Would you please elaborate on the comment at:
...yeah. Currently, if you look at how we handle the data reception, we actually do (example with comms between main process and worker):
we register _on_worker_channel_event
callback with event loop on worker side and _on_worker_proxy_channel_event
on main process side for data reception through the channel
when there's data to receive, the event loop will notice and it'll call the registered callback
the callback will then read the data, but it doesn't need to be complete!
the callback will return and if event loop still sees more data to receive on the channel, it'll call the callback again and again until it's completely handled
The advantage here is that the event loop can still process another parts in between this "data reception callback" call, calling another callbacks for different other registered events.
Now, we don't do this for sending out the data - we have our own inner loops that try to send the data and they're looping up until the data is completely send (...or erroring out if there's an error) - the actual loop is inside the buffer_write_all
call:
This means that if this "inner" loop would take longer for whatever reason, we'd be stalling all the other events registered with the "official" event loop. Besides that, this is also about having a clean solution where if we go through the event loop during data reception, we should do the same for data sending too so the solution for handling data transmission is symmetrical and not creating "exceptions" where one direction is handled one way and the other one is handled using a different way. This can be a source of wrong assumptions and related bugs in the future...
Address FIXME in code:
https://github.com/sid-project/sid/blob/cf867f83c90a46aab3f9434896ff818325d9fbac/src/resource/worker-control.c#L942