Add socketpair shim - Githubissues

tiif commented 1 month ago

Fixes #3442 Design proposal: https://hackmd.io/@tiif/Skhc1t0-C

tiif commented 1 month ago

This is not entirely ready, and still waiting for discussion to happen, but I will tag it with waiting-on-review for more visibility.

@rustbot ready

tiif commented 1 month ago

Hmm, it seems like I got into some rebase trouble. Sorry if anyone get pinged on this PR.

RalfJung commented 1 month ago

@rustbot author (at least I think that reflects the current state)

tiif commented 1 month ago

It turns out that x86_64-apple-darwin does not support SOCK_NONBLOCK and SOCK_CLOEXEC. Details are posted here https://github.com/rust-lang/miri/issues/3442#issuecomment-2130530963

But we have quite a few tasks that should be handled first, maybe we can delay the support x86_64-apple-darwin to another PR?

tiif commented 1 month ago

Don't we have to indicate EOF at some point in read? Like, when the other side got closed and all data has been read, or so?

This will also need to handle the case where the other socket was dropped. What do the manpages say about that case? Will we need to report https://doc.rust-lang.org/std/io/enum.ErrorKind.html#variant.ConnectionAborted or something?

Yes we need special handling for these cases[^1], this is the new specification:

For non-blocking

read
- if write end open but no data available
  - fail with EAGAIN/EWOULDBLOCK
- if write end close
  - if buffer is not empty
    - read as much as we need/can and return number of bytes read
  - else (buffer is empty )
    - return 0
- if read size < available data
  - read as requested and return
- if read size >= available data
  - read as much as we can and return (partial read)
write
- if read end open
  - if write size <= buffer size
    - if there is available space to write all requested bytes
      - write and return
    - else (no available space to write all requested bytes)
      - fail with EAGAIN/EWOULDBLOCK
  - if write size > buffer size
    - if available space in buffer is 0
      - fail with EAGAIN/EWOULDBLOCK
    - else (available space in buffer != 0)
      - write as much as we can (partial write)
- if read end closed
  - fail with EPIPE
    For blocking
read
- if write end open but no data available
  - block
- if write end close
  - if buffer is not empty
    - read as much as we need/can and return number of bytes read
  - else (buffer is empty )
    - return 0
- if read size < available data
  - read as requested and return
- if read size >= available data
  - read as much as we can and return
write
- if read end open
  - if write size <= buffer size
    - block until there is sufficient space to write all requested bytes
  - if write size > buffer size
    - block until there is sufficient space to write all requested bytes
- if read end close
  - fail with EPIPE

[^1]: https://man7.org/linux/man-pages/man7/pipe.7.html PIPE_BUF section, I can't find exact specification for sockets, I assumed behaviour of NONBLOCKING file descriptors (sockets/pipe/FIFO) are the same here

tiif commented 1 month ago

We also need to implement close.

We can add in a new field that is being shared between 2 socketpair struct:

struct SocketPair {
    readbuf: Rc<RefCell<VecDeque<u8>>>,
    writebuf: Rc<RefCell<VecDeque<u8>>>,
    is_nonblock: bool,
    both_end_open: Rc<RefCell<bool>>,
}

This is the initial state where both side is opened,

write to fd[0]  -----> write to buffer0 ----> read from fd[1] 
read from fd[0] <----- write to buffer1 <---- write to fd[1]

When fd[1] is closed, this will be the state of the socketpair:

write to fd[0]  -----> write to buffer0 ----> X
read from fd[0] <----- write to buffer1 <---- X

fd[1] should set both_end_open to false to notify fd[0] that the other end has been dropped.

oli-obk commented 1 month ago

How is nonblocking EOF differentiated from no available bytes?

RefCell<bool>

Prefer Cell when the wrapped type is simple and Copy like bool

oli-obk commented 1 month ago

Blocking write/read will require some refactorings to allow us to park the thread and continue later. We already have the concept for sleep, something similar will be needed here. I would recommend doing so in a separate PR and just keep emitting unsupported where it would block

tiif commented 1 month ago

How is nonblocking EOF differentiated from no available bytes?

I am not sure if I fully understand your question, please clarify if needed. For nonblocking, return 0 is used to indicate EOF when the write end is closed. In this case, there is no input, and there won't be any input in future. For write end opened but no available bytes, EAGAIN is thrown to express "there is no available bytes right now, please try again later". (note: I didn't come out with this specification, I only transcribed what I read)

oli-obk commented 1 month ago

Oh yea that makes sense. I confused myself with the "bytes available" bullet points

oli-obk commented 1 month ago

We can add in a new field that is being shared between 2 socketpair struct:

That's not going to work with duplicated file descriptors, we'd need another way to detect that the last copy of one end of the pair was dropped.

What we could do instead is to use Weak instead of Rc for the writer. Once the corresponding reader is dropped, the weak cannot be upgraded to an rc anymore, signalling that the other end is gone.

This does mean you need to upgrade before every write, but that seems ok

tiif commented 1 month ago

What we could do instead is to use Weak instead of Rc for the writer. Once the corresponding reader is dropped, the weak cannot be upgraded to an rc anymore, signalling that the other end is gone.

Ah, that's neat! Since not being able to upgrade to Rc means the other side is dropped, we only need

struct SocketPair {
    readbuf: Weak<RefCell<VecDeque<u8>>>,
    writebuf: Weak<RefCell<VecDeque<u8>>>,
    is_nonblock: bool,
}

(Note: this won't be the final design, we still need to consider data race detection)

oli-obk commented 1 month ago

readbuf needs to be Rc, not Weak, one side always needs to be Rc, otherwise both Weaks are dangling and can't be upgraded. When thr last Rc gets dropped, all weak become invalid

tiif commented 1 month ago

readbuf needs to be Rc, not Weak, one side always needs to be Rc, otherwise both Weaks are dangling and can't be upgraded.

I see.

Let's say fd[1] is dropped,

(type: Weak) write to fd[0]  -----> write to buffer0 ----> X
(type: Rc) read from fd[0] <----- write to buffer1 <----  X <-- consider this direction

we could use Rc::weak_count to track if the write end is dropped?

oli-obk commented 1 month ago

we could use Rc::weak_count to track if the write end is dropped?

Yea, that would work

RalfJung commented 1 month ago

@tiif your list above is not entirely right -- when all writers are closed but there is data left in the buffer, that data should still be returned. Only then should you indicate EOF.

RalfJung commented 1 month ago

Checking the counts is suspicious, those are pretty hard to use as a signal for anything... if we ever add some other reference to this buffer then suddenly this logic will break.

Maybe it's better to just have our own has_writer flag that we track explicitly? close on a Socketpair can then set has_writer on its write buffer to false. (close will only be called when the last FD for this file description is closed.)

tiif commented 1 month ago

when all writers are closed but there is data left in the buffer, that data should still be returned.

Yes, I didn't consider this case. The list is now updated.

Maybe it's better to just have our own has_writer flag that we track explicitly? close on a Socketpair can then set has_writer on its write buffer to false. (close will only be called when the last FD for this file description is closed.)

I have been thinking about this for a whlie, but not sure how this is going to work yet.

Initial state, fd[0] is the writer of its writebuf, and fd[1] is the writer of fd[0]'s readbuf.

write to fd[0]  -----> write to buffer0 ----> read from fd[1] 
read from fd[0] <----- write to buffer1 <---- write to fd[1]

If we dup fd[0], and only consider buffer0

write to fd[0]  ----->    |----------------------|
                          | buffer0              |----> read from fd[1] 
write to newfd  ----->    |----------------------|

Closing fd[0] is a valid action and it will just make the state transit to:

write to newfd  -----> write to buffer0 ----> read from fd[1]

So I am not sure how close can only be called when the last FD for this file description is closed.

I am thinking of letting the socketpair having refcount for writers of its writebuf and readbuf.

struct SocketPair {
    readbuf: Rc<RefCell<VecDeque<u8>>>,
    writebuf: Weak<RefCell<VecDeque<u8>>>,
    is_nonblock: bool,
    // number of writer for readbuf
    readbufWriterCount: Rc<Cell<i64>>,
    // number of writer for writebuf
    writebufWriterCount: Rc<Cell<i64>>
}

write to fd[0]  ----->    |----------------------|
                          | buffer0              |----> read from fd[1] 
write to newfd  ----->    |----------------------|

For this case, writebufWriterCount of fd[0] and newfd hold the same reference as readbufWriterCount of fd[1] When fd[0] is dup to newfd, writebufWriterCount of fd[0] and newfd will increase by 1 (So readbufWriterCount of fd[1] will increase by 1 too). To check if all the write end of the readbuf is dropped, we only need to check if the value of readbufWriterCount is 0.

Since these counters is only used to track writer of buffers, it won't break when the buffer is referred for other purpose.

Not sure if I over-complicated this problem, I feel a more elegant solution exists somewhere.

RalfJung commented 1 month ago

So I am not sure how close can only be called when the last FD for this file description is closed.

This is already the case, the general file descriptor machinery takes care of that. The logic for that is here.

close on a file description is called exactly once (even if, using dup, there are many file descriptors for the same file description), and each buffer is a write buffer for exactly one file description, so you can use the closing of that file description as the indication for "all writers are gone".

tiif commented 1 month ago

close on a file description is called exactly once (even if, using dup, there are many file descriptors for the same file description), and each buffer is a write buffer for exactly one file description, so you can use the closing of that file description as the indication for "all writers are gone".

Ah I see how it works now, yes this makes sense. I confused close for file descriptor and file description.

bors commented 1 month ago

:umbrella: The latest upstream changes (presumably #3635) made this pull request unmergeable. Please resolve the merge conflicts.

tiif commented 1 month ago

@rustbot ready

oli-obk commented 1 month ago

beyond the two cleanups, this lgtm now

tiif commented 1 month ago

There is a minor change in non-blocking write specification. Basically it is not necessary to have write_size <= buffer_size and write_size > buffer_size check, so for write, so it'd be: for blocking:

    - if read end open
        - if available_space == 0
            - block
        - else 
            - write as much as we can then return number of bytes written. 
    - if read end close
        - fail with ``EPIPE``

for non-block:

    - if read end open
        - if available_space == 0
            - fail with ``EWOULDBLOCK``
        - else
            - write as much as we can and return number of bytes written
    - if read end closed
        - fail with ``EPIPE``

Full specification for reference

### For non-blocking - read - if write end open but no data available - fail with ``EWOULDBLOCK`` - if write end close - if buffer is not empty - read as much as we need/can and return number of bytes read - else (buffer is empty ) - return 0 - if write end open with data - if read size < available data - read as requested and return - if read size >= available data - read as much as we can and return (partial read) - write - if read end open - if available_space == 0 - fail with ``EWOULDBLOCK`` - else - write as much as we can and return number of bytes written - if read end closed - fail with ``EPIPE`` ### For blocking - read - if write end open but no data available - block - if write end close - if buffer is not empty - read as much as we need/can and return number of bytes read - else (buffer is empty ) - return 0 - if write end open with data - if read size < available data - read as requested and return - if read size >= available data - read as much as we can and return - write - if read end open - if available_space == 0 - block - else - write as much as we can then return number of bytes written. - if read end close - fail with ``EPIPE``

RalfJung commented 3 weeks ago

@rustbot author

tiif commented 3 weeks ago

@rustbot ready

RalfJung commented 3 weeks ago

@rustbot author

tiif commented 3 weeks ago

Thanks for the thorough review ;)

@rustbot ready

RalfJung commented 3 weeks ago

Just two more nits. :)

Please squash after resolving them.

tiif commented 3 weeks ago

Thanks, just a heads-up, I did rebase + squash again because the git log is quite a mess, I will find a better workflow soon.

RalfJung commented 3 weeks ago

@bors r+

Thanks, just a heads-up, I did rebase + squash again because the git log is quite a mess, I will find a better workflow soon.

What I usually do is look at git log to find the most recent bors commit, copy-paste that commit ID, and then do git rebase --interactive <that commit id> to squash.

bors commented 3 weeks ago

:pushpin: Commit 87324435796eebba80267ff0de1c30d3d2949640 has been approved by RalfJung

It is now in the queue for this repository.

bors commented 3 weeks ago

:hourglass: Testing commit 87324435796eebba80267ff0de1c30d3d2949640 with merge 65f3e90f9be4853278318ee0eb85e21daa815a3f...

tiif commented 3 weeks ago

Thanks, I will try that next time.

bors commented 3 weeks ago

:sunny: Test successful - checks-actions Approved by: RalfJung Pushing 65f3e90f9be4853278318ee0eb85e21daa815a3f to master...

rust-lang / miri

Add socketpair shim #3609

For non-blocking

For blocking