metalbear-co / mirrord

Connect your local process and your cloud environment, and run local code in cloud conditions.
https://mirrord.dev
MIT License
3.69k stars 100 forks source link

TCP Steal/mirror doesn't work with shared sockets #864

Closed aviramha closed 1 week ago

aviramha commented 1 year ago

Bug Description

When a parent process binds a socket, then children sockets listen/accept on that socket mirrord doesn't intercept those calls because the fd isn't managed (doesn't exist in our fork). Using this issue to document use cases where this happens for now:

Steps to Reproduce

from fastapi import FastAPI
import sys
app = FastAPI()

@app.get("/")
async def root():
    print("cake")
    return {"message": "Hello World"}

@app.get("/hello/{name}")
async def say_hello(name: str):
    print("pake")
    return {"message": f"Hello {name}"}

run with mirrord exec --target pod/pod uvicorn -- --reload --port 80 main:app

Backtrace

No response

Relevant Logs

No response

Your operating system and version

macOS

Local process

python

Local process version

No response

Additional Info

No response

aviramha commented 1 year ago

Possible solutions:

  1. hooking fork/exec, then passing down the SOCKET/FILES fd struct as environment variable for fds that aren't CLOEXEC
  2. Use shared memory (shmem) for SOCKET/FILES (dunno about this..)
  3. on listen with missing fd (this happens on uvicorn + reloader case) use getsockname to resolve addr and then decide if to add the socket and backfill the information. This is pretty good approach but has some caveats:
    • It would work only on the listen case, not if listen is called in parent and accept in child (not sure if this happens and what happens in this case)
    • the socket information we would get from getsockname would be the "fake port" and not the one the remote would need to listen for, so maybe this is impossible ;(
aviramha commented 1 year ago

Please let us know if this is impacting anyone so we can know if we need to prioritize it.

danielloader commented 11 months ago

This would probably affect me if I were not already blocked by this issue

yoav-orca commented 9 months ago

I'm not blocked by this, but it does affect my productivity. MacOS M1 Pro

aviramha commented 9 months ago

I'm not blocked by this, but it does affect my productivity.

MacOS M1 Pro

Hey, does it still happen to you on latest version?

yoav-orca commented 9 months ago

I'm running 3.75.0 and I'm not getting traffic with --reload and I do get traffic without it

aviramha commented 7 months ago

Also doesn't work when using Uvicorn with more than one worker process.

surbas commented 7 months ago

Happening to me on 3.84.1. I am not using the operator. Not sure if that matters.

brandfocus commented 2 months ago

Ran into the same issue. Might be worth flagging this in the docs somewhere until its fixed

meowjesty commented 1 month ago

I've been exploring this, and here are a few solutions I can think of:

The main issue we have is that our sockfds are stored in a per-layer global SOCKETS, which are not being shared between forks. So the proposal here is to just move the SOCKETS global to the intproxy, and change every operation that calls SOCKETS.[get|take](&sockfd) to use make_proxy_request_with_response (or something similar).

A problem with this solution is that now every sockfd becomes shared between execs and forks, which is not ideal. Another (minor) problem is that we must manually remember to call insert after take, as some operations call SOCKETS.take to modify the socket, meh.

Instead of moving the whole SOCKETS to intproxy, whenever we detect a fork (new layer connection?), we send a copy of the SOCKETS minus the ones with CLOEXEC to the intproxy, which in turn sets the new layer connection's SOCKETS to this curated copy.

Delegate more work to the intproxy, by moving the socket/ops there, and creating some sort of MirrordSocket API that keeps an association between a sockfd and the pid that created it, so we can avoid leaking sockfds.

That's it, so far. I'm leaning more towards the :donkey: solution, as :earth_americas: is a bug waiting to happen, and :peach: is too big of a pointless refactor.

aviramha commented 1 month ago

🫏 is what would make most sense and is how it works in terms of OS level too. We can actually pass all SOCKETS downstream and iterate each fd and see if it's still open (so we'll just follow OS logic if cloexec or not)

aviramha commented 3 weeks ago

Fixed in #2610

meowjesty commented 2 weeks ago

Looks like the problem is still there on macOS + vscode.