metalbear-co / mirrord

Connect your local process and your cloud environment, and run local code in cloud conditions.
https://mirrord.dev
MIT License

Application crashes on outgoing DNS requests/traffic #2271

Open klingenm opened 8 months ago

klingenm commented 8 months ago

Bug Description

Our Nest.js application crashes, seemingly at random.

We think it is related to an active WebSocket connection from the frontend to the backend while the backend is running under mirrord.

It looks like the crash is triggered when a DNS query is made while a long-running (>5 seconds) stolen request is being processed in the local process.

{
  "kube_context": "cloud-dev-APP",
  "target": "deployment/APP-backend",
  "operator": false,
  "pause": true,
  "agent": {
    "ephemeral": true,
    "log_level": "mirrord=trace"
  },
  "internal_proxy": {
    "log_level": "mirrord_intproxy=trace",
    "log_destination": "/tmp/internal_proxy.log"
  },
  "feature": {
    "network": {
      "incoming": {
        "mode": "steal",
        "port_mapping": [[46000, 3000]]
      }
    },
    "env": {
      "override": {
        "TYPEORM_MIGRATIONS_RUN": "true",
        "PRETTY_PRINT": "true"
      }
    }
  }
}

Steps to Reproduce

  1. Start the app with mirrord.
  2. Open an active WebSocket connection to the app via the agent.
  3. Trigger another request to the backend (it may be required that this creates a new DB connection).
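
To check whether name resolution itself is the failing step, a small probe can be run inside the local process while the long request from step 3 is in flight. This is only a sketch: `probeLookup` and its result shape are hypothetical names, not part of mirrord or the application. It relies on Node's `dns.promises.lookup`, which resolves names through `getaddrinfo`, the same code path named in the assertion in the backtrace below. When the process aborts inside libuv as in this report, the `catch` branch will not run, so the probe mainly helps narrow down timing.

```typescript
import { lookup } from "node:dns/promises";

// Minimal lookup signature so a fake resolver can be injected when needed.
type LookupFn = (hostname: string) => Promise<{ address: string; family: number }>;

// Hypothetical result shape, used only for this sketch.
interface ProbeResult {
  hostname: string;
  ok: boolean;
  detail: string; // resolved address on success, error code on failure
  elapsedMs: number;
}

// Resolve one hostname and report the outcome instead of throwing.
async function probeLookup(
  hostname: string,
  lookupFn: LookupFn = (h) => lookup(h),
): Promise<ProbeResult> {
  const started = Date.now();
  try {
    const { address } = await lookupFn(hostname);
    return { hostname, ok: true, detail: address, elapsedMs: Date.now() - started };
  } catch (err) {
    // Node attaches a string code such as "ENOTFOUND" to lookup errors.
    const code = (err as { code?: string }).code ?? String(err);
    return { hostname, ok: false, detail: code, elapsedMs: Date.now() - started };
  }
}
```

For example, calling `probeLookup("db").then(console.log)` from a request handler would log the result for the "db" name used in the app config.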

Backtrace

Not from same run as the int_proxy logs, I lost those...

2024-02-23T15:28:35.490170Z ERROR ThreadId(02) mirrord_layer::error: Error occured in Layer >> ProxyError(CodecError(IoError(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })))
Assertion failed: (!"unknown EAI* error code"), function uv__getaddrinfo_translate_error, file getaddrinfo.c, line 90.

Tested on 3.90.0

Relevant Logs

internal_proxy.log

Your operating system and version

macOS Sonoma 14.2.1 (23C71)

Local process

nodejs

Local process version

... node/v18.17.1/bin/node: Mach-O 64-bit executable arm64

Additional Info

It most often seems to crash when it tries to connect to the database.

The database runs in a separate namespace in the k8s cluster. An ExternalName service named "db" is created in APP's namespace and is used in the app config.

klingenm commented 8 months ago

I started digging into why it works for one team but not the other. My hypothesis about the WebSocket does not hold water. I did not know this, but the other team actually uses WebSockets more extensively than the team having the issues.

What stands out is that the problematic page makes one GraphQL query that returns a 6 MB response and takes more than 5 seconds to complete, and in general the page makes more GraphQL queries than should be needed.

klingenm commented 8 months ago

More digging: as can be seen from the logs, the error is related to a DNS query. In our case it can be avoided completely if we configure the DB connection with the DB server's IP instead of the "db" name, so that no DNS query is made.
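
A minimal sketch of that workaround, assuming the app reads its database host from a settings object. The `DbSettings` shape and the `pickDbHost` helper are hypothetical and not taken from our code base. The idea is simply to prefer a literal IP when one is configured and to fall back to the name otherwise.

```typescript
import { isIP } from "node:net";

// Hypothetical settings shape; the field names are illustrative only.
interface DbSettings {
  hostName: string; // e.g. the ExternalName service "db"
  hostIp?: string;  // optional literal IP of the database server
}

// Prefer a valid literal IP so the driver does not need a DNS lookup
// for the database host; otherwise fall back to the configured name.
function pickDbHost(settings: DbSettings): string {
  const ip = settings.hostIp?.trim();
  if (ip && isIP(ip) !== 0) {
    return ip;
  }
  return settings.hostName;
}
```

The returned value would then be passed as the host in the database connection options. This only sidesteps the lookup for the database and does not address the underlying crash.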

I'm open to a screen-sharing session to show you a reproduction, but I'll need time to set up an environment.

aviramha commented 8 months ago

Thank you! We're still investigating the logs.

Razz4780 commented 7 months ago

Fixed with #2308

Razz4780 commented 7 months ago

Observed again

kaiba42 commented 4 months ago

+1 on this issue. I'm running Kubernetes in an OrbStack Linux VM on macOS. mirrord worked flawlessly with OrbStack v1.5.1 and has this issue with v1.6.x. DNS queries are routed through OrbStack to kube-dns to resolve cluster-internal names like temporal-frontend.temporal.svc.cluster.local. When I run nslookup, these names resolve to the IP of the OrbStack VM.

aviramha commented 4 months ago

Hi @kaiba42, can you share the crash log you're getting?

kaiba42 commented 3 months ago

Hey, apologies, I missed this. Here is the error I'm getting on v3.82.0:

New mirrord version available: 3.108.0. To update, run: `"curl -fsSL https://raw.githubusercontent.com/metalbear-co/mirrord/main/scripts/install.sh | bash"`.
To disable version checks, set env variable MIRRORD_CHECK_VERSION to 'false'.
When targeting multi-pod deployments, mirrord impersonates the first pod in the deployment.
Support for multi-pod impersonation requires the mirrord operator, which is part of mirrord for Teams.
You can get started with mirrord for Teams at this link: https://mirrord.dev/docs/teams/introduction/
⠐ mirrord exec
    ✓ Update to 3.108.0 available
    ✓ ready to launch process
      ✓ layer extracted
      ✓ operator not found
      ✓ agent pod created
      ✓ pod is ready
2024-07-12T06:53:27.205007Z ERROR ThreadId(01) mirrord_layer::error: Error occured in Layer >> ProxyError(CodecError(IoError(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })))
thread 'main' panicked:
failed to connect to db: Conn(SqlxError(Io(Custom { kind: Uncategorized, error: "failed to lookup address information: Unknown error" })))

aviramha commented 3 months ago

I am pretty sure you need the privileged flag. You can use -p in the CLI or set it in the config:

{"agent": {"privileged": true}}