mefellows / muxy

Chaos engineering tool for simulating real-world distributed system failures
MIT License

Can not start muxy proxy with a network_shape configuration. #23

Closed yangyongzhi7 closed 5 years ago

yangyongzhi7 commented 6 years ago

My muxy.config:

proxy:
  - name: http_proxy
    config:
      host: 0.0.0.0
      port: 8527
      protocol: http
      proxy_host: 192.168.8.228
      proxy_port: 8527
      proxy_protocol: http
  - name: tcp_proxy
    config:
      host: 0.0.0.0           # Local address to bind to and accept connections. May be an IP/hostname
      port: 9527              # Local port to bind to
      proxy_host: 192.168.8.228     # Proxied server host
      proxy_port: 9527        # Proxied server port
      nagles_algorithm: true  # Use Nagles algorithm?
      packet_size: 64         # Size of each contiguous network packet to proxy
middleware:
  - name: logger
    config:
      hex_output: false
  - name: delay
    config:
      request_delay: 2000
      response_delay: 1500
  - name: network_shape
    config:
      latency:     1000        # Latency to add in ms
      target_bw:   10         # Bandwidth in kbits/s
      packet_loss: 0.9         # Packet loss, as a %
      target_ips:              # Target ipv4 IP addresses/CIDRs
        - "0.0.0.0/0"
      target_ips6:             # Target ipv6 IP addresses
        - "::1/128"
      target_ports:            # Target destination ports
        - "9527"               # - "1:65535"            # Ranges also valid
      target_protos:           # Target protocols
        - "tcp"
        - "udp"
        - "imp"

command

sudo ./muxy_bin proxy --config muxy_middleware.yml

logs

2018/08/21 10:38:45.999520 [INFO]       Loading plugin  logger
2018/08/21 10:38:45.999540 [INFO]       Loading plugin  delay
2018/08/21 10:38:45.999549 [INFO]       Loading plugin  network_shape
2018/08/21 10:38:45.999706 [INFO]       Loading proxy   http_proxy
2018/08/21 10:38:45.999733 [INFO]       Loading proxy   tcp_proxy
2018/08/21 10:38:45.999741 [DEBUG]      Delay Symptom - Setup()
2018/08/21 10:38:45.999748 [DEBUG]      NetworkShaperSymptom - Setup()

Then the Muxy process exited.

I looked at the source code, and I think this call panics:

network_shape.go, line 62:

executeThrottler(&s.config)

How can I solve this problem? Thanks.

yangyongzhi7 commented 6 years ago

I debugged this code and got an error:

func addRootQDisc(cfg *Config, c commander) error {
    //Add the root QDisc
    log.Debug(cfg.Device)
    root := fmt.Sprintf(tcRootQDisc, cfg.Device)
    strs := []string{tcAddQDisc, root, "htb", tcRootExtra}
    cmd := strings.Join(strs, " ")
    log.Debug(cmd)

    return c.execute(cmd)
}

output

2018/08/21 14:46:08.458131 [DEBUG]      eth0
2018/08/21 14:46:08.458140 [DEBUG]      sudo tc qdisc add dev eth0 handle 10: root htb default 1

I executed this command directly and got an error too:

[root@appframe205 muxy]# sudo tc qdisc add dev eth0 handle 10: root htb default 1
RTNETLINK answers: File exists

yangyongzhi7 commented 6 years ago

I debugged further. The final command that gets executed looks like this:

sudo tc qdisc add dev eth0 parent 10:10 handle 100: netem delay 1000ms rate 20kbit loss 0.90%

What is "rate"?
Usage: ... netem [ limit PACKETS ]
                 [ delay TIME [ JITTER [CORRELATION]]]
                 [ distribution {uniform|normal|pareto|paretonormal} ]
                 [ drop PERCENT [CORRELATION]]
                 [ corrupt PERCENT [CORRELATION]]
                 [ duplicate PERCENT [CORRELATION]]
                 [ reorder PRECENT [CORRELATION] [ gap DISTANCE ]]

It looks like this command has a syntax error?

mefellows commented 6 years ago

Thanks for the report. Just an aside, you need to use three backticks (`) for your code fences, otherwise the code doesn't format properly. I've updated your comments to aid readability.

In terms of the bug, I'll need to dig into that command as I've forgotten the details of how it runs.

Also, please provide details of your operating environment so we can track versions of OS etc.

yangyongzhi7 commented 6 years ago

First of all, thank you for fixing the formatting! I removed the 'rate' parameter from the command, and Muxy now starts successfully. To do that, I commented out the following code in vendor\github.com\tylertreat\comcast\throttler\tc.go:

func addNetemRule(cfg *Config, c commander) error {
    ......

    if cfg.TargetBandwidth > -1 {
        // Using 'rate' with netem causes an error on CentOS.
        //strs = append(strs, fmt.Sprintf(tcRate, cfg.TargetBandwidth))
    }

    ......
}
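
A less invasive variant of the change above might keep `rate` but make it optional, so that systems whose `tc` supports it are unaffected. The following is only a sketch under assumptions: `netemOpts`, `SupportsRate`, and `buildNetemCmd` are hypothetical names for illustration, not part of comcast or Muxy, and how the caller decides whether `rate` is supported is left open. The command layout follows the one logged earlier in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// netemOpts is a hypothetical options struct for this sketch only.
// It is not the Config type used by comcast or Muxy.
type netemOpts struct {
	Device        string
	LatencyMs     int     // values <= 0 omit "delay"
	BandwidthKbit int     // values <= 0 omit "rate"
	LossPercent   float64 // values <= 0 omit "loss"
	SupportsRate  bool    // false for tc builds whose netem rejects "rate"
}

// buildNetemCmd assembles the netem command string and leaves out "rate"
// when the caller says the local tc does not understand it.
func buildNetemCmd(o netemOpts) string {
	parts := []string{
		fmt.Sprintf("sudo tc qdisc add dev %s parent 10:10 handle 100: netem", o.Device),
	}
	if o.LatencyMs > 0 {
		parts = append(parts, fmt.Sprintf("delay %dms", o.LatencyMs))
	}
	if o.BandwidthKbit > 0 && o.SupportsRate {
		parts = append(parts, fmt.Sprintf("rate %dkbit", o.BandwidthKbit))
	}
	if o.LossPercent > 0 {
		// "%%" emits a single literal percent sign.
		parts = append(parts, fmt.Sprintf("loss %.2f%%", o.LossPercent))
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(buildNetemCmd(netemOpts{
		Device:        "eth0",
		LatencyMs:     1000,
		BandwidthKbit: 20,
		LossPercent:   0.9,
		SupportsRate:  false,
	}))
}
```

With `SupportsRate` set to false the `rate` token is dropped and the rest of the command is unchanged.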

Operating system

Linux appframe205 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

CentOS release 6.10 (Final)

If you think my modification is acceptable, I'd be happy to submit a PR. Thanks again!

yangyongzhi7 commented 6 years ago

By the way, I built the version I'm running from the master branch.

mycargus commented 5 years ago

Muxy doesn't work for us with a network_shape config either. We're running muxy v0.0.5 in docker on MacOSX 10.4.4.

Here's the config:

---
## Test configuration name. Used for reporting.
name: Upstream Packet Loss

## Test Description. Used for reporting
description: A failing upstream network

## Specify log output level
##
## Log Levels supported:
## Trace (0), Debug (1), Info (2, Default), Warn (3), Error (4), Fatal (5)
loglevel: 0

# Configures a proxy to forward and/or mess with your HTTP requests
proxy:
  - name: http_proxy
    config:
      host: 0.0.0.0
      port: 8181
      # proxy_host, i.e. the service in front of which we will run muxy
      proxy_host: http://mock-api
      proxy_port: 8080

# Proxy plugins
middleware:
  - name: network_shape
    config:
      packet_loss: 0.9         # Packet loss, as a %
      target_ports:            # Target destination ports
        - "8181"
      target_protos:           # Target protocols
        - "tcp"

  # Log in/out messages
  - name: logger

Here are the muxy container logs:

2019/05/02 14:42:43.137492 Setting up PKI for 'localhost'...
2019/05/02 14:42:43.832121 [INFO]       Loading plugin  network_shape
2019/05/02 14:42:43.832160 [INFO]       Loading plugin  logger
2019/05/02 14:42:43.832228 [INFO]       Loading proxy   http_proxy
2019/05/02 14:42:43.832276 [DEBUG]      NetworkShaperSymptom - Setup()
2019/05/02 14:42:43.858740 [INFO]       HTTP proxy listening on http://0.0.0.0:8181
2019/05/02 14:42:43.858878 [TRACE]      sudo tc qdisc show | grep "netem"
sudo tc qdisc add dev eth0 handle 10: root htb default 1
sudo tc class add dev eth0 parent 10: classid 10:1 htb rate 1000000kbit
sudo tc class add dev eth0 parent 10: classid 10:10 htb rate 1000000kbit
sudo tc qdisc add dev eth0 parent 10:10 handle 100: netem loss 0.90%!
(MISSING)Packet rules setup...
Run `sudo tc -s qdisc` to double check
Run `muxy --device eth0 --stop` to reset

Of course there's a chance our config is invalid. 😄
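
The `%!` followed by `(MISSING)` in the log above has the shape Go's fmt package produces when a format string contains a `%` with no operand to consume. A minimal sketch, assuming (the thread does not confirm this) that the command text was passed to a Printf-style function as the format string itself. `asFormat` and `asArgument` are hypothetical helpers, and the sketch only concerns the printed text, not what `tc` actually received.

```go
package main

import "fmt"

// asFormat uses the command text as the format string. The explicit empty
// argument list makes it clear that no operands are supplied on purpose.
func asFormat(cmd string) string {
	var noArgs []interface{}
	return fmt.Sprintf(cmd+"\n", noArgs...)
}

// asArgument passes the command text as data, so "%" is printed untouched.
func asArgument(cmd string) string {
	return fmt.Sprintf("%s\n", cmd)
}

func main() {
	cmd := "netem loss 0.90%"
	// %q shows the embedded newline explicitly.
	fmt.Printf("%q\n", asFormat(cmd))   // "netem loss 0.90%!\n(MISSING)"
	fmt.Printf("%q\n", asArgument(cmd)) // "netem loss 0.90%\n"
}
```

In the first case fmt treats the newline after `%` as the verb and reports the missing operand, which would explain why `(MISSING)` starts the following log line.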

mefellows commented 5 years ago

Hi @mycargus - long time no see!

Thanks, yes I recall this being an issue. Let me see if I can repro again and get to the bottom of it.

Weird the issue is happening in Docker though, I only noticed the issue on OSX previously.

mefellows commented 5 years ago

I've created a small branch to test out an alternative approach. See https://github.com/mefellows/muxy/releases/tag/v0.0.6 to test the binary out.

If this proves to be working better, I'll get this into mainline.

The main issue is that comcast isn't really designed for lib use in this way - its use of os.Exit() is problematic.

mycargus commented 5 years ago

Haha long time indeed. :) Thanks so much for working on this, man! I'll try it out tomorrow (I'm away from my computer today) and share feedback here.

mycargus commented 5 years ago

By the way, we're using muxy at Instructure now. Thanks for yet another excellent testing tool!

mefellows commented 5 years ago

Oh that's so cool! I'd love to hear a bit more about it next time we chat. In the meantime, I'll take another look this weekend across the different OSes. The main change you should see above is that any notices/errors will start to be logged (some always, some others at TRACE level only).

So if it does fail, we'll at least find out why.

mycargus commented 5 years ago

Alrighty, I tried out the 0.0.6 muxy release in a docker container. Here are the container logs:

2019/05/10 19:51:57.242275 Setting up PKI for 'localhost'...
2019/05/10 19:51:57.917398 [INFO]       Loading plugin  network_shape
2019/05/10 19:51:57.917449 [INFO]       Loading plugin  logger
2019/05/10 19:51:57.917544 [INFO]       Loading proxy   http_proxy
2019/05/10 19:51:57.917573 [DEBUG]      NetworkShaperSymptom - Setup()
2019/05/10 19:51:57.962878 [INFO]       HTTP proxy listening on http://0.0.0.0:8181
2019/05/10 19:51:57.962881 [TRACE]      sudo tc qdisc show | grep "netem"
sudo tc qdisc add dev eth0 handle 10: root htb default 1
sudo tc class add dev eth0 parent 10: classid 10:1 htb rate 1000000kbit
sudo tc class add dev eth0 parent 10: classid 10:10 htb rate 1000000kbit
sudo tc qdisc add dev eth0 parent 10:10 handle 100: netem loss 0.90%!
(MISSING)Packet rules setup...
Run `sudo tc -s qdisc` to double check
Run `muxy --device eth0 --stop` to reset

I then ran sudo tc -s qdisc as suggested:

docker run --rm --privileged muxy bash -c "sudo tc -s qdisc"
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

I'm not sure what to think of it. Lemme know if I can provide any other info that might help!

mefellows commented 5 years ago

I've just got it running now in a local Docker environment - things seem to be working nicely. I think it's fixed on OSX now also.

  1. I've added curl and python to the container to do some testing
  2. Build container: docker build -t muxy .
  3. Run container
    docker run \
    -p 8181:8181 --name muxy \
    -it \
    --rm \
    -v /tmp/muxy_test/muxy_linux_amd64:/opt/muxy/bin/muxy \
    -v /tmp/muxy_test/config.yml:/opt/muxy/conf/config.yml \
    --privileged \
    muxy bash
  4. Create a few interactive shells i.e. docker exec -it muxy bash
  5. python -m SimpleHTTPServer 8080 in any directory as a proxy target
  6. run ./bin/muxy proxy --config ./conf/config.yml with the below config
  7. in another Docker terminal, run time curl -I localhost:8181 to hit the proxy and witness the delay
  8. Terminate the muxy process and run the curl again to observe conditions resetting

Dockerfile:

FROM debian:latest
MAINTAINER Matt Fellows <m@onegeek.com.au>

RUN apt-get update && apt-get install -y wget unzip iptables iproute net-tools sudo
RUN mkdir -p /opt/muxy/bin

WORKDIR /opt/muxy

ENV PATH /opt/muxy/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# For testing / demo purposes
RUN apt-get install -y netcat curl python
VOLUME ["/opt/muxy/conf"]

CMD ["muxy", "proxy", "--config", "/opt/muxy/conf/config.yml"]

Muxy config:

---
name: Upstream Packet Loss

description: A failing upstream network
loglevel: 0

# Configures a proxy to forward and/or mess with your HTTP requests
proxy:
  - name: http_proxy
    config:
      host: 0.0.0.0
      port: 8181
      proxy_host: localhost
      proxy_port: 8080 # proxy the python process

# Proxy plugins
middleware:
  - name: http_tamperer
    config:
      response:
        status: 500

  - name: network_shape
    config:
      packet_loss: 40 # Packet loss
      target_ports: # Target destination ports
        - "8181"
        - "8080"
      target_protos: # Target protocols
        - "tcp"
      device: "lo" # shape only the local network, not the default of eth0

  # Log in/out messages
  - name: logger

Here is a screenshot where you can see the proxy working (manipulating the response code) and also the network shape taking effect:

[Image: Screen Shot 2019-05-11 at 3 16 01 pm]

mefellows commented 5 years ago

I also just noticed in your previous comments that you ran docker run --rm --privileged muxy bash -c "sudo tc -s qdisc" after running the muxy image - if I understand correctly, you've run that in another docker container which won't have those rules applied - hence the response from qdisc doesn't show any rules.

You might want to test the command against the running container, i.e. something more like docker exec -it <container id> bash -c "sudo tc -s qdisc"

mefellows commented 5 years ago

I've just pushed out an updated v0.0.6 binary pre-release if you'd like to try.

mycargus commented 5 years ago

Doh! Good catch, sorry about that. Trying now.

mycargus commented 5 years ago

Works great!

Adding the device: "lo" config did the trick.

Thanks for your help!

mefellows commented 5 years ago

Fantastic! I wonder if the loopback device will catch others too? I think I added it to the README in the most recent commit but will double check that. I'd love to hear how you're using Muxy and if there is anything I should be adding to make it better / more useful.