protocol / beyond-bitswap

Other
34 stars 9 forks source link

Issues with running the testbed transfer plan and RFC experiments #46

Open ebma opened 2 years ago

ebma commented 2 years ago

Hey there. I'm having issues with running the testground test plans. I followed the instructions in the testbed README to install everything but have issues with running the transfer testplan and the RFC experiment test plans.

When running the single_run.sh script, the nodes are hanging at "waiting for network to initialize" and the test is timing out with "context deadline exceeded". (I modified the script to use RUNNER="local:docker" and BUILDER="docker:go" but exec also does not work for me.)

Step 28/28 : ENTRYPOINT [ "/testplan"]
 ---> Running in a644b3cfb5ba
 ---> 900b91fa9f0c
 ---> 900b91fa9f0c
Successfully built 900b91fa9f0c
Successfully tagged 26327f500441:latest
Jul  9 17:59:44.075299  INFO    build completed {"default_tag": "26327f500441:latest", "took": "38s"}
Jul  9 17:59:44.128650  INFO    got docker image id     {"image_id": "900b91fa9f0c"}
Jul  9 17:59:44.181896  INFO    tagging image   {"image_id": "900b91fa9f0c", "tag": "tg-plan-testbed:900b91fa9f0c"}
Jul  9 17:59:44.187222  INFO    build succeeded {"plan": "testbed", "groups": ["single"], "builder": "docker:go", "artifact": "900b91fa9f0c"}
Jul  9 17:59:44.187312  INFO    performing healthcheck on runner
Jul  9 17:59:44.194357  INFO    container isn't running; starting       {"container_name": "testground-grafana"}
Jul  9 17:59:44.582836  INFO    container isn't running; starting       {"container_name": "testground-influxdb"}
Jul  9 17:59:44.933611  WARN    some healthchecks failed, but continuing
Jul  9 17:59:44.933668  INFO    starting run    {"run_id": "cb4s3q3ft8ahdoftmlb0", "plan": "testbed", "case": "transfer", "runner": "local:docker", "instances": 5}
Jul  9 17:59:44.968129  INFO    creating container      {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "name": "tg-testbed-transfer-cb4s3q3ft8ahdoftmlb0-single-0"}
Jul  9 17:59:45.085128  INFO    creating container      {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "name": "tg-testbed-transfer-cb4s3q3ft8ahdoftmlb0-single-1"}
Jul  9 17:59:45.129046  INFO    creating container      {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "name": "tg-testbed-transfer-cb4s3q3ft8ahdoftmlb0-single-2"}
Jul  9 17:59:45.169380  INFO    creating container      {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "name": "tg-testbed-transfer-cb4s3q3ft8ahdoftmlb0-single-3"}
Jul  9 17:59:45.198084  INFO    creating container      {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "name": "tg-testbed-transfer-cb4s3q3ft8ahdoftmlb0-single-4"}
Jul  9 17:59:45.229807  INFO    starting containers     {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "count": 5}
Jul  9 17:59:45.229876  INFO    starting container      {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "id": "8fb6df27ae5e1ecae505ed6634837ed3ad1be118c3a83d7bbe87ca5e3c0e253e", "group": "single", "group_index": 0}
Jul  9 17:59:45.229873  INFO    starting container      {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "id": "965d0784d7ae662642d72f95fc560746531ebde38f74821fa65ad547b239887f", "group": "single", "group_index": 4}
Jul  9 17:59:45.229873  INFO    starting container      {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "id": "4f0e47e5a6af633e5c265866dde91b3828e0922509b8a5d105f85bdc1f58bd02", "group": "single", "group_index": 3}
Jul  9 17:59:45.229878  INFO    starting container      {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "id": "b86bc557c07dd42f91e21c45a179d894db0a2b57f20bfaa3881012337328c22c", "group": "single", "group_index": 1}
Jul  9 17:59:45.229923  INFO    starting container      {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "id": "93a8aeb43e8ef9293dac1465e8b261bfbfe591263d4c91cd3516c845957cfa89", "group": "single", "group_index": 2}
Jul  9 17:59:45.954264  INFO    0.7244s      OTHER << sidecar      >> Jul  9 17:59:45.953718    INFO    resolved route to host  {"host": "testground-redis", "ip": "192.18.0.3"}
Jul  9 17:59:45.954327  INFO    0.7245s      OTHER << sidecar      >> Jul  9 17:59:45.953920    INFO    resolved route to host  {"host": "testground-sync-service", "ip": "192.18.0.4"}
Jul  9 17:59:45.954344  INFO    0.7245s      OTHER << sidecar      >> Jul  9 17:59:45.954105    INFO    resolved route to host  {"host": "testground-influxdb", "ip": "192.18.0.6"}
Jul  9 17:59:45.954711  INFO    0.7245s    MESSAGE << sidecar      >> InfluxDB unavailable; no metrics will be dispatched: no InfluxDB URL in $INFLUXDB_URL env var
Jul  9 17:59:45.958716  INFO    0.7288s      OTHER << sidecar      >> Jul  9 17:59:45.958406    INFO    successfully resolved route to host     {"container_id": "8fb6df27ae5e1ecae505ed6634837ed3ad1be118c3a83d7bbe87ca5e3c0e253e"}
Jul  9 17:59:45.968329  INFO    0.7371s    MESSAGE << single[000] (8fb6df) >> registering default http handler at: http://[::]:6060/ (pprof: http://[::]:6060/debug/pprof/)
Jul  9 17:59:45.968576  INFO    0.7371s      START << single[000] (8fb6df) >> {"plan":"testbed","case":"transfer","run":"cb4s3q3ft8ahdoftmlb0","outputs_path":"/outputs","instances":5,"params":{"bandwidth_mb":"150","bstore_delay_ms":"5","data_dir":"false","dialer":"default","disk_store":"false","enable_dht":"false","enable_providing":"false","enable_tcp":"100","exchange_interface":"bitswap","file_size":"15728640","input_data":"files","jitter_pct":"10","latency_ms":"10","leech_count":"3","long_lasting":"false","max_connection_rate":"0","node_type":"ipfs","number_waves":"1","parallel_gen_mb":"100","passive_count":"","request_stagger":"0","run_count":"2","run_timeout_secs":"90000","seeder_rate":"100","timeout_secs":"400000"},"group":"single","group_instances":5,"network":"16.12.0.0/16","start_time":"0001-01-01T00:00:00Z"}
Jul  9 17:59:45.973722  INFO    0.7438s      OTHER << sidecar      >> Jul  9 17:59:45.973444    INFO    external routing disabled
Jul  9 17:59:45.973806  INFO    0.7439s      OTHER << sidecar      >> Jul  9 17:59:45.973498    INFO    waiting for all networks to be ready    {"sidecar": true, "run_id": "cb4s3q3ft8ahdoftmlb0"}
Jul  9 17:59:45.973959  INFO    0.7438s    MESSAGE << single[000] (8fb6df) >> waiting for network to initialize
Jul  9 17:59:46.296012  INFO    1.0659s    MESSAGE << sidecar      >> InfluxDB unavailable; no metrics will be dispatched: no InfluxDB URL in $INFLUXDB_URL env var
Jul  9 17:59:46.299944  INFO    1.0698s    MESSAGE << single[001] (b86bc5) >> registering default http handler at: http://[::]:6060/ (pprof: http://[::]:6060/debug/pprof/)
Jul  9 17:59:46.300083  INFO    1.0698s      START << single[001] (b86bc5) >> {"plan":"testbed","case":"transfer","run":"cb4s3q3ft8ahdoftmlb0","outputs_path":"/outputs","instances":5,"params":{"bandwidth_mb":"150","bstore_delay_ms":"5","data_dir":"false","dialer":"default","disk_store":"false","enable_dht":"false","enable_providing":"false","enable_tcp":"100","exchange_interface":"bitswap","file_size":"15728640","input_data":"files","jitter_pct":"10","latency_ms":"10","leech_count":"3","long_lasting":"false","max_connection_rate":"0","node_type":"ipfs","number_waves":"1","parallel_gen_mb":"100","passive_count":"","request_stagger":"0","run_count":"2","run_timeout_secs":"90000","seeder_rate":"100","timeout_secs":"400000"},"group":"single","group_instances":5,"network":"16.12.0.0/16","start_time":"0001-01-01T00:00:00Z"}
Jul  9 17:59:46.301369  INFO    1.0710s    MESSAGE << single[001] (b86bc5) >> waiting for network to initialize
Jul  9 17:59:46.446754  INFO    1.2168s      OTHER << sidecar      >> Jul  9 17:59:46.445748    INFO    successfully resolved route to host     {"container_id": "b86bc557c07dd42f91e21c45a179d894db0a2b57f20bfaa3881012337328c22c"}
Jul  9 17:59:46.491011  INFO    1.2610s    MESSAGE << sidecar      >> InfluxDB unavailable; no metrics will be dispatched: no InfluxDB URL in $INFLUXDB_URL env var
Jul  9 17:59:46.496605  INFO    1.2664s    MESSAGE << single[002] (93a8ae) >> registering default http handler at: http://[::]:6060/ (pprof: http://[::]:6060/debug/pprof/)
Jul  9 17:59:46.496757  INFO    1.2664s      START << single[002] (93a8ae) >> {"plan":"testbed","case":"transfer","run":"cb4s3q3ft8ahdoftmlb0","outputs_path":"/outputs","instances":5,"params":{"bandwidth_mb":"150","bstore_delay_ms":"5","data_dir":"false","dialer":"default","disk_store":"false","enable_dht":"false","enable_providing":"false","enable_tcp":"100","exchange_interface":"bitswap","file_size":"15728640","input_data":"files","jitter_pct":"10","latency_ms":"10","leech_count":"3","long_lasting":"false","max_connection_rate":"0","node_type":"ipfs","number_waves":"1","parallel_gen_mb":"100","passive_count":"","request_stagger":"0","run_count":"2","run_timeout_secs":"90000","seeder_rate":"100","timeout_secs":"400000"},"group":"single","group_instances":5,"network":"16.12.0.0/16","start_time":"0001-01-01T00:00:00Z"}
Jul  9 17:59:46.497784  INFO    1.2677s    MESSAGE << single[002] (93a8ae) >> waiting for network to initialize
Jul  9 17:59:46.508462  INFO    1.2784s    MESSAGE << sidecar      >> InfluxDB unavailable; no metrics will be dispatched: no InfluxDB URL in $INFLUXDB_URL env var
Jul  9 17:59:46.512022  INFO    1.2819s    MESSAGE << single[003] (4f0e47) >> registering default http handler at: http://[::]:6060/ (pprof: http://[::]:6060/debug/pprof/)
Jul  9 17:59:46.512133  INFO    1.2819s      START << single[003] (4f0e47) >> {"plan":"testbed","case":"transfer","run":"cb4s3q3ft8ahdoftmlb0","outputs_path":"/outputs","instances":5,"params":{"bandwidth_mb":"150","bstore_delay_ms":"5","data_dir":"false","dialer":"default","disk_store":"false","enable_dht":"false","enable_providing":"false","enable_tcp":"100","exchange_interface":"bitswap","file_size":"15728640","input_data":"files","jitter_pct":"10","latency_ms":"10","leech_count":"3","long_lasting":"false","max_connection_rate":"0","node_type":"ipfs","number_waves":"1","parallel_gen_mb":"100","passive_count":"","request_stagger":"0","run_count":"2","run_timeout_secs":"90000","seeder_rate":"100","timeout_secs":"400000"},"group":"single","group_instances":5,"network":"16.12.0.0/16","start_time":"0001-01-01T00:00:00Z"}
Jul  9 17:59:46.512874  INFO    1.2829s    MESSAGE << single[003] (4f0e47) >> waiting for network to initialize
Jul  9 17:59:46.515609  INFO    1.2857s      OTHER << sidecar      >> Jul  9 17:59:46.515389    INFO    successfully resolved route to host     {"container_id": "93a8aeb43e8ef9293dac1465e8b261bfbfe591263d4c91cd3516c845957cfa89"}
Jul  9 17:59:46.515665  INFO    1.2858s      OTHER << sidecar      >> Jul  9 17:59:46.515429    INFO    external routing disabled
Jul  9 17:59:46.521781  INFO    1.2919s      OTHER << sidecar      >> Jul  9 17:59:46.515445    INFO    successfully resolved route to host     {"container_id": "4f0e47e5a6af633e5c265866dde91b3828e0922509b8a5d105f85bdc1f58bd02"}
Jul  9 17:59:46.521840  INFO    1.2920s      OTHER << sidecar      >> Jul  9 17:59:46.515653    INFO    waiting for all networks to be ready    {"sidecar": true, "run_id": "cb4s3q3ft8ahdoftmlb0"}
Jul  9 17:59:46.521861  INFO    1.2920s      OTHER << sidecar      >> Jul  9 17:59:46.515979    INFO    external routing disabled
Jul  9 17:59:46.521872  INFO    1.2920s      OTHER << sidecar      >> Jul  9 17:59:46.516001    INFO    external routing disabled
Jul  9 17:59:46.521881  INFO    1.2920s      OTHER << sidecar      >> Jul  9 17:59:46.516056    INFO    waiting for all networks to be ready    {"sidecar": true, "run_id": "cb4s3q3ft8ahdoftmlb0"}
Jul  9 17:59:46.521890  INFO    1.2920s      OTHER << sidecar      >> Jul  9 17:59:46.516057    INFO    waiting for all networks to be ready    {"sidecar": true, "run_id": "cb4s3q3ft8ahdoftmlb0"}
Jul  9 17:59:46.542666  INFO    started containers      {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "count": 5}
Jul  9 17:59:46.544171  INFO    1.3140s    MESSAGE << sidecar      >> InfluxDB unavailable; no metrics will be dispatched: no InfluxDB URL in $INFLUXDB_URL env var
Jul  9 17:59:46.544949  INFO    1.3151s      OTHER << sidecar      >> Jul  9 17:59:46.544804    INFO    successfully resolved route to host     {"container_id": "965d0784d7ae662642d72f95fc560746531ebde38f74821fa65ad547b239887f"}
Jul  9 17:59:46.545183  INFO    1.3153s      OTHER << sidecar      >> Jul  9 17:59:46.544991    INFO    external routing disabled
Jul  9 17:59:46.545233  INFO    1.3154s      OTHER << sidecar      >> Jul  9 17:59:46.545031    INFO    waiting for all networks to be ready    {"sidecar": true, "run_id": "cb4s3q3ft8ahdoftmlb0"}
Jul  9 17:59:46.545571  INFO    1.3157s      OTHER << sidecar      >> Jul  9 17:59:46.545424    INFO    all networks ready      {"sidecar": true, "run_id": "cb4s3q3ft8ahdoftmlb0"}
Jul  9 17:59:46.545762  INFO    1.3159s      OTHER << sidecar      >> Jul  9 17:59:46.545439    INFO    all networks ready      {"sidecar": true, "run_id": "cb4s3q3ft8ahdoftmlb0"}
Jul  9 17:59:46.545791  INFO    1.3159s      OTHER << sidecar      >> Jul  9 17:59:46.545455    INFO    all networks ready      {"sidecar": true, "run_id": "cb4s3q3ft8ahdoftmlb0"}
Jul  9 17:59:46.545801  INFO    1.3159s      OTHER << sidecar      >> Jul  9 17:59:46.545455    INFO    all networks ready      {"sidecar": true, "run_id": "cb4s3q3ft8ahdoftmlb0"}
Jul  9 17:59:46.545809  INFO    1.3160s      OTHER << sidecar      >> Jul  9 17:59:46.545639    INFO    all networks ready      {"sidecar": true, "run_id": "cb4s3q3ft8ahdoftmlb0"}
Jul  9 17:59:46.549816  INFO    1.3196s    MESSAGE << single[004] (965d07) >> registering default http handler at: http://[::]:6060/ (pprof: http://[::]:6060/debug/pprof/)
Jul  9 17:59:46.549959  INFO    1.3196s      START << single[004] (965d07) >> {"plan":"testbed","case":"transfer","run":"cb4s3q3ft8ahdoftmlb0","outputs_path":"/outputs","instances":5,"params":{"bandwidth_mb":"150","bstore_delay_ms":"5","data_dir":"false","dialer":"default","disk_store":"false","enable_dht":"false","enable_providing":"false","enable_tcp":"100","exchange_interface":"bitswap","file_size":"15728640","input_data":"files","jitter_pct":"10","latency_ms":"10","leech_count":"3","long_lasting":"false","max_connection_rate":"0","node_type":"ipfs","number_waves":"1","parallel_gen_mb":"100","passive_count":"","request_stagger":"0","run_count":"2","run_timeout_secs":"90000","seeder_rate":"100","timeout_secs":"400000"},"group":"single","group_instances":5,"network":"16.12.0.0/16","start_time":"0001-01-01T00:00:00Z"}
Jul  9 17:59:46.551114  INFO    1.3210s    MESSAGE << single[004] (965d07) >> waiting for network to initialize
Jul  9 18:09:05.237973  ERROR   context.Background.WithDeadline(2022-07-09 20:09:05.236917778 +0200 CEST m=+615.193319498 [-1.052737ms])        {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0"}
Jul  9 18:09:05.237992  INFO    560.0081s INCOMPLETE << single[001] (b86bc5) >> 
Jul  9 18:09:05.238023  INFO    560.0082s      ERROR << single[002] (93a8ae) >> context deadline exceeded
Jul  9 18:09:05.238005  INFO    560.0081s      ERROR << single[001] (b86bc5) >> context deadline exceeded
Jul  9 18:09:05.238031  INFO    560.0082s INCOMPLETE << single[002] (93a8ae) >> 
Jul  9 18:09:05.237974  INFO    deleting containers     {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "ids": ["8fb6df27ae5e1ecae505ed6634837ed3ad1be118c3a83d7bbe87ca5e3c0e253e", "b86bc557c07dd42f91e21c45a179d894db0a2b57f20bfaa3881012337328c22c", "93a8aeb43e8ef9293dac1465e8b261bfbfe591263d4c91cd3516c845957cfa89", "4f0e47e5a6af633e5c265866dde91b3828e0922509b8a5d105f85bdc1f58bd02", "965d0784d7ae662642d72f95fc560746531ebde38f74821fa65ad547b239887f"]}
Jul  9 18:09:05.238038  INFO    560.0082s      ERROR << single[000] (8fb6df) >> context deadline exceeded
Jul  9 18:09:05.238048  INFO    560.0082s INCOMPLETE << sidecar      >> 
Jul  9 18:09:05.238055  INFO    560.0082s      ERROR << single[004] (965d07) >> context deadline exceeded
Jul  9 18:09:05.238054  INFO    560.0082s INCOMPLETE << single[000] (8fb6df) >> 
Jul  9 18:09:05.238062  INFO    560.0082s INCOMPLETE << single[004] (965d07) >> 
Jul  9 18:09:05.238067  INFO    560.0082s INCOMPLETE << single[003] (4f0e47) >> 
Jul  9 18:09:05.238082  INFO    deleting container      {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "id": "965d0784d7ae662642d72f95fc560746531ebde38f74821fa65ad547b239887f"}
Jul  9 18:09:05.238082  INFO    deleting container      {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "id": "8fb6df27ae5e1ecae505ed6634837ed3ad1be118c3a83d7bbe87ca5e3c0e253e"}
Jul  9 18:09:05.238082  INFO    deleting container      {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "id": "b86bc557c07dd42f91e21c45a179d894db0a2b57f20bfaa3881012337328c22c"}
Jul  9 18:09:05.238089  INFO    deleting container      {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "id": "4f0e47e5a6af633e5c265866dde91b3828e0922509b8a5d105f85bdc1f58bd02"}
Jul  9 18:09:05.238090  INFO    deleting container      {"runner": "local:docker", "run_id": "cb4s3q3ft8ahdoftmlb0", "id": "93a8aeb43e8ef9293dac1465e8b261bfbfe591263d4c91cd3516c845957cfa89"}
Jul  9 18:09:05.238055  INFO    560.0082s      ERROR << single[003] (4f0e47) >> context deadline exceeded
Jul  9 18:09:06.951796  WARN    run finished in error   {"run_id": "cb4s3q3ft8ahdoftmlb0", "plan": "testbed", "case": "transfer", "runner": "local:docker", "instances": 5, "error": "context deadline exceeded"}
Jul  9 18:09:06.951987  ERROR   doRun returned err      {"err": "context deadline exceeded"}
Jul  9 18:09:06.959021  INFO    worker completed task   {"worker_id": 0, "task_id": "cb4s3q3ft8ahdoftmlb0"}

When running an RFC experiment with ./run_experiment.sh rfcBBL102 I get the following error:

Step 19/26 : RUN cp /tmp/go.mod /tmp/go.sum ${PLAN_DIR}/
 ---> Running in 101d095bf46b
 ---> 57f8c2ec9daf
Step 20/26 : RUN cd ${PLAN_DIR}     && go env -w GOPROXY="${GO_PROXY}"     && CGO_ENABLED=${CgoEnabled} GOOS=linux GOARCH=amd64 go build -o ${PLAN_DIR}/testplan.bin ${BUILD_TAGS} ${TESTPLAN_EXEC_PKG}
 ---> Running in 4a7bffce0cbd
utils/bitswap.go:9:2: missing go.sum entry for module providing package github.com/ipfs/go-bitswap (imported by github.com/protocol/beyond-bitswap/testbed/testbed/utils); to add:
        go get github.com/protocol/beyond-bitswap/testbed/testbed/utils
utils/bitswap.go:10:2: missing go.sum entry for module providing package github.com/ipfs/go-bitswap/network (imported by github.com/protocol/beyond-bitswap/testbed/testbed/utils); to add:
        go get github.com/protocol/beyond-bitswap/testbed/testbed/utils
Jul  6 17:20:04.942631  INFO    build failed    {"plan": "testbed", "groups": ["nodes"], "builder": "docker:go", "error": "docker build failed: The command '/bin/sh -c cd ${PLAN_DIR}     && go env -w GOPROXY=\"${GO_PROXY}\"     && CGO_ENABLED=${CgoEnabled} GOOS=linux GOARCH=amd64 go build -o ${PLAN_DIR}/testplan.bin ${BUILD_TAGS} ${TESTPLAN_EXEC_PKG}' returned a non-zero code: 1"}
Jul  6 17:20:04.942677  ERROR   doRun returned err      {"err": "docker build failed: The command '/bin/sh -c cd ${PLAN_DIR}     && go env -w GOPROXY=\"${GO_PROXY}\"     && CGO_ENABLED=${CgoEnabled} GOOS=linux GOARCH=amd64 go build -o ${PLAN_DIR}/testplan.bin ${BUILD_TAGS} ${TESTPLAN_EXEC_PKG}' returned a non-zero code: 1"}

I tried to fix the issues with go mod tidy but it didn't help. I'm locally running go version 1.18.3 but since I'm using the dockers runners this should not matter I guess (?)

I would highly appreciate help and tips pointing me in the right direction.