rainshowerLabs / blutgang

the wd40 of ethereum load balancers

[META] Blutgang 0.3.0 Garreg Mach #57

Open makemake-kbo opened 7 months ago

makemake-kbo commented 7 months ago

This is a meta issue for general questions and troubleshooting related to Blutgang 0.3.0 Garreg Mach.

If you have trouble updating or using Blutgang, run into undefined behaviour, or hit anything minor that does not deserve its own issue, you can report it here.

chris-vest commented 7 months ago

@makemake-kbo Trying to run this on Kubernetes, and after checking the RPC latencies, it simply exits:

outbound-blutgang-lb-dev-f382c-1 outbound-blutgang-lb Wrn: All data cleared from the database.
outbound-blutgang-lb-dev-f382c-1 outbound-blutgang-lb Info: Starting Blutgang 0.3.0 Garreg Mach
outbound-blutgang-lb-dev-f382c-1 outbound-blutgang-lb Info: Bound to: 0.0.0.0:3000
outbound-blutgang-lb-dev-f382c-1 outbound-blutgang-lb Wrn: Reorg detected!
outbound-blutgang-lb-dev-f382c-1 outbound-blutgang-lb Removing stale entries from the cache.
outbound-blutgang-lb-dev-f382c-1 outbound-blutgang-lb Info: Adding user 1 to sink map
outbound-blutgang-lb-dev-f382c-1 outbound-blutgang-lb Info: Subscribe_user finding: ["newHeads"]
- outbound-blutgang-lb-dev-f382c-1 › outbound-blutgang-lb

Running locally with the same config using Docker, it seems to work fine. I have no liveness / readiness set on Kubernetes, so it's not being killed by the orchestrator.

    Last State:     Terminated
      Reason:       Error
      Exit Code:    132
      Started:      Wed, 21 Feb 2024 11:20:37 +0100
      Finished:     Wed, 21 Feb 2024 11:20:49 +0100

spec:
  containers:
  - args:
    - -c
    - /app/config.toml
    command:
    - /app/blutgang

makemake-kbo commented 7 months ago

@chris-vest it's probably being killed by something. exiting on its own without any error code should not be happening.

chris-vest commented 7 months ago

@makemake-kbo Can you confirm the health check endpoint? I saw there was a feature added for it but I can't find the endpoint.

makemake-kbo commented 7 months ago

@chris-vest it's at / on the admin api. if you get a response of {"id":0} that means it's working. if you get anything else that means it's unhealthy.
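For readers who want to script the rule above, here is a minimal sketch in Python. The helper names is_healthy and probe are made up for this example, and the use of a plain HTTP GET is an assumption: the thread only states that a response of {"id":0} from / on the admin API means healthy and anything else means unhealthy.

```python
import json
import urllib.request


def is_healthy(body: bytes) -> bool:
    """Return True only if the body parses to the JSON object {"id": 0}."""
    try:
        return json.loads(body) == {"id": 0}
    except ValueError:
        # Anything that is not valid JSON counts as unhealthy.
        return False


def probe(url: str, timeout: float = 2.0) -> bool:
    """Fetch the admin root and apply is_healthy to the response body.

    Assumption: a plain GET is accepted; the thread does not say which
    HTTP method the admin API expects.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.read())
    except OSError:
        # Connection refused, timeout, or HTTP error: treat as unhealthy.
        return False
```

A call such as probe("http://<admin-host>:<admin-port>/") would then return True or False; the host and port are placeholders for wherever the admin API is bound.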

chris-vest commented 7 months ago

@makemake-kbo Thank you.

I don't think K8s is killing the container, it seems to error out:

    lastState:
      terminated:
        containerID: containerd://46bd3f545980b859a4be7dfa96463197fbbd8efff6ee1b407167230a86da3957
        exitCode: 132
        finishedAt: "2024-02-21T10:59:28Z"
        reason: Error
        startedAt: "2024-02-21T10:59:28Z"

chris-vest commented 7 months ago

Is there a debug log?

chris-vest commented 7 months ago

If you could provide an example config for Kubernetes so I can check against it, that would be great. I'm using the 0.3.0 image.

makemake-kbo commented 7 months ago

@chris-vest the feature flag debug-verbose prints verbose output about what it's doing.

there are helm charts here https://github.com/ethpandaops/ethereum-helm-charts/tree/master/charts/blutgang that you can use as reference.

chris-vest commented 7 months ago

Any idea what the 132 exit code could be?

I'm using 0.2.0 now and that seems to work fine.

makemake-kbo commented 7 months ago

If 0.2.0 works fine this is probably a regression. Could you post your config/full output?

chris-vest commented 7 months ago

redacted

chris-vest commented 7 months ago

Same config as above but I removed Quicknode and DRPC config to see if it would help, so you only see it checking one RPC latency at startup.

+ outbound-blutgang-lb-dev-f382c-0 › outbound-blutgang-lb
outbound-blutgang-lb-dev-f382c-0 outbound-blutgang-lb Info: Using config file at /app/config.toml
outbound-blutgang-lb-dev-f382c-0 outbound-blutgang-lb Sorting RPCs by latency...
outbound-blutgang-lb-dev-f382c-0 outbound-blutgang-lb https://REDACTED: 126060066.75ns
outbound-blutgang-lb-dev-f382c-0 outbound-blutgang-lb Wrn: All data cleared from the database.
outbound-blutgang-lb-dev-f382c-0 outbound-blutgang-lb Info: Starting Blutgang 0.3.0 Garreg Mach
outbound-blutgang-lb-dev-f382c-0 outbound-blutgang-lb Info: Bound to: 0.0.0.0:3000
outbound-blutgang-lb-dev-f382c-0 outbound-blutgang-lb Info: Admin namespace enabled, accepting admin methods at admin port
outbound-blutgang-lb-dev-f382c-0 outbound-blutgang-lb Info: Bound admin to: 0.0.0.0:5715
outbound-blutgang-lb-dev-f382c-0 outbound-blutgang-lb Wrn: Reorg detected!
outbound-blutgang-lb-dev-f382c-0 outbound-blutgang-lb Removing stale entries from the cache.
outbound-blutgang-lb-dev-f382c-0 outbound-blutgang-lb Info: Adding user 1 to sink map
outbound-blutgang-lb-dev-f382c-0 outbound-blutgang-lb Info: Subscribe_user finding: ["newHeads"]
- outbound-blutgang-lb-dev-f382c-0 › outbound-blutgang-lb

makemake-kbo commented 7 months ago

Could you run sudo cat /proc/cpuinfo | grep avx on the machine running k8s? There's a chance that it's erroring out with 132 because it doesn't have avx2 instructions.
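Note that grep avx also matches avx and the avx512 flags, so a match alone does not prove avx2 is present. A whole-token check can be sketched in Python as below. The function name all_cpus_have_flag is made up for this example, and it assumes the x86 Linux layout of /proc/cpuinfo, with one "flags" line per logical CPU.

```python
def all_cpus_have_flag(cpuinfo_text: str, flag: str) -> bool:
    """Return True if every 'flags' line lists `flag` as a whole token."""
    flag_sets = []
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flag_sets.append(set(value.split()))
    # No 'flags' lines at all means we cannot confirm anything.
    return bool(flag_sets) and all(flag in flags for flags in flag_sets)
```

On the node itself this could be called as all_cpus_have_flag(open("/proc/cpuinfo").read(), "avx2").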

chris-vest commented 7 months ago

$ sudo cat /proc/cpuinfo | grep avx2
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear arch_capabilities
makemake-kbo commented 7 months ago

That's really bizarre. Exit code 132 should mean that it encountered an illegal instruction. Just to confirm what's happening, if you run the x86_64 build from the releases page, does it still SIGILL?
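The mapping from 132 to an illegal instruction follows the common convention that an exit code above 128 means the process was terminated by signal number (code - 128), and signal 4 is SIGILL on Linux. A small Python sketch of that arithmetic follows; the function name describe_exit_code is made up for this example.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Name the signal behind an exit code above 128, if there is one."""
    if code > 128:
        try:
            # 128 + N convention: N is the signal that killed the process.
            return signal.Signals(code - 128).name
        except ValueError:
            return f"unknown signal {code - 128}"
    return f"exit status {code}"
```

By the same rule, an exit code of 137 would point at SIGKILL (signal 9), which is what an orchestrator or the OOM killer usually produces.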

chris-vest commented 7 months ago

I'll try to build an image with that x86_64 build and let you know.

makemake-kbo commented 7 months ago

@chris-vest pushed a new container with a conservative build target and verbose debug output. if error 132 doesn't happen with this one then it's probably safe to say it's target cpu related.

chris-vest commented 7 months ago

@makemake-kbo Your 0.3.0-debug image seems to work! Last question I think is whether I can make the logging less verbose? I probably really only want errors. I've set both RUST_LIB_BACKTRACE and RUST_BACKTRACE to 0.

jhvst commented 7 months ago

I'm getting malformed JSON output on the websocket connection.

websocat ws://upstream:8545
> {"jsonrpc":"2.0","id":155,"method":"eth_subscribe","params":["newHeads"]}
< {"jsonrpc":"2.0","id":155,"result":"0x76540fa86762d185d54875398cf69c4b"}
websocat ws://blutgang:8545
> {"jsonrpc":"2.0","id":155,"method":"eth_subscribe","params":["newHeads"]}
< {"jsonrpc":"2.0","id":155,"result":0x76540fa86762d185d54875398cf69c4b}

As you can see, the blutgang result value is not a string for some reason, but a raw value, even though upstream reports this back correctly. This seemingly only happens in this first response with the subscribe -- all other responses thereafter are correctly relayed.
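The difference between the two frames can be confirmed with any strict JSON parser. A minimal Python sketch using the two responses quoted above; the helper name is_valid_json is made up for this example.

```python
import json


def is_valid_json(frame: str) -> bool:
    """Return True if the frame parses as JSON, False otherwise."""
    try:
        json.loads(frame)
        return True
    except json.JSONDecodeError:
        return False


# The two subscription responses as reported in this thread.
upstream = '{"jsonrpc":"2.0","id":155,"result":"0x76540fa86762d185d54875398cf69c4b"}'
blutgang = '{"jsonrpc":"2.0","id":155,"result":0x76540fa86762d185d54875398cf69c4b}'
```

The upstream frame parses, while the blutgang frame is rejected because JSON has no hexadecimal number literal, so the subscription ID has to be sent as a quoted string.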

makemake-kbo commented 7 months ago

@chris-vest awesome! I'll make a new minor release with various small bug fixes and a more conservative target either later today or tomorrow. verbose debug output is a compile time feature.

@jhvst fixed in https://github.com/rainshowerLabs/blutgang/commit/8268e2b96a1c31f3884383cb1c8432c63ae40cbb