Open heni02 opened 2 weeks ago
fixed
2.0-dev commit:ff4db5805 — CN still hits a panic error
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x3b45c0f]

goroutine 12517 gp=0xc00f46ae00 m=0 mp=0x851b900 [running]:
panic({0x4706080?, 0x831fc30?})
	/usr/local/go/src/runtime/panic.go:804 +0x168 fp=0xc040e06a68 sp=0xc040e069b8 pc=0x47bc88
runtime.panicmem(...)
	/usr/local/go/src/runtime/panic.go:262
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:900 +0x359 fp=0xc040e06ac8 sp=0xc040e06a68 pc=0x47e339
github.com/matrixorigin/matrixone/pkg/frontend.(*TxnHandler).GetServerStatus(0x57dc478?)
	/go/src/github.com/matrixorigin/matrixone/pkg/frontend/txn.go:751 +0x2f fp=0xc040e06b08 sp=0xc040e06ac8 pc=0x3b45c0f
github.com/matrixorigin/matrixone/pkg/frontend.ExecRequest.func1()
	/go/src/github.com/matrixorigin/matrixone/pkg/frontend/mysql_cmd_executor.go:3100 +0x173 fp=0xc040e06ba0 sp=0xc040e06b08 pc=0x3ab6fd3
panic({0x4706080?, 0x831fc30?})
	/usr/local/go/src/runtime/panic.go:785 +0x132 fp=0xc040e06c50 sp=0xc040e06ba0 pc=0x47bc52
runtime.panicmem(...)
	/usr/local/go/src/runtime/panic.go:262
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:900 +0x359 fp=0xc040e06cb0 sp=0xc040e06c50 pc=0x47e339
github.com/matrixorigin/matrixone/pkg/vm/process.(*Process).ReplaceTopCtx(...)
	/go/src/github.com/matrixorigin/matrixone/pkg/vm/process/process2.go:158
github.com/matrixorigin/matrixone/pkg/frontend.doComQuery(0xc023677208, 0xc0286dcc80, 0xc0416fae00)
	/go/src/github.com/matrixorigin/matrixone/pkg/frontend/mysql_cmd_executor.go:2861 +0x49e fp=0xc040e07238 sp=0xc040e06cb0 pc=0x3ab29fe
github.com/matrixorigin/matrixone/pkg/frontend.ExecRequest(0xc023677208, 0xc0286dcc80, 0xc040e07b88)
	/go/src/github.com/matrixorigin/matrixone/pkg/frontend/mysql_cmd_executor.go:3127 +0x7a5 fp=0xc040e075b8 sp=0xc040e07238 pc=0x3ab5245
github.com/matrixorigin/matrixone/pkg/frontend.(*Routine).handleRequest(0xc02a1f8600, 0xc040e07b88)
	/go/src/github.com/matrixorigin/matrixone/pkg/frontend/routine.go:298 +0x61d fp=0xc040e07a28 sp=0xc040e075b8 pc=0x3aff31d
github.com/matrixorigin/matrixone/pkg/frontend.(*RoutineManager).Handler(0xc00104a780, 0xc012e10000, {0xc044bcc000, 0x7ffa5, 0x7ffa5})
	/go/src/github.com/matrixorigin/matrixone/pkg/frontend/routine_manager.go:385 +0x327 fp=0xc040e07c40 sp=0xc040e07a28 pc=0x3b048c7
github.com/matrixorigin/matrixone/pkg/frontend.(*MOServer).handleRequest(0xc00ee3a320, 0xc012e10000)
	/go/src/github.com/matrixorigin/matrixone/pkg/frontend/server.go:516 +0x1eb fp=0xc040e07d10 sp=0xc040e07c40 pc=0x3b11beb
github.com/matrixorigin/matrixone/pkg/frontend.(*MOServer).handleMessage(0xc00ee3a320, {0x57dc4b0, 0xc000b99680}, 0xc012e10000)
	/go/src/github.com/matrixorigin/matrixone/pkg/frontend/server.go:484 +0x94 fp=0xc040e07de8 sp=0xc040e07d10 pc=0x3b11854
github.com/matrixorigin/matrixone/pkg/frontend.(*MOServer).handleLoop(0xc00ee3a320?, {0x57dc4b0?, 0xc000b99680?}, 0xc000ba5380?)
	/go/src/github.com/matrixorigin/matrixone/pkg/frontend/server.go:212 +0x2f fp=0xc040e07ea8 sp=0xc040e07de8 pc=0x3b0e22f
github.com/matrixorigin/matrixone/pkg/frontend.(*MOServer).handleConn(0xc00ee3a320, {0x57dc4b0, 0xc000b99680}, {0x5822a98?, 0xc007712c98?})
	/go/src/github.com/matrixorigin/matrixone/pkg/frontend/server.go:208 +0x4a6 fp=0xc040e07fa8 sp=0xc040e07ea8 pc=0x3b0e006
github.com/matrixorigin/matrixone/pkg/frontend.(*MOServer).startAccept.gowrap2()
	/go/src/github.com/matrixorigin/matrixone/pkg/frontend/server.go:177 +0x30 fp=0xc040e07fe0 sp=0xc040e07fa8 pc=0x3b0dad0
runtime.goexit({})
	/usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc040e07fe8 sp=0xc040e07fe0 pc=0x484d41
created by github.com/matrixorigin/matrixone/pkg/frontend.(*MOServer).startAccept in goroutine 917
	/go/src/github.com/matrixorigin/matrixone/pkg/frontend/server.go:177 +0x165
panic log: panic.log
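For reference, a minimal Go sketch of the kind of nil-receiver guard that would avoid this class of dereference. The type, method, and zero-value fallback below are illustrative stand-ins, not the actual TxnHandler code or the real fix in pkg/frontend:

package main

import "fmt"

type serverStatus uint16

// txnHandler stands in for the frontend's transaction handler; the real
// type in pkg/frontend is far more complex.
type txnHandler struct {
	status serverStatus
}

// getServerStatus returns a zero status when the handler is nil instead of
// dereferencing the nil receiver, which is the failure mode in the trace above.
func (h *txnHandler) getServerStatus() serverStatus {
	if h == nil {
		return 0
	}
	return h.status
}

func main() {
	var h *txnHandler                // a session whose txn handler was never initialized
	fmt.Println(h.getServerStatus()) // prints 0 instead of panicking with SIGSEGV
}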
Cluster YAML file:
apiVersion: core.matrixorigin.io/v1alpha1
kind: MatrixOneCluster
metadata:
  name: nightly-regression-dis
  namespace: mo-ben-nightly-48c7e1698-20241103
spec:
  semanticVersion: 1.3.0
  dn:
    exportToPrometheus: true
    nodeSelector:
      tke.matrixorigin.io/mo-nightly-regression: "true"
    overlay:
      initContainers:
        - image: ccr.ccs.tencentyun.com/matrixone-dev/matrixone:nightly-ff4db5805
          command:
            - sh
            - -c
            - |
              sysctl -w net.ipv4.tcp_tw_reuse=1
              sysctl -w net.ipv4.tcp_fin_timeout=30
          imagePullPolicy: Always
          name: setsysctl
          terminationMessagePolicy: File
          securityContext:
            capabilities:
              add: ["NET_ADMIN","SYS_ADMIN"]
      podAnnotations:
        profiles.grafana.com/memory.scrape: "true"
        profiles.grafana.com/memory.port: "6060"
        profiles.grafana.com/cpu.scrape: "true"
        profiles.grafana.com/cpu.port: "6060"
      imagePullSecrets:
        - name: tke-registry
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/local-pv
          operator: Exists
      env:
        - name: GOMEMLIMIT
          value: "35000MiB"
        - name: GOTRACEBACK
          value: crash
        - name: GOGC
          value: "200"
      shareProcessNamespace: true
    cacheVolume:
      size: 50Gi
      storageClassName: directpv-min-io
    sharedStorageCache:
      memoryCacheSize: 5Gi
      diskCacheSize: 50Gi
    config: |
      [dn.Txn.Storage]
      backend = "TAE"
      log-backend = "logservice"
      [log]
      level = "info"
      format = "json"
      max-size = 512
      [dn.Ckp]
      flush-interval = "60s"
      min-count = 100
      scan-interval = "5s"
      incremental-interval = "60s"
      global-interval = "100000s"
      [dn.LogtailServer]
      rpc-max-message-size = "16KiB"
      rpc-payload-copy-buffer-size = "16KiB"
      rpc-enable-checksum = true
      logtail-collect-interval = "2ms"
      logtail-response-send-timeout = "10s"
      max-logtail-fetch-failure = 5
      [observability]
      metricUpdateStorageUsageInterval = "15m"
      enableStmtMerge = true
      enableMetricToProm = true
      [dn.GCCfg]
      disable-gc = true
      [dn.rpc]
      max-message-size = "1000M"
    replicas: 1
    resources:
      requests:
        cpu: 14
        memory: 55Gi
      limits:
        cpu: 14
        memory: 55Gi
  imageRepository: ccr.ccs.tencentyun.com/matrixone-dev/matrixone
  imagePullPolicy: IfNotPresent
  logService:
    exportToPrometheus: true
    nodeSelector:
      tke.matrixorigin.io/mo-nightly-regression-log: "true"
    overlay:
      podAnnotations:
        profiles.grafana.com/memory.scrape: "true"
        profiles.grafana.com/memory.port: "6060"
        profiles.grafana.com/cpu.scrape: "true"
        profiles.grafana.com/cpu.port: "6060"
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/mo-nightly-regression-log
          operator: Exists
      imagePullSecrets:
        - name: tke-registry
      env:
        - name: GOTRACEBACK
          value: crash
      shareProcessNamespace: true
    replicas: 3
    resources:
      requests:
        cpu: 2
        memory: 12Gi
      limits:
        cpu: 3
        memory: 14Gi
    sharedStorage:
      s3:
        endpoint: https://cos.ap-guangzhou.myqcloud.com
        region: ap-guangzhou
        path: mo-nightly-gz-1308875761/mo-benchmark-1148034539
        s3RetentionPolicy: Delete
        secretRef:
          name: tke-regression
    pvcRetentionPolicy: Delete
    volume:
      size: 100Gi
      storageClassName: cbs-hssd
    config: |
      [log]
      level = "info"
      format = "json"
      max-size = 512
      [observability]
      metricUpdateStorageUsageInterval = "15m"
      enableStmtMerge = true
      enableMetricToProm = true
  tp:
    exportToPrometheus: true
    nodeSelector:
      tke.matrixorigin.io/mo-nightly-regression: "true"
    overlay:
      initContainers:
        - image: ccr.ccs.tencentyun.com/matrixone-dev/matrixone:nightly-ff4db5805
          command:
            - sh
            - -c
            - |
              apt update -y;
              apt install -y iptables conntrack;
              iptables -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT;
              sysctl -w net.ipv4.tcp_tw_reuse=1
              sysctl -w net.ipv4.tcp_fin_timeout=30
          imagePullPolicy: Always
          name: enable-conntrack
          terminationMessagePolicy: File
          securityContext:
            capabilities:
              add: ["NET_ADMIN","SYS_ADMIN"]
      mainContainerSecurityContext:
        capabilities:
          add: ["NET_ADMIN","NET_RAW"]
      podAnnotations:
        profiles.grafana.com/memory.scrape: "true"
        profiles.grafana.com/memory.port: "6060"
        profiles.grafana.com/cpu.scrape: "true"
        profiles.grafana.com/cpu.port: "6060"
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/local-pv
          operator: Exists
      imagePullSecrets:
        - name: tke-registry
      env:
        - name: GOMEMLIMIT
          value: "25000MiB"
        - name: GOTRACEBACK
          value: crash
        - name: GOGC
          value: "200"
        - name: GODEBUG
          value: madvdontneed=1,gctrace=2
      args:
        - -profile-interval=30s
        - -debug-http=0.0.0.0:6060
      shareProcessNamespace: true
    cacheVolume:
      size: 3000Gi
      storageClassName: directpv-min-io
    sharedStorageCache:
      memoryCacheSize: 12Gi
      diskCacheSize: 3000Gi
    config: |
      [cn.Engine]
      type = "distributed-tae"
      [log]
      level = "info"
      format = "json"
      max-size = 512
      [cn]
      turn-on-push-model = true
      [cn.txn]
      enable-sacrificing-freshness = 1
      enable-cn-based-consistency = 0
      enable-leak-check = 1
      max-active-ages = "20m"
      [observability]
      metricUpdateStorageUsageInterval = "15m"
      enableStmtMerge = true
      enableMetricToProm = true
      [cn.txn.trace]
      load-to-s3 = true
      flush-bytes = "256MB"
      force-flush-duration = "300s"
      [cn.rpc]
      max-message-size = "1000M"
    replicas: 3
    resources:
      requests:
        cpu: 14
        memory: 55Gi
      limits:
        cpu: 14
        memory: 55Gi
  proxy:
    replicas: 2
    nodeSelector:
      tke.matrixorigin.io/mo-nightly-regression-proxy: "true"
    overlay:
      initContainers:
        - image: ccr.ccs.tencentyun.com/matrixone-dev/matrixone:nightly-ff4db5805
          command:
            - sh
            - -c
            - |
              sysctl -w net.ipv4.tcp_tw_reuse=1
              sysctl -w net.ipv4.tcp_fin_timeout=30
          imagePullPolicy: Always
          name: setsysctl
          terminationMessagePolicy: File
          securityContext:
            capabilities:
              add: ["NET_ADMIN","SYS_ADMIN"]
      podAnnotations:
        profiles.grafana.com/memory.scrape: "true"
        profiles.grafana.com/memory.port: "6060"
        profiles.grafana.com/cpu.scrape: "true"
        profiles.grafana.com/cpu.port: "6060"
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/mo-nightly-regression-proxy
          operator: Exists
      imagePullSecrets:
        - name: tke-registry
    resources:
      # requests are the requested resources, this will also be used to schedule the Pod
      requests:
        cpu: 3
        memory: 6Gi
      # limits are the resource limitation of the Pod
      limits:
        cpu: 3
        memory: 6Gi
    config: |
      # TOML format config file below
      [log]
      level="info"
      [proxy]
      conn-cache-enabled = true
  version: nightly-ff4db5805
Even with the fix in place, this capability still needs extensive testing; the plan for 2.0.1 is to keep it disabled and delay enabling it until the next release, after further testing.
in process
Is there an existing issue for the same bug?
Branch Name
2.0-dev
Commit ID
3ecc49ab9
Other Environment Information
Actual Behavior
Test scenario: on TKE, the proxy is configured with conn-cache-enabled = true and the test tool uses short-lived connections; after running a 100-concurrency point-query test for a while, clients can no longer log in to mo.
TKE YAML proxy configuration:
mo-load test tool configured for short-lived connections (see Steps to Reproduce for the exact usage):
mo log: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22pGq%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-branch-nightly-2e5ddb165-20241109%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221731404462204%22,%22to%22:%221731411550749%22%7D%7D%7D&schemaVersion=1&orgId=1
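For reference, a minimal Go sketch of the short-lived-connection point-query workload described above: 100 concurrent clients, each repeatedly opening a connection through the proxy, running one point query, and closing it. The DSN, credentials, and query are placeholder assumptions, not the actual mo-load configuration:

package main

import (
	"database/sql"
	"log"
	"sync"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Placeholder DSN: point it at the proxy service of the cluster above.
	const dsn = "dump:111@tcp(127.0.0.1:6001)/mo_catalog"
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ { // 100 concurrent clients
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				db, err := sql.Open("mysql", dsn)
				if err != nil {
					log.Println("open:", err)
					return
				}
				var one int
				// One point query per connection, then close: a short-lived connection.
				if err := db.QueryRow("SELECT 1").Scan(&one); err != nil {
					log.Println("query:", err)
				}
				db.Close()
			}
		}()
	}
	wg.Wait()
}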
Expected Behavior
No response
Steps to Reproduce
Additional information
No response