shirou / gopsutil

psutil for golang
Other
10.28k stars 1.56k forks source link

Crash in createTimeWithContext when run on FreeBSD arm64 #1636

Open sdalu opened 2 months ago

sdalu commented 2 months ago

Describe the bug When running telegraph 1.31.0~553d972c, with gopsutil v3.24.4, it crash in the createTimeWithContext.

2024-05-02T13:09:40Z E! FATAL: [inputs.procstat] panicked: runtime error: invalid memory address or nil pointer dereference, Stack:
goroutine 147 [running]:
github.com/influxdata/telegraf/agent.panicRecover(0x4d410370)
    /go/src/github.com/influxdata/telegraf/agent/agent.go:1202 +0x74
panic({0x67aa400, 0xc587b20})
    /usr/local/go/src/runtime/panic.go:770 +0xfc
github.com/shirou/gopsutil/v3/process.(*Process).createTimeWithContext(0x4d0a0368, {0x8232a44, 0xc9983c0})
    /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.24.4/process/process_freebsd.go:121 +0x4c
github.com/shirou/gopsutil/v3/process.(*Process).CreateTimeWithContext(0x4d0a0368, {0x8232a44, 0xc9983c0})
    /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.24.4/process/process.go:310 +0x74
github.com/shirou/gopsutil/v3/process.NewProcessWithContext({0x8232a44, 0xc9983c0}, 0x3744)
    /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.24.4/process/process.go:218 +0x78
github.com/shirou/gopsutil/v3/process.NewProcess(...)
    /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.24.4/process/process.go:203
github.com/influxdata/telegraf/plugins/inputs/procstat.newProc(0x3744)
    /go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/process.go:38 +0x30
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).gatherOld(0x4ccc6e48, {0x824a858, 0x4d40cae0})
    /go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:209 +0x848
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).Gather(0x4ccc6e48, {0x824a858, 0x4d40cae0})
    /go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:166 +0x38
github.com/influxdata/telegraf/models.(*RunningInput).Gather(0x4d410370, {0x824a858, 0x4d40cae0})
    /go/src/github.com/influxdata/telegraf/models/running_input.go:227 +0x2c4
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1()
    /go/src/github.com/influxdata/telegraf/agent/agent.go:583 +0x70
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce in goroutine 120
    /go/src/github.com/influxdata/telegraf/agent/agent.go:581 +0xc0

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0x4d310b68)
    /usr/local/go/src/runtime/sema.go:62 +0x3c
sync.(*WaitGroup).Wait(0
2024-05-02T13:09:40Z E! PLEASE REPORT THIS PANIC ON GITHUB with stack trace, configuration, and OS information: https://github.com/influxdata/telegraf/issues/new/choose

Issue initially reported: https://github.com/influxdata/telegraf/issues/13933

Environment (please complete the following information): FreBSD:

14.0-RELEASE-p4 14.0-RELEASE-p4 14.0-RELEASE-p4

FreeBSD brain.home.sdalu.com 14.0-RELEASE-p4 FreeBSD 14.0-RELEASE-p4 #0 releng/14.0-n265400-4edf3b80733e: Wed Jan 17 19:32:19 CET 2024 root@brain.home.sdalu.com:/usr/obj/usr/src/arm64.aarch64/sys/BROADCOM arm64

shirou commented 2 months ago

I can pass go122 test -run Test_Process_CreateTime on my QEMU environment. Any other information or could you provide a minimal reproducible code?

# go122 version
go version go1.22.1 freebsd/arm64

# uname -a
FreeBSD generic 14.0-RELEASE FreeBSD 14.0-RELEASE #0 releng/14.0-n265380-f9716eee8ab4: Fri Nov 10 08:59:18 UTC 2023     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64
sdalu commented 2 months ago

I'm just testing with the artifact freebsd_armv7.tar.gz provided https://github.com/influxdata/telegraf/pull/15272 . Hoping that @powersj could tell you more about the build. The computer on which I tested is a Raspberry PI 4.

I'm not a go user, how do I run go test -run Test_Process_CreateTime ?

I tried:

git clone  https://github.com/shirou/gopsutil
cd gopsutil
go test -run Test_Process_CreateTime

#  result I got:
?       github.com/shirou/gopsutil/v3   [no test files]
powersj commented 1 month ago

@sdalu,

Can you run go test -v -run Test_Process_CreateTime ./process/...

@shirou,

In the original issue, I had the reporter use the following code. You can see his response for the output.

sdalu commented 1 month ago
root@brain:~/gopsutil # go test -v -run Test_Process_CreateTime ./process/...
go: downloading golang.org/x/sys v0.20.0
go: downloading github.com/stretchr/testify v1.9.0
=== RUN   Test_Process_CreateTime
--- PASS: Test_Process_CreateTime (0.00s)
PASS
ok      github.com/shirou/gopsutil/v3/process   0.016s
shirou commented 1 month ago

Thank you for the information. This means that, at least in the test environment, the CreateTime() is being retrieved without any issues. In other words, it is not a simple implementation mistake in the code or a problem with the struct definition. Do you have any other potential issues in mind?

powersj commented 1 month ago

@shirou,

Well gopsutil should not panic :)

In the original issue, I had sdalu, first get a process using the scripts own PID:

myself, err := process.NewProcess(int32(currentPid))

This returned nothing, yet the test does.

@sdalu,

Can you run the following go code:

go.mod:

module test-process

go 1.22

require github.com/shirou/gopsutil/v3 v3.24.4

require (
    github.com/go-ole/go-ole v1.2.6 // indirect
    github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0 // indirect
    github.com/power-devops/perfstat v0.0.0-20210106213030-5aafc221ea8c // indirect
    github.com/shoenig/go-m1cpu v0.1.6 // indirect
    github.com/tklauser/go-sysconf v0.3.14 // indirect
    github.com/tklauser/numcpus v0.8.0 // indirect
    github.com/yusufpapurcu/wmi v1.2.4 // indirect
    golang.org/x/sys v0.20.0 // indirect
)

main.go:

package main

import (
    "fmt"
    "os"

    "github.com/shirou/gopsutil/v3/process"
)

func main() {
    currentPid := os.Getpid()
    myself, err := process.NewProcess(int32(currentPid))
    if err != nil {
        panic(err)
    }
    fmt.Println(myself.Name())
    fmt.Println(myself.String())
    fmt.Println(myself.NumThreads())
    fmt.Println(myself.RlimitUsage(true))
    fmt.Println(myself.Status())

    time, err := myself.CreateTime()
    if err != nil {
        panic(err)
    }
    fmt.Println(time)
}

And either run this directly via go run . or build it go build . and run the test-process binary.

sdalu commented 1 month ago

I'm using go 1.21

root@brain:~/t # go run .
test-process <nil>
{"pid":38018}
6 <nil>
[] not implemented yet
[running] <nil>
1715697644691
powersj commented 1 month ago

Well that is not what I was expecting. Just to confirm, this was run on your FreeBSD system that was showing the issue with Telegraf?

sdalu commented 1 month ago

Yes. I re-run the telegraf artifact, just in case. It's crashing with the same panic. Should I try to build it instead of runing the artifact ?

powersj commented 1 month ago

@sdalu I have put up https://github.com/influxdata/telegraf/pull/15355 which will hopefully print out what the result of getKProc call and the pid we are looking at. In 20-30mins can you download another Telegraf artifact and see what you get?

Thanks!

sdalu commented 1 month ago

@powersj Here is the output:

root@brain:~/t/telegraf-1.31.0 # ./usr/bin/telegraf --config /usr/local/etc/telegraf.conf --debug
2024-05-14T20:07:16Z I! Loading config: /usr/local/etc/telegraf.conf
2024-05-14T20:07:16Z W! DeprecationWarning: Option "fielddrop" of plugin "inputs.system" deprecated since version 1.29.0 and will be removed in 2.0.0: use 'fieldexclude' instead
2024-05-14T20:07:16Z W! DeprecationWarning: Option "dns_lookup" of plugin "inputs.ntpq" deprecated since version 1.24.0 and will be removed in 2.0.0: add '-n' to 'options' instead to skip DNS lookup
2024-05-14T20:07:16Z I! Starting Telegraf 1.31.0-74404b62 brought to you by InfluxData the makers of InfluxDB
2024-05-14T20:07:16Z I! Available plugins: 233 inputs, 9 aggregators, 32 processors, 25 parsers, 60 outputs, 4 secret-stores
2024-05-14T20:07:16Z I! Loaded inputs: apache cpu disk diskio exec (2x) internal mem mysql net netstat ntpq processes procstat (10x) redis smart swap system zfs
2024-05-14T20:07:16Z I! Loaded aggregators: 
2024-05-14T20:07:16Z I! Loaded processors: converter override (4x) regex
2024-05-14T20:07:16Z I! Loaded secretstores: 
2024-05-14T20:07:16Z I! Loaded outputs: influxdb_v2
2024-05-14T20:07:16Z I! Tags enabled: host=brain.home.sdalu.com
2024-05-14T20:07:16Z W! Deprecated inputs: 0 and 1 options
2024-05-14T20:07:16Z I! [agent] Config: Interval:20s, Quiet:false, Hostname:"brain.home.sdalu.com", Flush Interval:30s
2024-05-14T20:07:16Z D! [agent] Initializing plugins
2024-05-14T20:07:16Z W! [inputs.smart] nvme not found: verify that nvme is installed and it is in your PATH (or specified in config) to gather vendor specific attributes: provided path does not exist: []
2024-05-14T20:07:16Z W! DeprecationWarning: Value "false" for option "ignore_protocol_stats" of plugin "inputs.net" deprecated since version 1.27.3 and will be removed in 1.36.0: use the 'inputs.nstat' plugin instead
2024-05-14T20:07:16Z D! [agent] Connecting outputs
2024-05-14T20:07:16Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2024-05-14T20:07:16Z D! [agent] Successfully connected to outputs.influxdb_v2
2024-05-14T20:07:16Z D! [agent] Starting service inputs
getting process with pid:  40544
k: (*process.KinfoProc)(nil)
2024-05-14T20:07:20Z E! FATAL: [inputs.procstat] panicked: runtime error: invalid memory address or nil pointer dereference, Stack:
goroutine 149 [running]:
github.com/influxdata/telegraf/agent.panicRecover(0x4d638140)
    /go/src/github.com/influxdata/telegraf/agent/agent.go:1202 +0x74
panic({0x696a390, 0xc88a150})
    /usr/local/go/src/runtime/panic.go:770 +0xfc
github.com/shirou/gopsutil/v3/process.(*Process).createTimeWithContext(0x4ce30098, {0x843a4c4, 0xcc9ccd8})
    /go/pkg/mod/github.com/powersj/gopsutil/v3@v3.0.0-20240514151642-919283ff5f48/process/process_freebsd.go:123 +0xac
github.com/shirou/gopsutil/v3/process.(*Process).CreateTimeWithContext(0x4ce30098, {0x843a4c4, 0xcc9ccd8})
    /go/pkg/mod/github.com/powersj/gopsutil/v3@v3.0.0-20240514151642-919283ff5f48/process/process.go:312 +0x74
github.com/shirou/gopsutil/v3/process.NewProcessWithContext({0x843a4c4, 0xcc9ccd8}, 0x9e60)
    /go/pkg/mod/github.com/powersj/gopsutil/v3@v3.0.0-20240514151642-919283ff5f48/process/process.go:220 +0x78
github.com/shirou/gopsutil/v3/process.NewProcess(0x9e60)
    /go/pkg/mod/github.com/powersj/gopsutil/v3@v3.0.0-20240514151642-919283ff5f48/process/process.go:205 +0x90
github.com/influxdata/telegraf/plugins/inputs/procstat.newProc(0x9e60)
    /go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/process.go:38 +0x1c
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).gatherOld(0x4d36cf78, {0x8452d48, 0x4d367308})
    /go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:223 +0x810
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).Gather(0x4d36cf78, {0x8452d48, 0x4d367308})
    /go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:185 +0x38
github.com/influxdata/telegraf/models.(*RunningInput).Gather(0x4d638140, {0x8452d48, 0x4d367308})
    /go/src/github.com/influxdata/telegraf/models/running_input.go:228 +0x2c4
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1()
    /go/src/github.com/influxdata/telegraf/agent/agent.go:583 +0x70
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce in goroutine 146
    /go/src/github.com/influxdata/telegraf/agent/agent.go:581 +0xc0

goroutine
2024-05-14T20:07:20Z E! PLEASE REPORT THIS PANIC ON GITHUB with stack trace, configuration, and OS information: https://github.com/influxdata/telegraf/issues/new/choose
powersj commented 1 month ago

Thanks for trying that out!

getting process with pid: 40544 k: (*process.KinfoProc)(nil)

@shirou any reason why the kinfoproc would be nil? Does that mean gopsutil failed to parse the information but didn't generate an error?

That print statement is from my own fork, printing out the k value before we attempt to create time with context:

https://github.com/powersj/gopsutil/blob/919283ff5f489d3697e01c5267ccdd0a9273cd41/process/process_freebsd.go#L122

shirou commented 1 month ago

I read the code and it doesn't seem like err is nil and kInfoProc is nil at the same time. However, this issue is occurring in practice. Therefore, I have created a PR #1653 to add a nil check. Could you please take a look?

powersj commented 1 month ago

@sdalu,

As before, can you grab the telegraf artifacts in few minutes from this post at: https://github.com/influxdata/telegraf/pull/15397

@shirou - thanks for looking into this!

sdalu commented 1 month ago

@powersj here is the result (looks the same):

root@brain:~/telegraf-1.31.0 # ./usr/bin/telegraf --config /usr/local/etc/telegraf.conf --debug
2024-05-23T19:41:13Z I! Loading config: /usr/local/etc/telegraf.conf
2024-05-23T19:41:13Z W! DeprecationWarning: Option "fielddrop" of plugin "inputs.system" deprecated since version 1.29.0 and will be removed in 2.0.0: use 'fieldexclude' instead
2024-05-23T19:41:13Z W! DeprecationWarning: Option "dns_lookup" of plugin "inputs.ntpq" deprecated since version 1.24.0 and will be removed in 2.0.0: add '-n' to 'options' instead to skip DNS lookup
2024-05-23T19:41:13Z I! Starting Telegraf 1.31.0-ef79d2b9 brought to you by InfluxData the makers of InfluxDB
2024-05-23T19:41:13Z I! Available plugins: 233 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 4 secret-stores
2024-05-23T19:41:13Z I! Loaded inputs: apache cpu disk diskio exec (2x) internal mem mysql net netstat ntpq processes procstat (10x) redis smart swap system zfs
2024-05-23T19:41:13Z I! Loaded aggregators: 
2024-05-23T19:41:13Z I! Loaded processors: converter override (4x) regex
2024-05-23T19:41:13Z I! Loaded secretstores: 
2024-05-23T19:41:13Z I! Loaded outputs: influxdb_v2
2024-05-23T19:41:13Z I! Tags enabled: host=brain.home.sdalu.com
2024-05-23T19:41:13Z W! Deprecated inputs: 0 and 1 options
2024-05-23T19:41:13Z I! [agent] Config: Interval:20s, Quiet:false, Hostname:"brain.home.sdalu.com", Flush Interval:30s
2024-05-23T19:41:13Z D! [agent] Initializing plugins
2024-05-23T19:41:13Z W! [inputs.smart] nvme not found: verify that nvme is installed and it is in your PATH (or specified in config) to gather vendor specific attributes: provided path does not exist: []
2024-05-23T19:41:13Z W! DeprecationWarning: Value "false" for option "ignore_protocol_stats" of plugin "inputs.net" deprecated since version 1.27.3 and will be removed in 1.36.0: use the 'inputs.nstat' plugin instead for protocol stats
2024-05-23T19:41:13Z D! [agent] Connecting outputs
2024-05-23T19:41:13Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2024-05-23T19:41:13Z D! [agent] Successfully connected to outputs.influxdb_v2
2024-05-23T19:41:13Z D! [agent] Starting service inputs
2024-05-23T19:41:20Z E! FATAL: [inputs.procstat] panicked: runtime error: invalid memory address or nil pointer dereference, Stack:
goroutine 61 [running]:
github.com/influxdata/telegraf/agent.panicRecover(0x4d36c5a0)
    /go/src/github.com/influxdata/telegraf/agent/agent.go:1202 +0x74
panic({0x68b6d10, 0xc708b80})
    /usr/local/go/src/runtime/panic.go:770 +0xfc
github.com/shirou/gopsutil/v3/process.(*Process).createTimeWithContext(0x4d0cc128, {0x8361124, 0xcb1af28})
    /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.24.5-0.20240523132159-edb58e3dd10d/process/process_freebsd.go:122 +0x4c
github.com/shirou/gopsutil/v3/process.(*Process).CreateTimeWithContext(0x4d0cc128, {0x8361124, 0xcb1af28})
    /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.24.5-0.20240523132159-edb58e3dd10d/process/process.go:310 +0x74
github.com/shirou/gopsutil/v3/process.NewProcessWithContext({0x8361124, 0xcb1af28}, 0x624)
    /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.24.5-0.20240523132159-edb58e3dd10d/process/process.go:218 +0x78
github.com/shirou/gopsutil/v3/process.NewProcess(...)
    /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.24.5-0.20240523132159-edb58e3dd10d/process/process.go:203
github.com/influxdata/telegraf/plugins/inputs/procstat.newProc(0x624)
    /go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/process.go:38 +0x30
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).gatherOld(0x4d29e8f8, {0x8379320, 0x4d3951e8})
    /go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:223 +0x810
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).Gather(0x4d29e8f8, {0x8379320, 0x4d3951e8})
    /go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:185 +0x38
github.com/influxdata/telegraf/models.(*RunningInput).Gather(0x4d36c5a0, {0x8379320, 0x4d3951e8})
    /go/src/github.com/influxdata/telegraf/models/running_input.go:228 +0x2c4
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1()
    /go/src/github.com/influxdata/telegraf/agent/agent.go:583 +0x70
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce in goroutine 112
    /go/src/github.com/influxdata/telegraf/agent/agent.go:581 +0xc0

goroutine 1 [
2024-05-23T19:41:20Z E! PLEASE REPORT THIS PANIC ON GITHUB with stack trace, configuration, and OS information: https://github.com/influxdata/telegraf/issues/new/choose

I downloaded: telegraf-1.31.0~ef79d2b9_freebsd_armv7.tar.gz

powersj commented 1 month ago

That's the same line, you originally saw:

return int64(k.Start.Sec)*1000 + int64(k.Start.Usec)/1000, nil

Ah, @shirou, the nil check is checking if the memory address is nil, so that will never be nil.

- &x == nil
+ x == nil
shirou commented 1 month ago

The parseKinfoProc function returns a struct instance, not a pointer. Therefore, I agree that it should be impossible for a pointer to this value to be nil. However, it is actually happening, so I am very confused.

powersj commented 1 month ago

@shirou - can that be changed? Otherwise you would need to check if any of the pointer fields are nil?