internetarchive / Zeno

State-of-the-art web crawler 🔱
GNU Affero General Public License v3.0

Panic on /workers access #132

Closed CorentinB closed 3 months ago

CorentinB commented 3 months ago
2024/08/15 07:49:36 http: panic serving 127.0.0.1:45290: runtime error: invalid memory address or nil pointer dereference
goroutine 212832422 [running]:
net/http.(*conn).serve.func1()
        /var/www/.go/src/net/http/server.go:1903 +0xbe
panic({0x1371ac0?, 0x21a7f70?})
equals215 commented 3 months ago

How to reproduce?

CorentinB commented 3 months ago

No idea.

yzqzss commented 3 months ago

Reproduce:

rm jobs/ -rf && Zeno get url https://example.com/{1..100} --api --debug -w 2

Once the workers have started working, open another terminal and send many requests to /workers:

while true; do curl http://localhost:9443/workers --silent > /dev/null; done

2024/08/22 17:10:50 http: panic serving [::1]:54390: runtime error: invalid memory address or nil pointer dereference
goroutine 980 [running]:
net/http.(*conn).serve.func1()
        /home/yzqzss/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.0.linux-amd64/src/net/http/server.go:1947 +0xbe
panic({0x13bd280?, 0x2249670?})
        /home/yzqzss/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.0.linux-amd64/src/runtime/panic.go:785 +0x132
github.com/internetarchive/Zeno/internal/pkg/crawl._getWorkerState(0xc00033ea40)
        /home/yzqzss/git/Zeno/internal/pkg/crawl/worker_pool.go:137 +0x274
github.com/internetarchive/Zeno/internal/pkg/crawl.(*WorkerPool).GetWorkerStateFromPool.func1({0xc00040d968?, 0xc00041b920?}, {0x1461380?, 0xc00033ea40?})
        /home/yzqzss/git/Zeno/internal/pkg/crawl/worker_pool.go:109 +0x3d
sync.(*Map).Range(0x0?, 0xc00040d9f8)
        /home/yzqzss/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.0.linux-amd64/src/sync/map.go:501 +0x1f8
github.com/internetarchive/Zeno/internal/pkg/crawl.(*WorkerPool).GetWorkerStateFromPool(0xc000324b90, {0x0?, 0x1544d3d?})
        /home/yzqzss/git/Zeno/internal/pkg/crawl/worker_pool.go:108 +0xb3
github.com/internetarchive/Zeno/internal/pkg/crawl.(*Crawl).startAPI.func3({0x170a988, 0xc0005b6000}, 0xc000412b38?)
        /home/yzqzss/git/Zeno/internal/pkg/crawl/api.go:65 +0xe9
net/http.HandlerFunc.ServeHTTP(0x2293be0?, {0x170a988?, 0xc0005b6000?}, 0x7516b6?)
        /home/yzqzss/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.0.linux-amd64/src/net/http/server.go:2220 +0x29
net/http.(*ServeMux).ServeHTTP(0x46fa99?, {0x170a988, 0xc0005b6000}, 0xc0001e4000)
        /home/yzqzss/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.0.linux-amd64/src/net/http/server.go:2747 +0x1ca
net/http.serverHandler.ServeHTTP({0xc0007b4090?}, {0x170a988?, 0xc0005b6000?}, 0x6?)
        /home/yzqzss/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.0.linux-amd64/src/net/http/server.go:3210 +0x8e
net/http.(*conn).serve(0xc00075e000, {0x170cb38, 0xc0002240c0})
        /home/yzqzss/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.0.linux-amd64/src/net/http/server.go:2092 +0x5d0
created by net/http.(*Server).Serve in goroutine 14
        /home/yzqzss/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.0.linux-amd64/src/net/http/server.go:3360 +0x485
CorentinB commented 3 months ago

Funny that you just posted that: we found how to reproduce it just now. Basically, the panic occurs when the URL in URL: utils.URLToString(worker.state.currentItem.URL) is nil, which of course happens when the worker is doing nothing.

CorentinB commented 3 months ago

Fix: https://github.com/internetarchive/Zeno/pull/140