m-lab / etl

M-Lab ingestion pipeline
Apache License 2.0

Fix local development mode to accept paris-traceroute archives #1025

Open cristinaleonr opened 2 years ago

cristinaleonr commented 2 years ago

Currently, etl_worker crashes in local development mode when a paris-traceroute archive is supplied as a URL.

Steps to reproduce:

  1. Navigate to cmd/etl_worker within the ETL project.
  2. Run: go run ./etl_worker.go -service_port :8080 -output_dir ./output -output local
  3. Open another terminal and set the URL variable to a paris-traceroute archive (e.g., URL=gs://archive-measurement-lab/paris-traceroute/2019/11/19/20191119T000000Z-mlab1-ord03-paris-traceroute-0000.tgz).
  4. Run: curl "http://localhost:8081/v2/worker?filename=$URL"
  5. The etl_worker crashes with:
    2021/10/11 18:48:31 worker.go:174: <nil> creating parser for traceroute gs://archive-measurement-lab/paris-traceroute/2013/05/08/20130508T000000Z-mlab3-akl01-paris-traceroute-0000.tgz
    2021/10/11 18:48:31 server.go:3159: http: panic serving [::1]:56948: runtime error: invalid memory address or nil pointer dereference
    goroutine 135 [running]:
    net/http.(*conn).serve.func1()
        /usr/local/go/src/net/http/server.go:1801 +0xb9
    panic({0xd47080, 0x151a520})
        /usr/local/go/src/runtime/panic.go:1047 +0x266
    github.com/m-lab/etl/task.(*Task).Close(0x0)
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/task/task.go:67 +0x19
    panic({0xd47080, 0x151a520})
        /usr/local/go/src/runtime/panic.go:1038 +0x215
    github.com/m-lab/etl/task.(*Task).ProcessAllTests(0x4, 0x50)
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/task/task.go:85 +0x4f
    github.com/m-lab/etl/worker.DoGKETask(_, {{0xc0002ee150, 0x6f}, {0xc0002ee16d, 0x52}, {0xc0002ee155, 0x17}, {0x0, 0x0}, {0xc0002ee16d, ...}, ...})
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/worker/worker.go:209 +0x30
    github.com/m-lab/etl/worker.ProcessGKETask({_, _}, {{0xc0002ee150, 0x6f}, {0xc0002ee16d, 0x52}, {0xc0002ee155, 0x17}, {0x0, 0x0}, ...}, ...)
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/worker/worker.go:204 +0x4de
    main.(*runnable).Run(0xc00000c3c0, {0xfb61e8, 0xc00019a000})
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/cmd/etl_worker/etl_worker.go:313 +0x2f6
    main.handleLocalRequest({0xfb1500, 0xc0001f5ea0}, 0x0)
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/cmd/etl_worker/etl_worker.go:196 +0x189
    net/http.HandlerFunc.ServeHTTP(0x0, {0xfb1500, 0xc0001f5ea0}, 0x0)
        /usr/local/go/src/net/http/server.go:2046 +0x2f
    net/http.(*ServeMux).ServeHTTP(0xc00022400f, {0xfb1500, 0xc0001f5ea0}, 0xc000432200)
        /usr/local/go/src/net/http/server.go:2424 +0x149
    net/http.serverHandler.ServeHTTP({0xc0005bdb90}, {0xfb1500, 0xc0001f5ea0}, 0xc000432200)
        /usr/local/go/src/net/http/server.go:2878 +0x43b
    net/http.(*conn).serve(0xc00045c000, {0xfb6258, 0xc0001335c0})
        /usr/local/go/src/net/http/server.go:1929 +0xb08
    created by net/http.(*Server).Serve
        /usr/local/go/src/net/http/server.go:3033 +0x4e8

Note: this does not happen with other datatypes (e.g., PCAP, hopannotation1, scamper1).
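
For illustration, the trace above appears to show Close being invoked on a nil *Task (the `Close(0x0)` frame) after parser creation logged `<nil>`. Below is a minimal sketch of the general Go pattern that avoids this kind of panic: check the constructor's error before deferring Close, and make Close tolerate a nil receiver. The names Task, newTask, and process are hypothetical stand-ins and are not taken from the etl code.

```go
package main

import (
	"errors"
	"fmt"
)

// Task is a hypothetical stand-in for a task type; the real etl struct differs.
type Task struct {
	closer func() error
}

// Close returns an error instead of panicking when called on a nil *Task.
func (t *Task) Close() error {
	if t == nil {
		return errors.New("close called on nil task")
	}
	if t.closer == nil {
		return nil
	}
	return t.closer()
}

// newTask is a hypothetical constructor that fails when no parser is available.
func newTask(haveParser bool) (*Task, error) {
	if !haveParser {
		return nil, errors.New("no parser for datatype")
	}
	return &Task{closer: func() error { return nil }}, nil
}

// process checks the constructor error before deferring Close,
// so a nil task is never used or closed.
func process(haveParser bool) error {
	tsk, err := newTask(haveParser)
	if err != nil {
		return fmt.Errorf("creating task: %w", err)
	}
	defer tsk.Close()
	return nil
}

func main() {
	fmt.Println(process(false)) // the failure is reported as an error, not a panic
	fmt.Println(process(true))
}
```

With this shape, an unsupported datatype would surface as an error returned to the HTTP handler rather than a crash of the whole request goroutine.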

stephen-soltesz commented 2 years ago

Ah, so there are currently two processing paths in the etl_worker: one for the "v1" system, and another for the "v2" system.

So, try the same GCS URL with the resource path /worker instead of /v2/worker.

stephen-soltesz commented 2 years ago

And, I think the next issue will be that the v1 system does not support local output.

cristinaleonr commented 2 years ago

Thanks for clarifying!

I tried with the /worker path. I think you're right about the v1 system not supporting local output, because the request now fails with this error:

    2021/10/12 13:57:26 insert.go:299: InsertErr googleapi: Error 400: The destination table is invalid: projec_id , dataset_id base_tables, table_id: traceroute., invalid on traceroute_20191119