influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0
28.93k stars 3.55k forks source link

Task: retry runs is not working #10768

Closed bednar closed 5 years ago

bednar commented 5 years ago

I have an issue with retrying the run of tasks. When I trying retry the Run by HTTP endpoint: "/api/v2/tasks/0335cec6ce1ab000/runs/0335cec832dab000/retry". The response is not successfully finished and contains the X-Influx-Error header with error:

panic: column _measurement:string is not of type time goroutine 18496 [running]: runtime/debug.Stack(0x0, 0x0, 0x0)     /usr/local/Cellar/go/1.11/libexec/src/runtime/debug/stack.go:24 +0xb1 github.com/influxdata/flux/execute.(*poolDispatcher).Start.func1.1(0x

This is caused by FindRunByID() function that will throw error with a stacktrace:

panic: column _measurement:string is not of type time
goroutine 31390 [running]:
runtime/debug.Stack(0x0, 0x0, 0x0)
    /usr/local/Cellar/go/1.11/libexec/src/runtime/debug/stack.go:24 +0xb1
github.com/influxdata/flux/execute.(*poolDispatcher).Start.func1.1(0xc000bf2240)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/execute/dispatcher.go:73 +0x12f
panic(0x24802c0, 0xc00050cc70)
    /usr/local/Cellar/go/1.11/libexec/src/runtime/panic.go:513 +0x1e6
github.com/influxdata/flux/execute.CheckColType(0xc0004c26a0, 0xc, 0x5, 0x6)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/execute/table.go:1138 +0x26e
github.com/influxdata/flux/execute.(*ColListTableBuilder).checkCol(0xc000fce320, 0x0, 0x6, 0x0, 0x0)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/execute/table.go:1132 +0x1d9
github.com/influxdata/flux/execute.(*ColListTableBuilder).AppendTime(0xc000fce320, 0x0, 0x1577816671a6b608, 0x0, 0x0)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/execute/table.go:1061 +0x61
github.com/influxdata/flux/execute.(*ColListTableBuilder).AppendValue(0xc000fce320, 0x0, 0x296a540, 0xc000ab03a0, 0x0, 0x0)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/execute/table.go:1122 +0x4ea
github.com/influxdata/flux/functions/transformations.(*MergeJoinCache).join.func1(0xc000302300, 0x7, 0x296a540, 0xc000ab03a0)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/functions/transformations/join.go:862 +0x1c7
github.com/influxdata/flux/values.(*object).Range(0xc000ab0220, 0xc000ab0660)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/values/object.go:96 +0x101
github.com/influxdata/flux/functions/transformations.(*MergeJoinCache).join(0xc0005c63c0, 0xc0010765f0, 0xc0009826e0, 0x0, 0x0, 0x0, 0x0)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/functions/transformations/join.go:855 +0xa4f
github.com/influxdata/flux/functions/transformations.(*MergeJoinCache).ForEachWithContext.func1(0x296a900, 0xc00093a0c0, 0x247c480, 0x356e570)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/functions/transformations/join.go:603 +0x379
github.com/influxdata/flux/execute.(*GroupLookup).Range(0xc000753600, 0xc000d07ab0)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/execute/group_lookup.go:83 +0xbf
github.com/influxdata/flux/functions/transformations.(*MergeJoinCache).ForEachWithContext(0xc0005c63c0, 0xc0004e4880)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/functions/transformations/join.go:591 +0xd8
github.com/influxdata/flux/execute.(*dataset).evalTriggers(0xc0009d6460, 0xc0005981a0, 0x2473240)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/execute/dataset.go:113 +0xc6
github.com/influxdata/flux/execute.(*dataset).UpdateWatermark(0xc0009d6460, 0x1577816833aa0cb8, 0x0, 0x0)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/execute/dataset.go:88 +0x61
github.com/influxdata/flux/functions/transformations.(*mergeJoinTransformation).UpdateWatermark(0xc00096c800, 0x9053b552b93851a3, 0xeac8c8315eb2cea0, 0x1577816833aa0cb8, 0x0, 0x0)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/functions/transformations/join.go:307 +0x1b6
github.com/influxdata/flux/execute.processMessage(0x295a6e0, 0xc00096c800, 0x29472e0, 0xc000741240, 0xc000a00200, 0x0, 0x0)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/execute/transport.go:202 +0x3d1
github.com/influxdata/flux/execute.(*consecutiveTransport).processMessages(0xc000bf28a0, 0xa)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/execute/transport.go:156 +0xcb
github.com/influxdata/flux/execute.(*consecutiveTransport).processMessages-fm(0xa)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/execute/transport.go:137 +0x39
github.com/influxdata/flux/execute.(*poolDispatcher).run(0xc000bf2240, 0x29562a0, 0xc0004cc3c0)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/execute/dispatcher.go:126 +0x1b5
github.com/influxdata/flux/execute.(*poolDispatcher).Start.func1(0xc000bf2240, 0x29562a0, 0xc0004cc3c0)
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/execute/dispatcher.go:80 +0x99
created by github.com/influxdata/flux/execute.(*poolDispatcher).Start
    /Users/bednar/go/pkg/mod/github.com/influxdata/flux@v0.12.0/execute/dispatcher.go:61 +0x8e

@mark-rushakoff please, can you look at this problem?

mark-rushakoff commented 5 years ago

Thanks for the issue @bednar. The team will look into it this week.

mark-rushakoff commented 5 years ago

@bednar I haven't been able to reproduce this. My best guess at the moment is that maybe you have data from an old task run that had a different schema.

Are you able to reproduce the error if you start with a clean database? Please also share which commit of influxdb you're using.

bednar commented 5 years ago

@mark-rushakoff

I tried it over the latest docker build:

docker pull quay.io/influxdb/influx:latest
docker run --detach --publish 9999:9999 quay.io/influxdb/influx:latest

and also over the locally builded instance from commit 44aa7f39c... with clean database (rm -rf ~/.influxdbv2/).

Steps to reproduce:

I have the test that:

1/ Creates a new Task:

POST -> /api/v2/tasks 

{
    "every": "1s",
    "flux": "option task = {name: \"it task1547629541798\", every: 1s} \n from(bucket:\"my-bucket\") |> range(start: 0) |> last()",
    "links": {},
    "name": "it task1547629541798",
    "orgID": "03417949f82ee000",
    "owner": {
        "id": "03417949e52ee000",
        "links": {
            "log": "/api/v2/users/03417949e52ee000/log",
            "self": "/api/v2/users/03417949e52ee000"
        },
        "name": "my-user"
    },
    "status": "active"
}

2/ Retrieves list of run records for a task

GET -> /api/v2/tasks/03417987731d2000/runs?orgID=03417949f82ee000

{
    "links": {
        "self": "/api/v2/tasks/03417987731d2000/runs",
        "task": "/api/v2/tasks/03417987731d2000"
    },
    "runs": [{
        "links": {
            "logs": "/api/v2/tasks/03417987731d2000/runs/03417988a59d2000/logs",
            "retry": "/api/v2/tasks/03417987731d2000/runs/03417988a59d2000/retry",
            "self": "/api/v2/tasks/03417987731d2000/runs/03417988a59d2000",
            "task": "/api/v2/tasks/03417987731d2000"
        },
        "id": "03417988a59d2000",
        "taskID": "03417987731d2000",
        "status": "success",
        "scheduledFor": "2019-01-16T09:05:43Z",
        "startedAt": "2019-01-16T09:05:43.070709399Z",
        "finishedAt": "2019-01-16T09:05:43.079548236Z",
        "log": ""
    }, {
        "links": {
            "logs": "/api/v2/tasks/03417987731d2000/runs/03417989a05d2000/logs",
            "retry": "/api/v2/tasks/03417987731d2000/runs/03417989a05d2000/retry",
            "self": "/api/v2/tasks/03417987731d2000/runs/03417989a05d2000",
            "task": "/api/v2/tasks/03417987731d2000"
        },
        "id": "03417989a05d2000",
        "taskID": "03417987731d2000",
        "status": "success",
        "scheduledFor": "2019-01-16T09:05:44Z",
        "startedAt": "2019-01-16T09:05:44.074713719Z",
        "finishedAt": "2019-01-16T09:05:44.084738742Z",
        "log": ""
    }, {
        "links": {
            "logs": "/api/v2/tasks/03417987731d2000/runs/0341798a995d2000/logs",
            "retry": "/api/v2/tasks/03417987731d2000/runs/0341798a995d2000/retry",
            "self": "/api/v2/tasks/03417987731d2000/runs/0341798a995d2000",
            "task": "/api/v2/tasks/03417987731d2000"
        },
        "id": "0341798a995d2000",
        "taskID": "03417987731d2000",
        "status": "success",
        "scheduledFor": "2019-01-16T09:05:45Z",
        "startedAt": "2019-01-16T09:05:45.076413023Z",
        "finishedAt": "2019-01-16T09:05:45.082437325Z",
        "log": ""
    }, {
        "links": {
            "logs": "/api/v2/tasks/03417987731d2000/runs/0341798b935d2000/logs",
            "retry": "/api/v2/tasks/03417987731d2000/runs/0341798b935d2000/retry",
            "self": "/api/v2/tasks/03417987731d2000/runs/0341798b935d2000",
            "task": "/api/v2/tasks/03417987731d2000"
        },
        "id": "0341798b935d2000",
        "taskID": "03417987731d2000",
        "status": "success",
        "scheduledFor": "2019-01-16T09:05:46Z",
        "startedAt": "2019-01-16T09:05:46.071084524Z",
        "finishedAt": "2019-01-16T09:05:46.078989075Z",
        "log": ""
    }]
}

3/ Tries retry the first run

POST -> /api/v2/tasks/03417987731d2000/runs/03417988a59d2000/retry

empty_body

When I try it at first time everything is ok, but after a second attempt the error occurs:

X-Influx-Error: panic: column _measurement:string is not of type time goroutine 3404 [running]: runtime/debug.Stack(0x1392160, 0x135b0c0, 0xc000143ea0)     /usr/local/go/src/runtime/debug/stack.go:24 +0xa7 github.com/influxdata/flux/execute.(*poolDispatcher).Start.func1.1(0x
mark-rushakoff commented 5 years ago

I'm still not able to reproduce that panic.

Here's my shell log from that same commit 44aa7f39c2a0141efbc2e9a03f3bdd781c18004d:

$ ./influx task create --org a 'option task = { name: "mytask", every: 1m } from(bucket: "a") |> range(start: 0) |> last()'
ID          Name    Organization        Status  Every   Cron
03422b29cdf97000        03422b1f4d1ae000    active  1m0s

$ ./influx task run find --org-id 03422b1f4d1ae000 --task-id 03422b29cdf97000
ID          TaskID          Status  ScheduledFor        StartedAt           FinishedAt  RequestedAt
03422b3604b97000    03422b29cdf97000    success 2019-01-16T22:02:00Z    2019-01-16T22:02:00.091361Z 2019-01-16T22:02:00.108628Z

# First retry succeeds
$  curl -v -XPOST  -H "Authorization: Token $(cat ~/.influxdbv2/credentials)" localhost:9999/api/v2/tasks/03422b29cdf97000/runs/03422b3604b97000/retry
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 9999 (#0)
> POST /api/v2/tasks/03422b29cdf97000/runs/03422b3604b97000/retry HTTP/1.1
> Host: localhost:9999
> User-Agent: curl/7.54.0
> Accept: */*
> Authorization: Token djfL0Zdu0N2tI3bNSJc_mpvZVCLdX6yU5gjP_zMfdsp6WdeNeF_Dcums3Y1H_f_GtHb-fYVy8m2MGZsaMeIa0Q==
>
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=utf-8
< Date: Wed, 16 Jan 2019 22:02:43 GMT
< Content-Length: 412
<
{"links":{"logs":"/api/v2/tasks/03422b29cdf97000/runs/03422b60deb97000/logs","retry":"/api/v2/tasks/03422b29cdf97000/runs/03422b60deb97000/retry","self":"/api/v2/tasks/03422b29cdf97000/runs/03422b60deb97000","task":"/api/v2/tasks/03422b29cdf97000"},"id":"03422b60deb97000","taskID":"03422b29cdf97000","status":"scheduled","scheduledFor":"2019-01-16T22:02:00Z","requestedAt":"2019-01-16T14:02:43-08:00","log":""}
* Connection #0 to host localhost left intact

# Next retry fails, but it's a normal runtime error, not a panic
$ curl -v -XPOST  -H "Authorization: Token $(cat ~/.influxdbv2/credentials)" localhost:9999/api/v2/tasks/03422b29cdf97000/runs/03422b3604b97000/retry
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 9999 (#0)
> POST /api/v2/tasks/03422b29cdf97000/runs/03422b3604b97000/retry HTTP/1.1
> Host: localhost:9999
> User-Agent: curl/7.54.0
> Accept: */*
> Authorization: Token djfL0Zdu0N2tI3bNSJc_mpvZVCLdX6yU5gjP_zMfdsp6WdeNeF_Dcums3Y1H_f_GtHb-fYVy8m2MGZsaMeIa0Q==
>
< HTTP/1.1 500 Internal Server Error
< X-Influx-Error: previous retry for start=2019-01-16T22:02:00Z end=2019-01-16T22:02:00Z has not yet finished
< X-Influx-Reference: 1
< Date: Wed, 16 Jan 2019 22:02:45 GMT
< Content-Length: 0
<
* Connection #0 to host localhost left intact

I get the same behavior on master at 041161a646e91c24e9902783ad69ff4b111665ac.

I'm not sure what might be different between our environments such that you're getting a panic that I can't repro.

Do you see the rest of that stack trace in the server logs? The X-Influx-Error header is truncated at I think 256 characters, but I would expect the rest of the stack to be emitted from influxd.

bednar commented 5 years ago

Hi @mark-rushakoff Currently I am not able to simulate it too.