Closed: thoellrich closed this issue 7 years ago
Hi @thoellrich,
Can you please provide a failing snippet so I can investigate more effectively?
PS: I suspect you are running some goroutines without waiting for them to finish before exiting your Lambda. This is just an intuition, so please double-check that no goroutine is still running when your Lambda function returns.
Thanks for the quick response. The error happens before my code is called. One of the first things I do in my handler is logging a message that indicates what has arrived. This line is never printed in the failing cases.
The 3 goroutines above are the only ones active. In the failing case I did not even get a chance to spawn a new goroutine. At the end of my handler I also log any still-active goroutines, and I can confirm that whatever I launch is terminated correctly.
If I read the stack trace above correctly, the crash happens in the reflect.Value() call on line 140 of runtime.go: https://github.com/eawsy/aws-lambda-go-shim/blob/master/src/runtime.go#L140
I see that eawsy/aws-lambda-go-shim was just updated to use 1.8.2. I'll see if that makes a difference.
Please isolate the failing case and provide a code snippet; I cannot see what is happening this way. My intuition seems to be wrong, and if there is a bug in Go 1.8.2 we also need to give the Go team a failing test. PS: the stack trace points at runtime.go#L140, but the problem is that reflect stumbles upon a nil value, and we have to find the root cause.
I reduced the handler to the bare minimum and it's still showing the crash.
Here's my handler.go:
```go
package main

import (
	"log"

	"github.com/eawsy/aws-lambda-go-core/service/lambda/runtime"
	"github.com/eawsy/aws-lambda-go-event/service/lambda/runtime/event/kinesisstreamsevt"
)

func Handle(evt *kinesisstreamsevt.Event, c *runtime.Context) (result interface{}, err error) {
	log.Printf("Handle() - record count=%d\n", len(evt.Records))
	return nil, nil
}
```
This lambda is listening on a Kinesis stream. The batch size on the trigger is set to 100.
An external process feeds sensor data into the stream.
The crash seems to happen more consistently when the first batch into the lambda is full (=batch size).
When the crash happens I see the same stack trace as above, and the log.Printf() is never called.
Here's the crash followed by a "good" call (you can see the result of the log.Printf) in Cloudwatch Logs:
Something seems to go wrong when the first batch of records arrives and that batch happens to be close or equal to the batch size. The unraveling of the handler's event seems to cause the exception above: ctx.Elem() was used before, but evt.Elem() wasn't.
Hope this makes sense.
Thank you very much for taking the time to provide a minimal test case. I need your assistance to spot the problem.
Can you change your function a little so we can learn whether the incoming event is always populated:
```go
package main

import (
	"log"

	"github.com/eawsy/aws-lambda-go-core/service/lambda/runtime"
	"github.com/eawsy/aws-lambda-go-event/service/lambda/runtime/event/kinesisstreamsevt"
)

func Handle(evt *kinesisstreamsevt.Event, c *runtime.Context) (result interface{}, err error) {
	log.Printf("evt is not nil: %t", evt != nil)
	log.Printf("Handle() - record count=%d\n", len(evt.Records))
	return nil, nil
}
```
I tried a number of different runs but cannot reproduce the error, so it is very important that you keep your settings unmodified so we can dig into the bug and find it.
PS: Can you please run the following command to check whether you are running the latest version of the shim: `docker run --rm -it eawsy/aws-lambda-go-shim:latest version`.
I absolutely cannot reproduce the error. Moreover, we have several Lambdas behind Kinesis that are executed several hundred million times per month, and I haven't spotted this error in the CloudWatch logs. Can you provide more info about your events, your Lambda configs, etc.? I'm stuck! If you need to converse in private, let me know.
I changed the code to:
```go
package main

import (
	"log"

	"github.com/eawsy/aws-lambda-go-core/service/lambda/runtime"
	"github.com/eawsy/aws-lambda-go-event/service/lambda/runtime/event/kinesisstreamsevt"
)

func Handle(evt *kinesisstreamsevt.Event, c *runtime.Context) (result interface{}, err error) {
	if evt != nil {
		log.Printf("Handle() - record count=%d\n", len(evt.Records))
	} else {
		log.Println("Handle() - evt==nil")
	}
	return nil, nil
}
```
and don't see any occurrences of the evt==nil case. Yet I still see the SIGSEGV.
Also it looks like I had the latest version of the shim in place:
```
$ docker pull eawsy/aws-lambda-go-shim:latest
latest: Pulling from eawsy/aws-lambda-go-shim
Digest: sha256:e7d0182f0703226449446dfbf726dbab4ad35e1960b9a0e1870d80edef34f7ca
Status: Image is up to date for eawsy/aws-lambda-go-shim:latest
$ docker run --rm -it eawsy/aws-lambda-go-shim:latest version
2017-05-24
```
The code also works fine here and handles millions of messages. The problem only occurs under 2 conditions: a) I push a code update and Lambda calls it for the first time, or b) Lambda decides to recycle the function (which happens on a regular basis) and calls the new instance for the first time. In both cases I see the SIGSEGV. Since Lambda detects the crash, it resends the Kinesis messages and no data is lost.
I still think that evt at https://github.com/eawsy/aws-lambda-go-shim/blob/master/src/runtime.go#L140 has an unexpected value. Maybe I should build the shim locally, add some logging, and see if I can get more details that way? What do you think?
Interesting! I changed the runtime.go code to read:
After that change I created the updated shim docker image, rebuilt my handler and called it several times. Each time I made sure that Lambda loaded it anew.
I saw it crash 2 times. Look at the CloudWatch Logs search below; I bet you can immediately spot where it crashed:
I have no idea why you're getting that weird address for the handler symbol. And why it works for the same file the 2nd time.
At that point it may be worth noting that my handler.so is ~ 20MB and the zip ~ 6MB:
```
-rw-r--r-- 1 ubuntu ubuntu 21230440 May 26 11:43 handler.so
-rw-r--r-- 1 ubuntu ubuntu  6401036 May 26 11:43 handler.zip
```
Could size be an issue?
So thankful for your efforts, it seems to be a very awkward bug.
I just modified the main handler. There is other code that pulls in the AWS SDK, sql.DB, and other stuff. It is still part of the simplified *.so; it just does not get called (other than any init()). That would account for the size difference. Sorry I did not mention that.
I'm wondering now if the plugin is not completely loaded. Could it still be relocating?
I don't think there is any problem with the size or the loading process. We've already uploaded very heavy .so files and did not encounter any issue. What I suspected is that some process in an init function, somewhere in your dependencies or your own code, causes the problem. But since you provided such a simple snippet, I was quite surprised not to find any init function :smile: If the error happens only when you freshly reset/upload your Lambda, then we have to focus on init functions. Can you take an inventory of your init functions, and then we will dig into each category and try to spot the bug.
One more thing: you must have fewer than 8 exported functions in your handler. Have you verified this?
I opened an issue a long time ago (https://github.com/golang/go/issues/19269) that will be resolved in Go 1.9, and the issue you referenced also references mine. I think we are getting closer :smile:
Yeah, I think this is the same problem we are talking about. I think I installed the shim before the note about the 8 exported functions was added :-( That could be the problem here. I'll check exports and init()s and report back any findings.
My handler package had 11 exported symbols. I reduced it down to the bare minimum and have not seen the crash since. Thanks for your help, and I'm sorry we were chasing a known problem. Hopefully this will help somebody else who runs into the same issue. Thanks!
Perfect. No problem, it's always a pleasure to help around this project. It was also my fault for not thinking about this limitation at the beginning. It will certainly help others, and in a few months, with the upcoming Go 1.9, this kind of issue will disappear. Anyway, happy Go on Lambda, and do not hesitate to reach out if you encounter any other problem in the future.
About 20% of first-time invocations of my Lambda handler crash before my code is called. This happens somewhere in handle(). I don't see this exception when everything is running normally; it only appears for the first call right after I upload a new version of my Lambda handler. Any ideas?