lazywithclass / winston-cloudwatch

Send logs to Amazon Cloudwatch using Winston.
MIT License
258 stars 104 forks source link

InvalidSequenceTokenException: The given sequenceToken is invalid. The next expected sequenceToken is: #137

Open nish20490 opened 3 years ago

nish20490 commented 3 years ago

In our project we have 6 microservices running and we using Seneca for them. We have a centralized logger implemented using winston. Recentry we tried to add logs to cloudwatch so we have added Winston cloudwatch. But wheneverI start my development server, each microservice starts which puts a log to cloudwatch. So during this, each time I get this InvalidSequenceTokenException for atleast 2 of the microservices start logs. I am not sure why this is happening. Any help would be appreciated.

nish20490 commented 3 years ago

Because of this exception, some logs are not reaching to cloudwatch. We are facing same issue on staging as well as on prod. Is there a way to retry these missing logs?

lazywithclass commented 3 years ago

Are you using 6 different streams? If I remember correctly (it's been years since I've used this library in production), you should assign to each different source of logs.

There should be a best practice somewhere in AWS Cloudwatch documentation explaining this.

Lisenish commented 3 years ago

@nish20490 @lazywithclass

Same happens to me, we have just two services (worker and API) on in AWS, each of them has 2 instances. Initially services used different streams but instances shared it and everything worked fine but after some time I noticed that services stop log anything after ~6-24 since the last deploy. Restart or deploy fixes it for the next ~6-24 hours.

I saw DataAlreadyAcceptedException and InvalidSequenceTokenException errors in logs and decided that the reason is the shared streams, I reconfigured them so now each instance has it's own steram - but unfortunately it didn't help, I continue to see the exactly same behaviour and the same errors in logs

I also think this closed ticket may be related https://github.com/lazywithclass/winston-cloudwatch/issues/59. And the similar issue in another library for logging https://github.com/mirkokiefer/bunyan-cloudwatch/issues/20 (they also provided info why DataAlreadyAcceptedException may happen instead of InvalidSequenceTokenException

UPD: I've created this PR which supposedly should fix the issue with DataAlreadyAcceptedException at least https://github.com/lazywithclass/winston-cloudwatch/pull/158. I also noticed that in InvalidSequenceTokenException error handling we don't try to use expectedSequenceToken from the error payload. Maybe we should add this in a separate PR, it will allow us to get the correct token without an extra API call.

lazywithclass commented 3 years ago

Did the proposed PR solved your issue? If so I will review and test it in my environment later this week.

Lisenish commented 3 years ago

@lazywithclass I tested it yesterday and unfortunately it's not :(. Maybe I did something wrong in the PR but after I deployed this fix I stopped seeing DataAlreadyAcceptedException or InvalidSequenceTokenException errors in the logs but they still suddenly stopped appearing after some time.

lazywithclass commented 3 years ago

Ok I will have to give this a look, but I am studying for September's exam session, so I cannot guarantee a prompt fix.

kubakrzempek commented 2 years ago

Now the lib upon InvalidSequenceTokenException will try to get a new sequence token and retry sending the logs. However, if there are many resources racing for the same stream (e.g., 4 hosts running the same app from behind load balancer), it might result in the InvalidSequenceTokenException reappearing, and in this case, the error will be thrown and the logs (from that batch) discarded.

How about introducing an option retryCount (or something) where we could decide how many times the lib should try to get the correct token (for example I'm okay if it will try let's say 30 times). Would that work? If so, I can submit a PR.

symbianm commented 2 years ago

Any updates? Have the same issue.

kallekj commented 2 years ago

I'm also experiencing this issue, it occurs when we have multiple tests running in parallel which overloads the stream, or it tries to reuse the same sequence token.

sporteman commented 1 year ago

Any news on that? I'm getting the same issue here.

sowmenappd commented 1 year ago

Hey guys, has anyone worked on this issue yet? We're getting a lot of InvalidSequenceTokenException because even though our services use separate streams, most of our services have several instances, which eventually hits the error because of the same race condition mentioned by @kubakrzempek.

Lisenish commented 1 year ago

@sowmenappd In our case, we just migrated to pino with the logs emitted just as stdout and streamed to CW by AWS. I think it's a more reliable and stable approach, but in case you want to stream them by yourself there's pino-cloudwatch transport.

sowmenappd commented 1 year ago

@Lisenish thank you, yes this definitely seems like a far better approach. However, wouldn't the pino-cloudwatch method also suffer from the same issue if there are multiple instances of the same service?

kubakrzempek commented 1 year ago

I completely agree with you! Streaming the logs is definitely the way to go, especially when it comes to scaling. It just makes everything so much easier to manage and maintain.

Lisenish commented 1 year ago

@sowmenappd I didn't use pino-cloudwatch actually so can't tell for sure, I just used default stdout and streamed stdout to CW by AWS without using any libraries. I didn't encounter any issues with this approach (although IIRC it creates a separate stream for each machine, which is actually a correct way too)