Open gondzo opened 5 years ago
@sachin-maheshwari I was debugging the issue with the missing email for @gondzo on that project and found that tc-notifications successfully raised the event notifications.action.email.connect.project.notifications.generic at 2019-05-06T12:42:23 UTC. However, I am not able to find a corresponding log in the tc-message-service logs. In fact, I am not able to find any log (without any filter) between 2019-05-06T12:00:00 and 2019-05-06T23:59:00 for tc-email-service. Am I missing something here?
This is still an issue on this same project, but I've noticed that I (and everyone else on the project) do get notified of posts by Topcoder users, but don't get any notifications for posts by SSO users (clients). Could that be a clue?
Thanks for the additional insight, @gondzo. Just double-checking: you are not receiving notifications for posts made by customers (most probably those logging in as SSO users), right?
yes, correct
@gondzo could you please also let me know one specific post for which you didn't receive the email notification?
last post in that project - "I have read and approve the challenge spec. Thank you. "
Thanks @gondzo. It seems that the Kafka consumer got killed (or at least stopped listening) because of an error while parsing one of the bus events. AWS ECS didn't consider the task dead because the email scheduler was still running on the same machine. For now I have restarted the task; however, we need to find a way to have ECS restart the task automatically if this happens again. Could you help us with that, @gondzo? Also, just FYI, I am patching the code right now for the specific error that caused this situation in the first place, but we cannot guarantee that there is no other place from which such unhandled errors can be thrown.
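One way to contain the kind of parsing failure described above is to catch errors per message, so that a single malformed bus event is logged and skipped rather than stopping the consumer. This is a minimal sketch, assuming events arrive as JSON strings; `makeSafeHandler`, `handle`, and `onError` are hypothetical names and are not taken from tc-notifications or no-kafka:

```javascript
// Hypothetical helper (not part of tc-notifications or no-kafka).
// Wraps a per-message handler so that a bad payload or a handler error
// is reported and skipped instead of propagating out of the consumer loop.
function makeSafeHandler(handle, onError) {
  return async function safeHandler(rawValue) {
    let event;
    try {
      // Assumption: bus events are JSON-encoded strings.
      event = JSON.parse(rawValue);
    } catch (err) {
      onError(err, rawValue);
      return false; // message skipped: payload could not be parsed
    }
    try {
      await handle(event);
      return true; // message processed
    } catch (err) {
      onError(err, rawValue);
      return false; // message skipped: handler failed
    }
  };
}
```

The wrapper returns `true` or `false` so the caller can count skipped messages. It does not cover errors raised by the Kafka client itself, so it would complement a working health check rather than replace one.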
Thanks @sachin-maheshwari for debugging the issue with us.
Can we configure a health check for ECS that would check both the consumer and the scheduler? Maybe using https://github.com/topcoder-platform/topcoder-healthcheck-dropin ?
@gondzo sorry for the delayed response. We already have the healthcheck dropin in place. It seems that the custom health check function is not able to determine that the Kafka consumer is in a stale state, and hence it always returns true for the health check. Do you have any idea how to determine the correct status of the Kafka consumer with the no-kafka library?
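One library-agnostic approach is to have the consumer code record its own liveness and let the health check function read that state, instead of asking the Kafka client for its status. This is only a sketch under stated assumptions: `createConsumerHealth`, `markActivity`, `markFailed`, and `maxIdleMs` are hypothetical names, and whether no-kafka exposes a convenient hook for calling `markActivity` on every fetch has not been verified here.

```javascript
// Hypothetical liveness tracker (not part of no-kafka or the healthcheck dropin).
// The consumer calls markActivity() whenever it makes progress and
// markFailed() when it hits an unrecoverable error; the health check
// function simply returns isHealthy().
function createConsumerHealth({ maxIdleMs, now = Date.now }) {
  let lastActivity = now();
  let fatalError = null;

  return {
    markActivity() {
      lastActivity = now();
    },
    markFailed(err) {
      fatalError = err;
    },
    isHealthy() {
      // An explicit failure always wins over the idle timer.
      if (fatalError) return false;
      // Otherwise the consumer is healthy only if it was active recently.
      return now() - lastActivity <= maxIdleMs;
    },
  };
}
```

For example, the custom health check passed to the dropin could return `health.isHealthy()`, so that ECS sees the task as unhealthy once the consumer has failed or gone quiet for longer than `maxIdleMs`. On a topic with little traffic an idle timer alone would produce false alarms, so `markActivity()` would ideally be called on each successful poll rather than only on each message.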
I've noticed that I didn't get notifications for one specific project: 9883. @vikasrohit @sachin-maheshwari, is there any way that we can investigate this?
I did get the web notification (e.g. notification ID 255503), but no email.