nasa / cumulus

Cumulus Framework + Cumulus API

CNM tasks getting delayed #3772

Closed hailiangzhang closed 2 months ago

hailiangzhang commented 3 months ago

At GES DISC, our CNM tasks are frequently delayed. The sqsMessageConsumer is supposed to run every minute, but CNM messages may not be dispatched for over 10 minutes.

hailiangzhang commented 3 months ago

I believe the issue arises because sqsMessageConsumer needs to retrieve all enabled rules to dispatch CNM messages. At GES DISC, we have a large number of rules (over 8,000 in UAT), which leads to two problems:

  1. sqsMessageConsumer takes too long to fetch all enabled rules (~30 seconds out of the 1-minute cron interval).
  2. Database connections frequently fail due to the high volume of queries.
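To make the cost of the two problems above concrete, here is a minimal TypeScript sketch. It is not Cumulus code and does not reproduce the PR: `FakeRuleStore`, `fetchAllPages`, the `Rule` shape, the page size of 100, and the share of SQS rules are all assumptions chosen for illustration. Only the figure of 8,000 rules comes from this issue. It compares paging through every enabled rule and filtering locally against asking the store for only the rules the consumer needs.

```typescript
// Hypothetical rule shape for illustration; not the Cumulus rule schema.
interface Rule {
  name: string;
  type: "sqs" | "scheduled";
  enabled: boolean;
}

type RuleFilter = (rule: Rule) => boolean;

// In-memory stand-in for a paged rules query that counts round trips.
class FakeRuleStore {
  queryCount = 0;
  readonly rules: Rule[];
  readonly pageSize: number;

  constructor(rules: Rule[], pageSize: number) {
    this.rules = rules;
    this.pageSize = pageSize;
  }

  // Returns one page of the rules matching the filter (filtering is "server side").
  fetchPage(page: number, filter: RuleFilter): Rule[] {
    this.queryCount += 1;
    const start = page * this.pageSize;
    return this.rules.filter(filter).slice(start, start + this.pageSize);
  }
}

// Requests pages until a short (or empty) page signals the end.
function fetchAllPages(store: FakeRuleStore, filter: RuleFilter): Rule[] {
  const collected: Rule[] = [];
  for (let page = 0; ; page += 1) {
    const batch = store.fetchPage(page, filter);
    collected.push(...batch);
    if (batch.length < store.pageSize) {
      return collected;
    }
  }
}

// Builds `total` enabled rules; every `sqsEvery`-th rule is an SQS rule (assumed ratio).
function makeRules(total: number, sqsEvery: number): Rule[] {
  return Array.from({ length: total }, (_, i): Rule => ({
    name: `rule-${i}`,
    type: i % sqsEvery === 0 ? "sqs" : "scheduled",
    enabled: true,
  }));
}

function compareStrategies(total: number, sqsEvery: number, pageSize: number) {
  // Broad: fetch every enabled rule, then pick out the SQS rules locally.
  const broadStore = new FakeRuleStore(makeRules(total, sqsEvery), pageSize);
  const broadMatches = fetchAllPages(broadStore, (r) => r.enabled)
    .filter((r) => r.type === "sqs").length;

  // Narrow: ask the store for enabled SQS rules only.
  const narrowStore = new FakeRuleStore(makeRules(total, sqsEvery), pageSize);
  const narrowMatches = fetchAllPages(
    narrowStore,
    (r) => r.enabled && r.type === "sqs",
  ).length;

  return {
    broadQueries: broadStore.queryCount,
    broadMatches,
    narrowQueries: narrowStore.queryCount,
    narrowMatches,
  };
}

const result = compareStrategies(8000, 160, 100);
console.log(result);
```

Both strategies end up with the same set of SQS rules, but under these assumed numbers the broad one issues many more queries per run. Whether the actual fix narrows the query in this way is not stated in this thread.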

I've submitted a PR here to address this issue. The fix not only addresses the CNM message delay, it also reduces the rule query time from ~30 seconds to 2-3 seconds. The fix is actually quite simple. Since this affects both our UAT and PROD environments, it would be greatly appreciated if the Cumulus core team could review my PR at your earliest convenience.

paulpilone commented 2 months ago

@hailiangzhang v18.3.4 has been released and includes this fix: https://github.com/nasa/cumulus/releases/tag/v18.3.4. Going to close this Issue.