scientist-softserv / hykuup_knapsack

container project for the Hyku Up deploy of Hyku
Apache License 2.0

[SPIKE] hykuup knapsack tenant covenant.hykuup.com - long load time #228

Closed aprilrieger closed 2 weeks ago

aprilrieger commented 3 months ago

The hykuup knapsack tenant covenant.hykuup.com has communicated that the load times for their site are long. Please investigate.

aprilrieger commented 3 months ago

After reviewing the ingress-nginx-controller logs on the besties cluster I was able to see the errors that the tenant covenant.hykuup.com may have been experiencing.

In the logs there are errors relating to crowdsec:

2024/06/07 22:18:05 [error] 1949#1949: *3939872 connect() failed (111: Connection refused), client: 10.0.6.229, server: ~^(?<subdomain>[\w-]+)\.hykuup\.com$, request: "GET /assets/application-ac706023ff85121bba95713d72e8b2c64f75d1436a760b1263c6c0a87871aa7a.css HTTP/2.0", host: "covenant.hykuup.com", referrer: "https://covenant.hykuup.com/"
2024/06/07 22:18:05 [error] 1949#1949: *3939872 [lua] crowdsec.lua:600: Allow(): [Crowdsec] bouncer error: request failed: connection refused, client: 10.0.6.229, server: ~^(?<subdomain>[\w-]+)\.hykuup\.com$, request: "GET /assets/application-ac706023ff85121bba95713d72e8b2c64f75d1436a760b1263c6c0a87871aa7a.css HTTP/2.0", host: "covenant.hykuup.com", referrer: "https://covenant.hykuup.com/"
2024/06/07 22:18:05 [error] 1949#1949: *3939872 connect() failed (111: Connection refused), client: 10.0.6.229, server: ~^(?<subdomain>[\w-]+)\.hykuup\.com$, request: "GET /assets/application-c6e417f508888410f54c4560593ec9b171b82156711b0bcce66db15ea43a1ced.js HTTP/2.0", host: "covenant.hykuup.com", referrer: "https://covenant.hykuup.com/"
2024/06/07 22:18:05 [error] 1949#1949: *3939872 [lua] crowdsec.lua:600: Allow(): [Crowdsec] bouncer error: request failed: connection refused, client: 10.0.6.229, server: ~^(?<subdomain>[\w-]+)\.hykuup\.com$, request: "GET /assets/application-c6e417f508888410f54c4560593ec9b171b82156711b0bcce66db15ea43a1ced.js HTTP/2.0", host: "covenant.hykuup.com", referrer: "https://covenant.hykuup.com/"
2024/06/07 22:18:05 [error] 1949#1949: *3939872 connect() failed (111: Connection refused), client: 10.0.6.229, server: ~^(?<subdomain>[\w-]+)\.hykuup\.com$, request: "GET /system/logo_images/1/original/Covenant_Theological_Seminary_-_white_on_grey.png HTTP/2.0", host: "covenant.hykuup.com", referrer: "https://covenant.hykuup.com/"
2024/06/07 22:18:05 [error] 1949#1949: *3939872 [lua] crowdsec.lua:600: Allow(): [Crowdsec] bouncer error: request failed: connection refused, client: 10.0.6.229, server: ~^(?<subdomain>[\w-]+)\.hykuup\.com$, request: "GET /system/logo_images/1/original/Covenant_Theological_Seminary_-_white_on_grey.png HTTP/2.0", host: "covenant.hykuup.com", referrer: "https://covenant.hykuup.com/"
2024/06/07 22:18:05 [error] 1949#1949: *3939872 connect() failed (111: Connection refused), client: 10.0.6.229, server: ~^(?<subdomain>[\w-]+)\.hykuup\.com$, request: "GET /downloads/9a097ae0-564c-41c1-a1f5-6d6e4652f8e7?file=thumbnail HTTP/2.0", host: "covenant.hykuup.com", referrer: "https://covenant.hykuup.com/"
2024/06/07 22:18:05 [error] 1949#1949: *3939872 [lua] crowdsec.lua:600: Allow(): [Crowdsec] bouncer error: request failed: connection refused, client: 10.0.6.229, server: ~^(?<subdomain>[\w-]+)\.hykuup\.com$, request: "GET /downloads/9a097ae0-564c-41c1-a1f5-6d6e4652f8e7?file=thumbnail HTTP/2.0", host: "covenant.hykuup.com", referrer: "https://covenant.hykuup.com/"

When I go to the crowdsec namespace and look into the crowdsec pod, I see that the pod has been restarting almost hourly with these errors:

Last state: Terminated with 137: OOMKilled, started: Fri, Jun 7 2024 2:12:04 pm, finished: Fri, Jun 7 2024 3:18:04 pm
Last state: Terminated with 137: OOMKilled, started: Fri, Jun 7 2024 3:18:05 pm, finished: Fri, Jun 7 2024 4:06:30 pm
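To check whether the bouncer errors line up with the crowdsec pod restarts, the ingress-nginx error log can be bucketed by minute and host. This is only a rough sketch: the regex is written against the error lines pasted above, and `count_bouncer_errors` and `BOUNCER_ERROR` are hypothetical helper names, not part of crowdsec or ingress-nginx.

```python
import re
from collections import Counter

# Assumption: error lines look like the ones pasted in this issue, i.e.
# "<date> <time> [error] ... [Crowdsec] bouncer error: <reason>, ... host: "<host>", ..."
BOUNCER_ERROR = re.compile(
    r'^(?P<minute>\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2} \[error\].*'
    r'\[Crowdsec\] bouncer error: (?P<reason>[^,]+),.*'
    r'host: "(?P<host>[^"]+)"'
)


def count_bouncer_errors(lines):
    """Count crowdsec bouncer errors per (minute, host).

    Lines that are not bouncer errors (for example the plain
    "connect() failed" lines) are ignored.
    """
    counts = Counter()
    for line in lines:
        match = BOUNCER_ERROR.search(line)
        if match:
            counts[(match.group("minute"), match.group("host"))] += 1
    return counts


if __name__ == "__main__":
    # One of the bouncer error lines from the log excerpt in this issue.
    sample = [
        r'2024/06/07 22:18:05 [error] 1949#1949: *3939872 [lua] crowdsec.lua:600: Allow(): [Crowdsec] bouncer error: request failed: connection refused, client: 10.0.6.229, server: ~^(?<subdomain>[\w-]+)\.hykuup\.com$, request: "GET /downloads/9a097ae0-564c-41c1-a1f5-6d6e4652f8e7?file=thumbnail HTTP/2.0", host: "covenant.hykuup.com", referrer: "https://covenant.hykuup.com/"',
    ]
    for (minute, host), n in sorted(count_bouncer_errors(sample).items()):
        print(minute, host, n)
```

Comparing the minutes with the highest counts against the pod's "Last state" start and finish times should show whether the connection-refused errors cluster around the OOM kills.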

crowdsec-lapi-6d788bcb57-z5mw6_crowdsec-lapi.log ingress-nginx-controller-6c9ff5f569-fx6xz_controller (3).log ingress-nginx-controller-6c9ff5f569-ks4kh_controller (2).log

aprilrieger commented 3 months ago

The repo.samvera tenant had a slow-to-load issue at the same time I saw the OOM killer hit the crowdsec pod and the pod restart: https://assaydepot.slack.com/archives/C03CA8XRP3L/p1717805368959019 https://assaydepot.slack.com/archives/C03CA8XRP3L/p1717811140396729

aprilrieger commented 3 months ago

I added a website monitor so we can track this specific tenant over the weekend: https://www.site24x7.com/app/client#/home/monitors/195989000072363003/Summary

I also see that the resource request is 100MiB and the resource limit is also set to 100MiB. I upped it to 200MiB for the weekend to see if that helps reduce the number of OOM kills.

aprilrieger commented 3 months ago

The crowdsec issue has been resolved, but I am still seeing issues across multiple hykuup tenants.

aprilrieger commented 3 months ago

Looked at the cluster and scaled up another ingress-nginx so each node had one, but I am still seeing the sites on hykuup flap.

aprilrieger commented 3 months ago

Slack s3-engineering call for additional help: https://assaydepot.slack.com/archives/C0313NK5NMA/p1718232353329779

Seeing several bots hitting the svc hykuup-knapsack-production-hykuup-knapsack-production-hyrax-80, where they are getting HTTP status code 200. Listed below are the agents I have observed in the ingress-nginx logs on besties (added logs for review).

Agents seen:

(+http://www.facebook.com/externalhit_uatext.php)
(KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)
(KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
(+https://www.semanticscholar.org/crawler)

log entries: (+https://www.semanticscholar.org/crawler)

10.0.4.82 - - [12/Jun/2024:22:12:23 +0000] "GET /catalog?f%5Bsubject_sim%5D%5B%5D=Samvera+Community&locale=en HTTP/1.1" 200 11127 "-" "Mozilla/5.0 (compatible) SemanticScholarBot (+https://www.semanticscholar.org/crawler)" 327 23.377 [hykuup-knapsack-production-hykuup-knapsack-production-hyrax-80] [] 10.0.6.215:3000 11116 23.315 200 b99d05443a48aebfa84e322c8d9127a7

(KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)

10.0.5.234 - - [12/Jun/2024:22:11:13 +0000] "GET /catalog?f%5Bcreator_sim%5D%5B%5D=Murdock%2C+Michael&f%5Bkeyword_sim%5D%5B%5D=T-shirt+design&f%5Bresource_type_sim%5D%5B%5D=Image&locale=es&view=gallery HTTP/2.0" 200 8856 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)" 706 9.670 [hykuup-knapsack-production-hykuup-knapsack-production-hyrax-80] [] 10.0.6.215:3000 8879 9.645 200 c26813659658bd47f8ebcaafdad70427

(+http://www.facebook.com/externalhit_uatext.php)

10.0.4.82 - - [12/Jun/2024:22:09:57 +0000] "GET /catalog?f%5Bcontributor_sim%5D%5B%5D=University+of+Oregon&f%5Bcreator_sim%5D%5B%5D=Mellinger%2C+Margaret&f%5Bcreator_sim%5D%5B%5D=Sato%2C+Linda&f%5Bcreator_sim%5D%5B%5D=Barth%2C+Duncan&locale=en&per_page=20 HTTP/2.0" 200 39074 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 240 6.192 [hykuup-knapsack-production-hykuup-knapsack-production-hyrax-80] [] 10.0.6.215:3000 39087 6.169 200 2e0fddc50c9cda7732d2b2cd0d3da7c1

(KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

10.0.5.234 - - [12/Jun/2024:21:51:27 +0000] "GET /catalog/facet/keyword_sim?f%5Bcontributor_sim%5D%5B%5D=Northwestern+University&locale=es&per_page=20&view=list HTTP/2.0" 200 4172 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36" 334 15.018 [hykuup-knapsack-production-hykuup-knapsack-production-hyrax-80] [] 10.0.5.54:3000 4195 14.990 200 1fd6f050ec8561d2da802564aa01dead

Added logs for review: hykuup-knapsack-production-hyrax-bfd9cd68f-qljqc_hyrax.log hykuup-knapsack-production-hyrax-bfd9cd68f-zlm8l_hyrax.log ingress-nginx-controller-844cb8786f-pgm5l_controller (2).log ingress-nginx-controller-844cb8786f-dgsnp_controller (2).log ingress-nginx-controller-844cb8786f-9njfj_controller (2).log ingress-nginx-controller-844cb8786f-4lb7h_controller (2).log
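To get a sense of how much time each crawler is costing the hyrax service, the ingress-nginx access logs can be summarized per agent. The sketch below assumes the log lines follow the same layout as the entries pasted above (request, status, bytes, referer, user agent, request length, then request time in seconds). `ACCESS_LINE`, `agent_label`, and `summarize` are hypothetical names used only for this illustration.

```python
import re
from collections import defaultdict

# Assumption: fields appear in the same order as the access log entries in this issue.
ACCESS_LINE = re.compile(
    r'^(?P<remote_addr>\S+) - \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'(?P<request_length>\d+) (?P<request_time>[\d.]+) '
)

# Rough heuristic: a token ending in "bot"/"Bot", or Facebook's link-preview fetcher.
BOT_TOKEN = re.compile(r'([A-Za-z]+[Bb]ot|facebookexternalhit)')


def agent_label(user_agent):
    """Return a short label for a user agent, or "other" if no bot token is found."""
    match = BOT_TOKEN.search(user_agent)
    return match.group(1) if match else "other"


def summarize(lines):
    """Map agent label -> (request count, mean request time in seconds)."""
    times = defaultdict(list)
    for line in lines:
        match = ACCESS_LINE.match(line)
        if not match:
            continue  # skip error-log lines and anything in another format
        times[agent_label(match["user_agent"])].append(float(match["request_time"]))
    return {label: (len(v), sum(v) / len(v)) for label, v in times.items()}


if __name__ == "__main__":
    # Path is a placeholder; point it at one of the attached controller logs.
    with open("ingress-nginx-controller.log", encoding="utf-8", errors="replace") as fh:
        for label, (count, mean) in sorted(summarize(fh).items()):
            print(f"{label}: {count} requests, mean {mean:.3f}s")
```

Because the label is a simple pattern match on the user agent string, it can be fooled by spoofed agents; it is meant for a first pass over the logs, not for blocking decisions.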