wp-media / wp-rocket

Performance optimization plugin for WordPress
https://wp-rocket.me
GNU General Public License v2.0
700 stars 219 forks source link

SaaS / Server Infrastructure Issues Cause Critical Errors to Be Logged #6395

Open joejoe04 opened 10 months ago

joejoe04 commented 10 months ago

Describe the bug group.one was having infrastructure issues with services like Imagify, RocketCDN and RUCSS which led to slowdowns and random server errors.

It seems in the case of Remove Unused CSS, it caused these errors to be logged for some:

2024-01-17T11:20:28+00:00 CRITICAL Uncaught TypeError: Argument 2 passed to WP_Rocket\Engine\Optimization\RUCSS\Database\Queries\UsedCSS::update_message() must be of the type int, string given, called in /home/mellins/public_html/wp-content/plugins/wp-rocket/inc/Engine/Optimization/RUCSS/Strategy/Strategies/DefaultProcess.php on line 108 and defined in /home/mellins/public_html/wp-content/plugins/wp-rocket/inc/Engine/Optimization/RUCSS/Database/Queries/UsedCSS.php:592
Stack trace:
#0 /home/mellins/public_html/wp-content/plugins/wp-rocket/inc/Engine/Optimization/RUCSS/Strategy/Strategies/DefaultProcess.php(108): WP_Rocket\Engine\Optimization\RUCSS\Database\Queries\UsedCSS->update_message(2333, 'http_request_fa...', 'cURL error 28: ...', ' - 2024-01-17 1...')
#1 /home/mellins/public_html/wp-content/plugins/wp-rocket/inc/Engine/Optimization/RUCSS/Strategy/Context/RetryContext.php(35): WP_Rocket\Engine\Optimization\RUCSS\Strategy\Strategies\DefaultProcess->execute(Object(stdClass), Array)
#2 /home/mellins/public_html/wp-content/plugins/wp-rocket/i i /home/mellins/public_html/wp-content/plugins/wp-rocket/inc/Engine/Optimization/RUCSS/Database/Queries/UsedCSS.php på rad 592

Expected behavior We should protect against these errors in cases where similar server/infrastructure issues might happen again in the future.

Additional context https://wp-media.slack.com/archives/CAFHCKKNV/p1705502738733959 https://wp-media.slack.com/archives/C43T1AYMQ/p1705526400628139 https://secure.helpscout.net/conversation/2483200195/469576/ https://secure.helpscout.net/conversation/2483200980/469577/

Acceptance Criteria (for WP Media team use only)

  1. R&D why the problem happened
  2. Guard our code against this problem if happens in the future
Miraeld commented 8 months ago

Reproduce the problem

Well, there isn't a proper way to reproduce the issue, however, we could hard code a string in $job_details['code'] here https://github.com/wp-media/wp-rocket/blob/2b2243e08aecf0d7618179106c2a133749762e47/inc/Engine/Optimization/RUCSS/Strategy/Strategies/DefaultProcess.php#L108 This would ensure to get the same error as reported.

Identify the root cause

The issue is that a string is given in the update_message method while it's waiting for an int.

Scope a solution

To solve the issue we could force the cast of$job_details['code'] to an int . So in case this is a string with an actual number it would output its int version otherwise it would return 0. To do that, we can modify this line by : $this->used_css_query->update_message( $row_details->id, (int) $job_details['code'], $job_details['message'], $row_details->error_message );

Effort estimation:

S

Is a refactor needed in that part of the codebase?

No

remyperona commented 8 months ago

I think we shouldn't bail-out, we still want the process to fully execute. But we could instead make sure that if we receive a string, we either convert it to the corresponding int value (if it's a string representing a number), or add a 0 value.

Miraeld commented 8 months ago

I've modified the grooming accordingly to the feedback from @Tabrisrp

remyperona commented 8 months ago

LGTM

foliovision commented 8 months ago

If it helps we are running into this exact issue right now.

We can dig out any details that might help with your fix.

Thanks, Martin

MathieuLamiot commented 7 months ago

https://wp-media.slack.com/archives/C43T1AYMQ/p1712848710899159 Example of what is passed as argument in the stack trace. This is indeed linked to cUrl errors.

MathieuLamiot commented 6 months ago

Added to cooldown sprint as it is blocking this: https://github.com/wp-media/wp-rocket-lighthouse-simulator/issues/36

CrochetFeve0251 commented 5 months ago

Reproduce the problem

Well, there isn't a proper way to reproduce the issue, however, we could hard code a string in $job_details['code'] here

https://github.com/wp-media/wp-rocket/blob/2b2243e08aecf0d7618179106c2a133749762e47/inc/Engine/Optimization/RUCSS/Strategy/Strategies/DefaultProcess.php#L108

This would ensure to get the same error as reported.

Identify the root cause

The issue is that a string is given in the update_message method while it's waiting for an int.

Scope a solution

To solve the issue we could force the cast of$job_details['code'] to an int . So in case this is a string with an actual number it would output its int version otherwise it would return 0. To do that, we can modify this line by : $this->used_css_query->update_message( $row_details->id, (int) $job_details['code'], $job_details['message'], $row_details->error_message );

Effort estimation:

S

Is a refactor needed in that part of the codebase?

No

@Miraeld To reproduce the issue you can try to set a really low timeout from what I remember it worked as it was returning an empty string instead of an int for the status code.

You can also check the respond structure to have an example to add inside your tests.

Miraeld commented 5 months ago

@CrochetFeve0251 , yes but in local we don't send anything to the SaaS, so if someone is trying to reproduce in a local environment, I think the solution I gave is kinda the easiest.

However on a real environment, (staging or production), in fact, the low timeout should get the same output, a string which will cause an issue.

alfonso100 commented 4 months ago

another case here: https://secure.helpscout.net/conversation/2660678910/504266?folderId=273761

juricazuanovic commented 3 months ago

And another case here: https://secure.helpscout.net/conversation/2660678910/504266/