google / nginx-sxg-module

NGINX SXG module
Apache License 2.0

The website's homepage in the search results page opens the wrong URL #112

Open naveedahmed1 opened 1 year ago

naveedahmed1 commented 1 year ago

Hi,

I have SXGs enabled for my domain Mustakbil.com.

But when I search google.com for Mustakbil and click the very first result, which points to the homepage, it opens a random, wrong page of my website.

Here's the markup of the link from google search results:

<a href="https://www.mustakbil.com/" data-sxg-url="https://www-mustakbil-com.webpkgcache.com/doc/-/s/www.mustakbil.com/"></div></span><div><span class="VuuXrf">Mustakbil.com</span><div class="byrV5b"><cite class="tjvcx GvPZzd cHaqb" role="text" style="max-width:315px">https://www.mustakbil.com</cite></div></div></div></a>

The href correctly points to https://www.mustakbil.com/

and the SXG URL also correctly points to https://www-mustakbil-com.webpkgcache.com/doc/-/s/www.mustakbil.com/

But when I click this link, it opens the wrong URL: a random page of my website rather than the actual homepage.

banaag commented 1 year ago

Hi, thank you for your report. Looking at the console messages for your URL, there appear to be some issues with your setup. Kindly re-confirm that your Signed Exchange is set up correctly.

Error message was: Content type of cert-url must be application/cert-chain+cbor. Actual content type: text/html Failed to fetch the certificate.
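The check behind that console message is easy to reproduce when debugging. A minimal Python sketch, assuming you already have the cert-url response headers in a dict; the helper name `cert_url_content_type_error` is hypothetical and not part of nginx-sxg-module, Chrome, or Cloudflare:

```python
from typing import Mapping, Optional

REQUIRED_CERT_CONTENT_TYPE = "application/cert-chain+cbor"


def cert_url_content_type_error(headers: Mapping[str, str]) -> Optional[str]:
    """Return an error string if the cert-url response has the wrong
    Content-Type, or None if it looks acceptable.

    Header names are compared case-insensitively; parameters after ';'
    in the Content-Type value are ignored.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("content-type", "")
    media_type = raw.split(";", 1)[0].strip().lower()
    if media_type != REQUIRED_CERT_CONTENT_TYPE:
        return (
            f"Content type of cert-url must be {REQUIRED_CERT_CONTENT_TYPE}. "
            f"Actual content type: {raw or '(missing)'}"
        )
    return None
```

For the response reported here, passing `{"content-type": "text/html"}` yields the same message as the console, while a cert-url served as `application/cert-chain+cbor` returns `None`.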

naveedahmed1 commented 1 year ago

Because of this issue, we had disabled SXG from the Cloudflare control panel. I have enabled it again. Can you please take a look now?

Thank you!

banaag commented 1 year ago

It's still showing the same error:

Content type of cert-url must be application/cert-chain+cbor. Actual content type: text/html Failed to fetch the certificate.

Response code: 200
Header integrity hash: sha256-NimlFFOtuCRIWNtA0hreyW3AvL9psZgycA88M2wd/S0=
Response headers:
cache-control: s-maxage=2592000
cf-cache-status: BYPASS
cf-ray: 7b7dd1e777f246e9-DFW
content-encoding: mi-sha256-03
content-security-policy: frame-ancestors 'self';
content-type: text/html
date: Fri, 14 Apr 2023 17:55:29 GMT
digest: mi-sha256-03=6jh2kkDmqLbONfF7MPervM+Lq+JnMooOMaf/eEzWr9o=
last-modified: Fri, 14 Apr 2023 14:07:06 GMT
referrer-policy: strict-origin
server: cloudflare
vary: Accept-Encoding
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block

Label: sig
Signature: 30 45 02 21 00 F3 17 AE 28 63 E5 B6 AF A7 28 53 98 5A F9 B9 90 06 9F 98 1C BA EE AA 45 64 80 55 9D 29 62 06 6F 02 20 15 56 56 86 EB 9F D8 DA F0 FF C6 1F 56 D4 72 88 93 61 9E 55 91 BC 8A 82 89 7E 55 DE 12 2E A3 03
Certificate URL: https://www-mustakbil-com.webpkgcache.com/crt/ddKVd5r48W3_/s/www.mustakbil.com/cdn-fpw/sxg/cert.pem.msg.ddKVd5r48W3_Oep93WuifhipOJDAloU8oX2bMOYFN-4
Integrity: digest/mi-sha256-03
Certificate SHA256: 75 D2 95 77 9A F8 F1 6D FF 39 EA 7D DD 6B A2 7E 18 A9 38 90 C0 96 85 3C A1 7D 9B 30 E6 05 37 EE
Validity URL: https://www.mustakbil.com/cdn-fpw/sxg/valid.msg.validity
Date: Fri, 14 Apr 2023 16:55:29 GMT
Expires: Fri, 21 Apr 2023 16:55:29 GMT
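The Date and Expires values above bound the signature's validity. The signed-exchange format rejects signatures whose Expires is more than 7 days after Date, so this Cloudflare signature uses the full window. A small sketch for checking a signature window against a given time, assuming HTTP-date strings as shown; `signature_window_ok` is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# Signed exchanges reject signatures whose expires is more than 7 days after date.
MAX_SIGNATURE_LIFETIME = timedelta(days=7)


def signature_window_ok(date: str, expires: str, now: Optional[datetime] = None) -> bool:
    """Return True if [date, expires] is a window of at most 7 days
    that contains `now` (defaults to the current UTC time)."""
    start = parsedate_to_datetime(date)  # "GMT" dates parse as UTC-aware datetimes
    end = parsedate_to_datetime(expires)
    if end <= start or end - start > MAX_SIGNATURE_LIFETIME:
        return False
    current = now if now is not None else datetime.now(timezone.utc)
    return start <= current <= end
```

With the values above, any moment between 14 and 21 Apr 2023 (16:55:29 GMT) passes and anything later fails, so a cached copy of this SXG cannot stay valid for more than a week without a fresh signature.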

naveedahmed1 commented 1 year ago

Thank you so much @banaag for the update.

The response above shows cf-cache-status: BYPASS. Can you please try opening the URL on your local machine and see whether you get a different value, cf-cache-status: HIT?

Since Cloudflare provides SXG for the site, cf-cache-status: BYPASS could be the issue.

On my machine, I see cf-cache-status: HIT, and with cf-cache-status: HIT the SXG should work, correct?

banaag commented 1 year ago

That response was actually fetched from my local machine.

I tried reopening just now and I get a MISS:

Request URL: https://www.mustakbil.com/
Request Method: GET
Status Code: 200 (from service worker)
Referrer Policy: strict-origin-when-cross-origin
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400
cache-control: public, max-age=31536000, s-maxage=31536000
cf-cache-status: MISS
cf-ray: 7b92058b2e199464-SJC
content-encoding: br
content-security-policy: frame-ancestors 'self';
content-type: text/html
date: Mon, 17 Apr 2023 04:45:58 GMT
last-modified: Mon, 17 Apr 2023 14:13:26 GMT
nel: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
referrer-policy: strict-origin
report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=Yua%2B8U32bUmk5LFTM5sspVd6etia13fVeILwcigiCz3h%2Fob4%2BT%2FJ3HwhCyscRP3jJHG539W7FfapZUmdj%2FI%2F0JX66s21CBW%2BtOC30Q9BLaIFExx0wPBkCc3TFfo3ftyUB2G1"}],"group":"cf-nel","max_age":604800}
server: cloudflare
strict-transport-security: max-age=2592000; includeSubDomains; preload
vary: Accept-Encoding
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block
Provisional headers are shown. Disable cache to see full headers.
sec-ch-ua: "Chromium";v="112", "Google Chrome";v="112", "Not:A-Brand";v="99"
sec-ch-ua-mobile: ?1
sec-ch-ua-platform: "Android"
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Linux; Android 11; Pixel 5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.91 Mobile Safari/537.36

Using this tool on your site, it seems there's something wrong with the cert-cbor: https://chromewebstore.google.com/detail/hiijcdgcphjeljafieaejfhodfbpmgoe?pli=1

naveedahmed1 commented 1 year ago

Thank you @banaag!

It seems Cloudflare takes three page loads to put the page into its cache.

The first load returns cf-cache-status: BYPASS

The second returns cf-cache-status: MISS

The third returns cf-cache-status: HIT

Can you please refresh the page a few times and, once you see `cf-cache-status: HIT`, check whether you still get the same certificate error?

banaag commented 1 year ago

I did it at least five times with the same result. I still get the cert error.

naveedahmed1 commented 1 year ago

Ok, thank you so much for the update.

I also have a ticket open with Cloudflare, since they handle the certificates and SXG configuration automatically.

I will update you once I hear back from them.

naveedahmed1 commented 1 year ago

@banaag, can you please try it now?

Earlier, when I searched google.com for "Mustakbil", the first result pointed directly to https://www.mustakbil.com/ instead of the SXG URL, probably because of the issue you mentioned above.

Now, if I search Google for Mustakbil, the first result points to the SXG URL:

<a href="https://www-mustakbil-com.webpkgcache.com/doc/-/s/www.mustakbil.com/" data-sxg-url="https://www-mustakbil-com.webpkgcache.com/doc/-/s/www.mustakbil.com/" data-ved="2ahUKEwiN5ee28bH-AhUEPOwKHdZ7AJgQFnoECAkQAQ" ping="/url?sa=t&amp;source=web&amp;rct=j&amp;url=https://www.mustakbil.com/&amp;ved=2ahUKEwiN5ee28bH-AhUEPOwKHdZ7AJgQFnoECAkQAQ"><br><h3 class="LC20lb MBeuO DKV0Md">Mustakbil.com</h3><div class="TbwUpd NJjxre iUh30 ojE3Fb"><span class="H9lube"><div class="eqA2re NjwKYd Vwoesf" aria-hidden="true"><img class="XNo5Ab" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAMAAABEpIrGAAAAqFBMVEX////99vf32tz11Nb109b55ujom6HbVmDVJTf77e7ieIHWLj7WNkTXOkjjgIfvvMDVKTrWMEDUHzPsq7DTCibaTFjgbnfZQ1DmkJbpoqfeZW/dXGb70tLnkpm/0OR7pM5Yib6Lqs/pfYLx7PCnxuFslsXM2urX4u5Jhb4XaK4ucLIjbLAzc7MSZq2lvdoobrHw9Pnk6/Q9ere7zeKeuNeVstPts7cAX6r2hvrfAAABgUlEQVR4AWKgMgDQMQ4IAIQAEDxmu/7/z7MxuUXdtF1T/6k9gGgBgv5DxgRRxhcYRQI/dak0v6CVvOvA8AcGXHX71CeQPfVm0xmblqZ6eWjXHQY/CxTpQDSKyabJQSnJR4VEk15yV4+tkFVixDAQxVRmZgosmRN78f43qx0q/1VhgzLzeHl5bQaurz754eU0Ct+eYO8iy4sSjkbjybSYDQlcNzU9gpBKawNMtNFmyhBBs2APtLZWF5A5a62pQLQZnKQG7qEycVxlUOj4UntCSSIWHXPZhblNgFBpvoQ8HxacHsBCdeJpEiiPV/nwi8MLkM04hLRQT2HpVn2Rpw8wa8RxqKzjixMwV4u+zcv1J7GxbS8L43wf1CkElwR514ubgayLPojbp14cwEa0BOHUgo5nYKOtSUWJTQzTCcid5DNiahcCYFYUmQf8SjCwdDallC47755qwicqp+0XtFvyhayuP8/XJuMbvlSms2ijcs9PfFXUTilnimXgL8JsFvhn3gFFYTx5yWK7bgAAAABJRU5ErkJggg==" style="height:18px;width:18px" alt="" data-atf="1" data-frt="0"></div></span><div><span class="VuuXrf">Mustakbil.com</span><div class="byrV5b"><cite class="tjvcx GvPZzd cHaqb" role="text" style="max-width:315px">https://www.mustakbil.com</cite></div></div></div></a>

If you hover over the link, you will see that it points to https://www-mustakbil-com.webpkgcache.com/doc/-/s/www.mustakbil.com/, but if you click it, it opens https://www.mustakbil.com/jobs/pakistan/ahmadpur-east

banaag commented 1 year ago

Checking our internal systems, I see this error:

ingestion_error : VALIDATION_ERROR
ingestion_error_message : "SXG validation failure: Certificate is not valid; bad OCSP status: 2; details: 6; cert trust"

This means there was an error validating the issued certificate, because the OCSP server that performs that validation was either down or returning an error.

naveedahmed1 commented 1 year ago

Ok, thank you for the prompt response and sharing this update. I have yet to hear back from Cloudflare team on this issue.

It seems the issue is with their infrastructure that provides SXGs, certificates and verification.

naveedahmed1 commented 1 year ago

I have one question: if there's a certificate verification error, how is SXG working for other URLs? For example:

If you open google.com, search for https://www.mustakbil.com/jobs/pakistan/lahore and look at the first result, you will notice that it points to the correct SXG URL (https://www-mustakbil-com.webpkgcache.com/doc/-/s/www.mustakbil.com/jobs/pakistan/lahore), and clicking that link opens the correct page, i.e. https://www.mustakbil.com/jobs/pakistan/lahore.
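Both examples in this thread follow the same pattern: the cache hostname is the origin hostname with "." replaced by "-" under webpkgcache.com, followed by /doc/-/s/ and the original host and path. A sketch that reproduces the URLs seen here; it assumes a plain ASCII hostname and ignores special cases (internationalized or very long domains), and the doubling of existing "-" characters follows the AMP Cache subdomain rules, which is an assumption rather than something these URLs confirm:

```python
from urllib.parse import urlsplit

SXG_CACHE_DOMAIN = "webpkgcache.com"


def sxg_cache_url(page_url: str) -> str:
    """Map an https page URL to the Google SXG cache URL pattern seen in this thread.

    Simplified sketch. Assumptions: ASCII hostname, no port, query string
    ignored. The '-' -> '--' step mirrors the AMP Cache subdomain rules.
    """
    parts = urlsplit(page_url)
    if parts.scheme != "https":
        raise ValueError("expected an https URL")
    host = parts.hostname or ""
    # Escape existing hyphens first, then turn dots into single hyphens.
    subdomain = host.replace("-", "--").replace(".", "-")
    path = parts.path or "/"
    # 's' appears to denote an https origin, as in AMP Cache URLs (assumption).
    return f"https://{subdomain}.{SXG_CACHE_DOMAIN}/doc/-/s/{host}{path}"
```

For example, `sxg_cache_url("https://www.mustakbil.com/jobs/pakistan/lahore")` returns the Lahore cache URL quoted above.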

banaag commented 1 year ago

It's possible that when the page with the error was being ingested into the system, the cert verification ran into the OCSP server issue, hence the cert could not be verified at that time.

naveedahmed1 commented 1 year ago

Thank you @banaag! Just to understand it better: by "the page with the error", are you referring to the certificate verification error, or to an error on the page?

If you are referring to the certificate verification error, the same certificate would be used for all requests to the same site, correct?

If so, how is it possible that all other pages of the site are being served correctly with SXG and only the homepage has an issue?

The search result for Mustakbil now points somewhere new: the markup in the SERP is correct, but when clicked, it now takes me to a different wrong page, this time https://www.mustakbil.com/companies/pakistan/restaurants.

The other pages of the site are being served properly with SXG; for example, https://www.mustakbil.com/jobs/pakistan/lahore is still working just fine and was last cached on 18 Apr 2023 10:51:34 GMT.

The homepage, by contrast, was cached on 18 Apr 2023 12:44:01 GMT.

I have also verified the canonical URL on the cached page; it is correct and points to <link rel="canonical" href="https://www.mustakbil.com/">.

banaag commented 1 year ago

Yes, the same cert is being used for all the requests. However, when the page was ingested, the entity doing the verification (the OCSP server) marked the cert as being in error (I don't know why, but the server could have been down at that time). So while the other pages may have passed, the main page failed this verification for some reason.

naveedahmed1 commented 1 year ago

I have checked a few other pages of the site and they all work just fine.

The only difference I see is in the cache headers:

For the home page we have cache-control: s-maxage=2592000

For other pages, for which SXG is working fine, we have the following cache headers: cache-control: max-age=3600, s-maxage=2592000

Can this cause any issue?

banaag commented 1 year ago

Yes, it's possible that this causes any error pages that were ingested to stay in the cache longer. Could you try fixing the problem and waiting for the page to get re-ingested? Thank you.

naveedahmed1 commented 1 year ago

Yes, I just updated the cache headers of the page.

BTW, the cache-control: s-maxage=2592000 we had for the homepage was intended for the edge cache (Cloudflare).

I believe it's the max-age directive that was missing.

I found this post, https://developer.chrome.com/blog/optimizing-lcp-using-signed-exchanges/#max-age, and it seems that the minimum max-age value should be 2 minutes.
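To compare the Cache-Control values discussed in this thread, a small parser helps. This sketch splits the header into directives and flags a missing or too-small max-age, using the 2-minute (120-second) minimum from the post linked above as its threshold. The function names are hypothetical, and the sketch does not model how Google's SXG cache actually weighs max-age against s-maxage:

```python
from typing import Dict, List, Optional

MIN_MAX_AGE_SECONDS = 120  # threshold taken from the Chrome blog post cited above


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, arg = item.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def max_age_problems(value: str) -> List[str]:
    """List problems with the max-age directive of a Cache-Control header."""
    directives = parse_cache_control(value)
    if "max-age" not in directives:
        return ["max-age is missing"]
    raw = directives["max-age"]
    if raw is None or not raw.isdigit():
        return [f"max-age has no valid numeric value: {raw!r}"]
    if int(raw) < MIN_MAX_AGE_SECONDS:
        return [f"max-age={raw} is below {MIN_MAX_AGE_SECONDS} seconds"]
    return []
```

Run on the two headers quoted earlier, the homepage's `s-maxage=2592000` is reported as missing max-age, while the other pages' `max-age=3600, s-maxage=2592000` passes.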

naveedahmed1 commented 1 year ago

I just searched google.com for the term mustakbil and clicked the first link, and this time it took me to the right page.

Can you please verify it from your side?

naveedahmed1 commented 1 year ago

It seems that it didn't fix the issue.

banaag commented 1 year ago

I just looked at the home page and it looks like it reverted to being a normal HTML file, not an SXG.

naveedahmed1 commented 1 year ago

I ultimately disabled SXG for the homepage by adding a cdn-cache-control=no-cache header, since it was creating a bad user experience.
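For reference, this is roughly what adding that header to the homepage alone looks like, sketched as generic Python WSGI middleware; the thread doesn't say what the site runs on, so the class name and setup are hypothetical. CDN-Cache-Control is the targeted cache-control header defined in RFC 9213. As the next comment reports, adding it did not stop the SXG from being generated in this case.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Headers = List[Tuple[str, str]]


class HomepageCdnNoCache:
    """WSGI middleware that sets `CDN-Cache-Control: no-cache` on responses
    for the site root only. Illustrative; not tied to any framework."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        is_homepage = environ.get("PATH_INFO", "/") in ("", "/")

        def patched_start_response(status: str, headers: Headers, exc_info: Optional[tuple] = None):
            if is_homepage:
                # Drop any existing value, then add the no-cache directive.
                headers = [(k, v) for k, v in headers if k.lower() != "cdn-cache-control"]
                headers.append(("CDN-Cache-Control", "no-cache"))
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)
```

Other pages pass through unchanged, so their SXG behavior is unaffected by the middleware.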

naveedahmed1 commented 1 year ago

Even that doesn't help. According to the docs:

For example, cdn-cache-control=no-cache would mean that a signed exchange is not created

https://developers.cloudflare.com/fundamentals/speed/signed-exchanges/signed-exchanges-caveats/

But it's still being created. I'm in an awkward situation :(

banaag commented 1 year ago

I just checked and I don't see any more errors on the home page. Does everything look OK on your end?

naveedahmed1 commented 1 year ago

I am still trying a few things to find out what actually works.

Can you please advise which cache-control header value enables SXG?

We had max-age and s-maxage, and SXG was working fine for the website, including subdomains, except for the homepage of www.mustakbil.com.

We removed s-maxage and kept only max-age, and SXG stopped working for the whole website.

We then set cache-control: max-age=86400, s-maxage=21600, stale-while-revalidate=604800, stale-if-error=86400; SXG worked, but the homepage had the same issue.

I have now set max-age=21600, s-maxage=21600, stale-while-revalidate=604800, stale-if-error=86400

That is, the same value for max-age and s-maxage. I'm waiting to see how Google responds to this.

banaag commented 1 year ago

Hi Naveed,

I just tried doing a search for your home page and the links there are correct and it redirects to your home page with no errors. Are you seeing something different from your end?



naveedahmed1 commented 1 year ago

But when I click the link, it takes me to https://www.mustakbil.com/jobs/pakistan/abbottabad

banaag commented 1 year ago

It's possible that a subset of the cached copies got updated but the rest haven't yet. You might want to wait for the old cached copies to expire before making any further changes.

naveedahmed1 commented 1 year ago

Ok, I will wait.

Can you please confirm the required cache-control header value for SXG to work?

It appears that without s-maxage it doesn't work.

banaag commented 1 year ago

https://developers.google.com/search/docs/appearance/signed-exchange#additional-requirements-for-google-search

naveedahmed1 commented 1 year ago

Still no success :(

It seems that SXG doesn't work without an s-maxage directive in cache-control.

Adding it breaks the homepage SXG.

I have tried https://developers.google.com/search/docs/appearance/signed-exchange#debug-the-google-sxg-cache as well, and I don't see anything indicating an issue with my site config.

naveedahmed1 commented 1 year ago

Just to add, I have also tried the SXG Validator Chrome plugin (https://github.com/google/sxg-validator), and it also validates the SXG.

But in search results I don't see SXG for any of the URLs of my website.

banaag commented 1 year ago

Looks like your home page reverted back to normal HTML.

Checking what's in the Signed Exchange cache, it still contains a valid SXG AFAICT:
Request URL: https://www-mustakbil-com.webpkgcache.com/doc/-/s/www.mustakbil.com/
Request Method: GET
Status Code: 200 (from disk cache)
Remote Address: 142.251.46.193:443
Referrer Policy: strict-origin-when-cross-origin
accept-ranges: bytes
access-control-allow-origin: *
age: 48
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
cache-control: private, max-age=86400
content-encoding: gzip
content-length: 17108
content-security-policy: require-trusted-types-for 'script'; report-uri https://csp.withgoogle.com/csp/webpkgcache-team
content-type: application/signed-exchange;v=b3
cross-origin-opener-policy: same-origin; report-to="webpkgcache-team"
cross-origin-resource-policy: cross-origin
date: Wed, 19 Apr 2023 16:40:55 GMT
expires: Wed, 19 Apr 2023 16:40:55 GMT
last-modified: Wed, 19 Apr 2023 15:56:29 GMT
nel: {"report_to":"nel","max_age":604800,"success_fraction":0.05}
report-to: {"group":"nel","max_age":604800,"endpoints":[{"url":"https://beacons.gcp.gvt2.com/nel/upload-nel"},{"url":"https://beacons.gvt2.com/nel/upload-nel"}]}
report-to: {"group":"webpkgcache-team","max_age":2592000,"endpoints":[{"url":"https://csp.withgoogle.com/csp/report-to/webpkgcache-team"}]}
server: sffe
vary: Accept-Encoding
x-content-type-options: nosniff
x-xss-protection: 0

naveedahmed1 commented 1 year ago

"Looks like your home page reverted back to normal HTML."

Can you please elaborate on what you mean by "reverted back to normal HTML"?

It still has cache-control: max-age=86400?

Checking through the SXG Validator Chrome plugin (https://github.com/google/sxg-validator), it still says that it's a valid SXG.

banaag commented 1 year ago

Oh sorry, my mistake. Yes it's still referencing an SXG:

<a href="https://www.mustakbil.com/" jscontroller="M9mgyc" jsname="qOiK6e" jsaction="rcuQ6b:npT2md" data-sxg-url="https://www-mustakbil-com.webpkgcache.com/doc/-/s/www.mustakbil.com/">

And yes, I also see that everything is green per sxg-validator.

naveedahmed1 commented 1 year ago

Thank you for the update!

It's quite confusing. With the same headers, it didn't work before and then suddenly started working.

At the moment it seems to point to the correct URL; I will monitor it and let you know if the issue occurs again.

So, apparently it was the s-maxage directive that was causing the issue.

Can you please also advise how to use RSA public/private keys instead of EC keys to update the SXG cache through the API?

The following document describes the use of EC keys: https://github.com/google/webpackager/blob/main/docs/update_cache_api.md

I tried using EC keys, but it doesn't seem to work; I have posted the details here:

https://stackoverflow.com/questions/76066975/ecdsa-in-net-core-throwing-invalid-public-key-signature-error

banaag commented 1 year ago

Glad that it's working!

For the update cache API, could you confirm that you performed this portion of the instructions?

Place your public key in your website’s /.well-known directory. Google servers will fetch the public key from this location when it’s time to verify your request. Google will also keep a copy of your public key in its cache with a 24 hour expiration date. If your website is "https://www.example.com" then the URL for your public key should be: "https://www.example.com/.well-known/sxg-update-publickey.pem".

naveedahmed1 commented 1 year ago

Yes, and it's available here: https://www.mustakbil.com/.well-known/sxg-update-publickey.pem

naveedahmed1 commented 1 year ago

I think I got it working, but I'm not sure. I was creating the signature with the defaults, which in .NET Core means the IEEE P1363 format; since openssl dgst produces an ECDSA signature in ASN.1/DER, that is the format I should have used.
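The mismatch described above is between two encodings of the same ECDSA signature: IEEE P1363 is the fixed-size concatenation r || s (64 bytes for P-256), while ASN.1/DER wraps r and s as two INTEGERs in a SEQUENCE. If you cannot change how the signature is produced, you can convert between them. A pure-Python sketch, assuming a P-256 key (32-byte r and s); the function names are hypothetical. On .NET 5 and later, the `ECDsa.SignData` overloads that take a `DSASignatureFormat` (such as `DSASignatureFormat.Rfc3279DerSequence`) can emit DER directly, which avoids the conversion.

```python
def _encode_length(n: int) -> bytes:
    """DER length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def _encode_integer(value: bytes) -> bytes:
    """DER INTEGER for an unsigned big-endian value, minimally encoded."""
    v = value.lstrip(b"\x00") or b"\x00"
    if v[0] & 0x80:  # high bit set: prepend 0x00 so the INTEGER stays positive
        v = b"\x00" + v
    return b"\x02" + _encode_length(len(v)) + v


def _read_length(data: bytes, pos: int) -> tuple:
    """Read a DER length at data[pos]; return (length, position after it)."""
    first = data[pos]
    pos += 1
    if first < 0x80:
        return first, pos
    count = first & 0x7F
    return int.from_bytes(data[pos:pos + count], "big"), pos + count


def p1363_to_der(sig: bytes) -> bytes:
    """Convert r || s (IEEE P1363) to an ASN.1/DER SEQUENCE of two INTEGERs."""
    if not sig or len(sig) % 2:
        raise ValueError("P1363 signature must be two equal-length halves")
    half = len(sig) // 2
    body = _encode_integer(sig[:half]) + _encode_integer(sig[half:])
    return b"\x30" + _encode_length(len(body)) + body


def der_to_p1363(der: bytes, size: int = 32) -> bytes:
    """Convert an ASN.1/DER ECDSA signature to r || s, each padded to `size` bytes."""
    if not der or der[0] != 0x30:
        raise ValueError("expected a DER SEQUENCE")
    seq_len, pos = _read_length(der, 1)
    if pos + seq_len != len(der):
        raise ValueError("SEQUENCE length does not match input")
    out = b""
    for _ in range(2):
        if der[pos] != 0x02:
            raise ValueError("expected a DER INTEGER")
        n, pos = _read_length(der, pos + 1)
        value = der[pos:pos + n].lstrip(b"\x00")
        pos += n
        if len(value) > size:
            raise ValueError("integer does not fit in the field size")
        out += value.rjust(size, b"\x00")
    if pos != len(der):
        raise ValueError("unexpected trailing data")
    return out
```

As a sanity check, the sig value in the signature header quoted earlier (30 45 02 21 00 F3 … 03) is itself a DER-encoded ECDSA signature: `der_to_p1363` turns it into 64 bytes, and `p1363_to_der` turns those back into the identical 71 bytes.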

Now, according to the docs (https://github.com/google/webpackager/blob/main/docs/update_cache_api.md#output--errors):

All responses will be of type application/json containing the following fields:

success: true or false
message: A description if the success is false. For example: “The SXG URL is not found in the cache.”, or “Invalid URL signature, using public key ”.
success: true is returned only for 202 responses. So the client may avoid parsing JSON upon seeing a 202 response. All other non-202 responses will return JSON response {success: false, message: “”}

But previously, I was receiving a 200 status code with this response: "{\"error\":true,\"reason\":\"Invalid public key signature. Update cache denied.\"}"

Now I still get a 200 status code, but without any JSON object; all I see is OK.

Does that mean it's working?
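Since the responses observed here don't quite match the documented ones, a client may want to handle all three shapes. A sketch of such a classifier; `interpret_update_cache_response` is a hypothetical name, the 202 and `{success, message}` rules come from the docs quoted above, the `{error, reason}` shape is the one observed in this thread, and treating a plain-text 200 "OK" as success rests only on the maintainer's reply below, not on the docs:

```python
import json
from typing import Tuple


def interpret_update_cache_response(status: int, body: str) -> Tuple[bool, str]:
    """Classify a response from the SXG update-cache endpoint (sketch).

    - 202: success, per the webpackager docs.
    - JSON object: treated as a failure; text from "message" (documented
      shape) or "reason" (shape observed in this thread).
    - 200 with a plain "OK" body: treated as success per the maintainer.
    """
    if status == 202:
        return True, "accepted (202)"
    text = body.strip()
    try:
        payload = json.loads(text)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        payload = None
    if isinstance(payload, dict):
        return False, str(payload.get("message") or payload.get("reason") or text)
    if status == 200 and text == "OK":
        return True, "200 with plain-text OK body"
    return False, f"unexpected response: HTTP {status} {text!r}"
```

With this, the earlier `{"error":true,"reason":...}` body is reported as a failure carrying its reason, while the later bare "OK" counts as success.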

banaag commented 1 year ago

If you see an OK response, it should be working.

naveedahmed1 commented 1 year ago

Just wanted to confirm that SXG seems to be working fine for different urls of the site.

My conclusion is that it was the s-maxage directive that was causing the issue.

I also want to thank you for your time, for following this issue, and for your valuable suggestions.

naveedahmed1 commented 10 months ago

@banaag, just following up on this issue. The issue seems to persist for our website, and we have now decided to disable SXG for it.

I wanted to share one more thing and get your opinion on it.

We use Cloudflare for our website, and Cloudflare periodically renews the website's certificates; in our case, it seems to renew them every two months. On Oct 20 we received an email from Cloudflare saying "Cloudflare has observed issuance of the following certificate for mustakbil.com or one of its subdomains"

Around the same dates, we observed the problem described in this issue.

Could the issuance of a new certificate cause this issue, especially if we have content with a long expiry?

In one of your previous messages, you mentioned that when you were investigating the issue, you found that the certificate was invalid.
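One thing that changes on every renewal is the certificate fingerprint. Each SXG signature carries a cert-sha256 parameter, the SHA-256 hash of the leaf certificate served at its cert-url, so SXGs signed before a renewal presumably keep referencing the old certificate until they are re-signed. Judging only from the values quoted earlier in this thread, Cloudflare's cert-url also appears to end with that same hash in unpadded base64url (Certificate SHA256 75 D2 95 … EE encodes to ddKVd5r48W3_Oep93WuifhipOJDAloU8oX2bMOYFN-4); that is an observation, not documented behavior. A sketch for comparing a fingerprint against a certificate you have downloaded, with hypothetical helper names:

```python
import base64
import hashlib


def fingerprint_from_hex(spaced_hex: str) -> bytes:
    """Parse a fingerprint written as space-separated hex, e.g. '75 D2 95 ...'."""
    return bytes.fromhex(spaced_hex)


def base64url_nopad(data: bytes) -> str:
    """Unpadded base64url, the form that appears at the end of the cert-url."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def cert_matches(leaf_cert_der: bytes, expected_sha256: bytes) -> bool:
    """True if the DER-encoded leaf certificate hashes to the expected value."""
    return hashlib.sha256(leaf_cert_der).digest() == expected_sha256


def cert_url_mentions_fingerprint(cert_url: str, expected_sha256: bytes) -> bool:
    """Check the observed (undocumented) pattern that the cert-url ends with
    the base64url-encoded fingerprint."""
    return cert_url.rstrip("/").endswith(base64url_nopad(expected_sha256))
```

If the certificate currently served at an old SXG's cert-url no longer hashes to that SXG's cert-sha256, the signature cannot be verified, which would be worth ruling out around the renewal dates.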