saarnav890 opened this issue 1 year ago
Okay, so my initial guess that the Range header was the cause turned out to be correct. I wrote some middleware to ignore the Range header, and scraping the dynamic routes works great now.
If anyone else has the same problem, just add a middleware.ts file to the root of the project with the code below, which essentially just deletes the Range header. If you want to do something more complicated, check out https://nextjs.org/docs/advanced-features/middleware for more on middleware.
My code to ignore the Range header:
import { NextRequest, NextResponse } from 'next/server'
export default function middleware(request: NextRequest) {
  // Copy the incoming headers and drop Range before the request reaches the page.
  const headers = new Headers(request.headers);
  headers.delete('Range');
  // Forward the request with the modified headers.
  const responseWithoutRange = NextResponse.next({ request: { headers } });
  return responseWithoutRange;
}
Edit: This seemed to make all my routes much slower than not using middleware at all, so the workaround for this workaround is a simple if check that only rewrites the headers when the request actually has a Range header.
import { NextRequest, NextResponse } from "next/server";
export default function middleware(request: NextRequest) {
  // Only copy and rewrite the headers when a Range header is present.
  if (request.headers.has("Range")) {
    const headers = new Headers(request.headers);
    headers.delete("Range");
    const responseWithoutRange = NextResponse.next({ request: { headers } });
    return responseWithoutRange;
  }
  // Returning nothing lets the request continue unchanged.
}
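The header logic in the middleware above can be exercised outside Next.js. Below is a minimal sketch, assuming Node 18+ (where the Fetch API `Headers` class is a global); the helper name `stripRangeHeader` is hypothetical and not part of Next.js. Header names are case-insensitive, so checking for `"Range"` also matches a lowercase `range` header, and the incoming headers are copied rather than mutated.

```typescript
// Hypothetical helper (not a Next.js API); assumes Node 18+, where `Headers` is a global.
// Returns a copy of `incoming` without the Range header, or undefined when there is no
// Range header, mirroring the middleware above, which returns nothing in that case.
function stripRangeHeader(incoming: Headers): Headers | undefined {
  // Header names are case-insensitive, so this also matches a lowercase "range".
  if (!incoming.has("Range")) {
    return undefined;
  }
  // Copy first so the original request headers are left untouched.
  const copy = new Headers(incoming);
  copy.delete("Range");
  return copy;
}

// Usage: headers shaped like the Facebook crawler's request.
const crawlerHeaders = new Headers({
  range: "bytes=0-524288",
  accept: "text/html",
});
const rewritten = stripRangeHeader(crawlerHeaders);
console.log(rewritten?.has("range")); // false
console.log(rewritten?.get("accept")); // text/html
console.log(crawlerHeaders.has("range")); // true, the original is unchanged
console.log(stripRangeHeader(new Headers({ accept: "text/html" }))); // undefined
```

Returning `undefined` when nothing needs to change is what keeps requests without a Range header on the cheap path, which is the point of the edit above.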
This is a workaround I found for now, but hopefully it gets officially fixed soon :)
This is more of a Facebook crawler issue than a Next.js one, especially since the other tool worked just fine.
Moreover, if you look at the source of the example, the OG tag is there, which means Next.js streaming is working the way browsers expect. For this reason I suspect Next.js will not be able to "fix" it without introducing a Facebook-specific check for its crawler. That may or may not be a good idea, since crawlers generally don't like being fed special versions of a page.
As for the slowness: by removing the Range header you basically disable response streaming, which is the whole point of the app directory structure. Next.js probably falls back to fully rendering the page in its absence, hence the perceived slowdown.
Just for reference: when you mention the "FB debugger", you're only using it as an example, and the actual production setup doesn't show OG previews on Facebook either, right?
Is there any way to detect that a request was made by the FB crawler and delete the Range header only for those requests?
I tried to figure out what was going on, but failed. I am trying to get correct Open Graph previews for the inner pages of a website, but the FB scraper gets completely messed-up meta tags, while Telegram, for example, processes the links just fine. Take this link: https://www.culturaweek.fi/fi/tapahtumat/konferenssi. Look at its metadata, and then check what FB is getting: https://developers.facebook.com/tools/debug/echo/?q=https%3A%2F%2Fwww.culturaweek.fi%2Ffi%2Ftapahtumat%2Fkonferenssi%2F
I experienced the same issue when Facebook was crawling the links on my site. I tried removing the "Range" header like @Culturalist suggested and it seemed to work, but again, to avoid removing the header for all requests, I check whether the request comes from the Facebook crawler.
I'm not totally sure this check will catch every case of the Facebook crawler, but it works for everything I have tried.
Here is the code I added to my middleware:
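import { NextRequest, NextResponse } from "next/server";
export default function middleware(req: NextRequest) {
  const headers = new Headers(req.headers);
  // Strip Range only for Facebook's crawler, and only when the header is present.
  if (
    req.headers.get("User-Agent")?.includes("facebookexternalhit") &&
    req.headers.has("Range")
  ) {
    headers.delete("Range");
  }
  // ...
  return NextResponse.next({
    request: { headers },
  });
}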
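Since the check above may not catch every Facebook crawler, one option is to match the User-Agent against a list of tokens and keep that list in one place. The sketch below is hypothetical: `isFacebookCrawler`, `shouldStripRange`, and the token list are illustrations, not Next.js APIs, and the list (`facebookexternalhit`, `facebookcatalog`) is an assumption based on the user agents described at https://developers.facebook.com/docs/sharing/webmasters/crawler/, so verify it there before relying on it. It assumes Node 18+ for the global `Headers`.

```typescript
// Hypothetical helpers, not part of Next.js. The token list is an assumption;
// check Facebook's crawler documentation for the current user agent strings.
const FACEBOOK_CRAWLER_TOKENS: readonly string[] = [
  "facebookexternalhit",
  "facebookcatalog",
];

function isFacebookCrawler(userAgent: string | null | undefined): boolean {
  if (!userAgent) {
    return false;
  }
  // Compare case-insensitively in case the casing of the UA string varies.
  const ua = userAgent.toLowerCase();
  return FACEBOOK_CRAWLER_TOKENS.some((token) => ua.includes(token));
}

// Strip Range only for Facebook's crawler, and only if the header is present.
function shouldStripRange(headers: Headers): boolean {
  return isFacebookCrawler(headers.get("User-Agent")) && headers.has("Range");
}

// Usage, with the user agent from the curl commands in this issue.
const fbHeaders = new Headers({
  "User-Agent": "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
  Range: "bytes=0-524288",
});
console.log(shouldStripRange(fbHeaders)); // true
console.log(isFacebookCrawler("Mozilla/5.0 (Macintosh)")); // false
```

Inside the middleware, the inline condition could then become `shouldStripRange(req.headers)`, and new crawler user agents only need to be added to the token list.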
Verify canary release
Provide environment information
Operating System:
  Platform: darwin
  Arch: arm64
  Version: Darwin Kernel Version 22.2.0: Fri Nov 11 02:03:51 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T6000
Binaries:
  Node: 16.14.0
  npm: 8.3.1
  Yarn: 1.22.19
  pnpm: N/A
Relevant packages:
  next: 13.1.1-canary.1
  eslint-config-next: N/A
  react: 18.2.0
  react-dom: 18.2.0
Which area(s) of Next.js are affected? (leave empty if unsure)
Head component/file (next/head / head.js)
Link to the code that reproduces this issue
https://github.com/saarnav890/ogImageIssue
To Reproduce
To reproduce, run the FB Sharing Debugger on the index page: https://developers.facebook.com/tools/debug/?q=https%3A%2F%2Fog-image-issue.vercel.app%2F (this results in a correct image).
Then run the same debugger with anything else as the /[slug], for instance https://og-image-issue.vercel.app/something:
https://developers.facebook.com/tools/debug/?q=https%3A%2F%2Fog-image-issue.vercel.app%2Fsomething (this results in a 500 Internal Server Error).
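When testing several dynamic routes, building each encoded debugger URL by hand gets tedious. A small sketch that generates them, assuming (as in the links above) that the page URL simply goes percent-encoded into the `q` query parameter; `sharingDebuggerUrl` and the `/another-slug` path are hypothetical.

```typescript
// Hypothetical helper: builds a Sharing Debugger link for a page URL by
// percent-encoding the page URL into the `q` query parameter.
function sharingDebuggerUrl(pageUrl: string): string {
  return `https://developers.facebook.com/tools/debug/?q=${encodeURIComponent(pageUrl)}`;
}

// Check the static index page and a few dynamic [slug] pages in one go.
const base = "https://og-image-issue.vercel.app";
for (const path of ["/", "/something", "/another-slug"]) {
  console.log(sharingDebuggerUrl(base + path));
}
```

`encodeURIComponent` turns `:` into `%3A` and `/` into `%2F`, which reproduces the two debugger links above exactly.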
Additionally, following https://developers.facebook.com/docs/sharing/webmasters/crawler/, you can simulate the crawler with curl. Running
curl -v --compressed -H "Range: bytes=0-524288" -H "Connection: close" -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" https://og-image-issue.vercel.app/
returns the proper response,
but running
curl -v --compressed -H "Range: bytes=0-524288" -H "Connection: close" -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" https://og-image-issue.vercel.app/something
results in a 500 internal server error.
However, if you just remove the Range header, the dynamic page works:
curl -v --compressed -H "Connection: close" -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" https://og-image-issue.vercel.app/something
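The curl commands use the same Range value as Facebook's crawler. Under HTTP range semantics (RFC 9110), both ends of `bytes=0-524288` are inclusive, so the crawler asks for the first 524,289 bytes, i.e. 512 KiB plus one byte. A minimal, hypothetical parser that makes this arithmetic concrete; `parseSingleByteRange` and `rangeLength` are illustrations, and the sketch handles only a single `bytes=start-end` or `bytes=start-` range, deliberately rejecting multi-range and suffix (`bytes=-500`) forms.

```typescript
// Hypothetical parser for a single "bytes=start-end" range (RFC 9110 syntax).
// Multi-range ("bytes=0-10,20-30") and suffix ("bytes=-500") forms are not handled.
interface ByteRange {
  start: number;
  end: number | undefined; // undefined means "to the end of the resource"
}

function parseSingleByteRange(header: string): ByteRange | null {
  const match = /^bytes=(\d+)-(\d*)$/.exec(header.trim());
  if (!match) {
    return null;
  }
  const start = Number(match[1]);
  const end = match[2] === "" ? undefined : Number(match[2]);
  if (end !== undefined && end < start) {
    return null; // last position before first position is invalid
  }
  return { start, end };
}

// Number of bytes covered; both ends of the range are inclusive.
function rangeLength(range: ByteRange): number | undefined {
  return range.end === undefined ? undefined : range.end - range.start + 1;
}

const crawlerRange = parseSingleByteRange("bytes=0-524288");
console.log(crawlerRange); // { start: 0, end: 524288 }
if (crawlerRange) {
  console.log(rangeLength(crawlerRange)); // 524289
}
```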
Describe the Bug
The FB crawler works perfectly fine on statically generated pages. However, when it crawls dynamically rendered pages, it gets a 500 Internal Server Error.
[Screenshot: expected response for the index page]
[Screenshot: response for any dynamically rendered page]
Expected Behavior
The crawler should return the same image for both the index page and any slug page, but this is not happening. When I try this with other meta tag simulators, such as https://en.rakko.tools/tools/9/, both pages work perfectly fine. Because of this, I think the problem has something to do with the Range header.
Which browser are you using? (if relevant)
Version 108.0.5359.124 (Official Build) (arm64)
How are you deploying your application? (if relevant)
Vercel