ulixee / hero

The web browser built for scraping
MIT License

Bug: LinkedIn Profile page results in HTTP2_STATUS_invalid (999 code) #158

Open GlenDC opened 2 years ago

GlenDC commented 2 years ago

e.g. https://www.linkedin.com/in/satyanadella/

Something like https://www.linkedin.com does work.

blakebyrnes commented 2 years ago

Looks like LinkedIn is actually returning a 999 status code, which regular Chrome will still load if there's a response body. Automated Chrome throws a network failure and redirects to a chrome error page. Very weird behavior. I wonder which part of Chrome is intercepting and handling this where automation is not?

GlenDC commented 2 years ago

Yes, indeed. I wonder as well. LinkedIn is not a target of mine, but I noticed it in a random test.

blakebyrnes commented 2 years ago

Actually, it looks like this is a Node.js issue. Digging through their source code, I can't find any way to send a status code that falls outside the HTTP spec: they reject anything above 599. A simple solution is to rewrite statuses above 599 to an acceptable value, but that is definitely detectable if you were to send an XHR request from the page and check the status code.
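For illustration only, the "modify statuses above 599" workaround could look roughly like the sketch below. `clampStatusForProxy`, `RewrittenStatus`, and `MAX_PROXY_STATUS` are hypothetical names invented for this example; they are not part of Hero or Node.js. The 599 ceiling is taken from the comment above.

```typescript
// Hypothetical helper, not part of Hero's or Node.js's API.
// Highest status the proxy layer is assumed to accept (per the comment above).
const MAX_PROXY_STATUS = 599;

interface RewrittenStatus {
  status: number; // status that would be forwarded to the browser
  originalStatus: number; // status the upstream server actually sent
  wasRewritten: boolean; // true when the status had to be changed
}

function clampStatusForProxy(upstreamStatus: number): RewrittenStatus {
  const wasRewritten = upstreamStatus > MAX_PROXY_STATUS;
  return {
    status: wasRewritten ? MAX_PROXY_STATUS : upstreamStatus,
    originalStatus: upstreamStatus,
    wasRewritten,
  };
}

// Usage: LinkedIn's 999 would be forwarded as 599; ordinary codes pass through.
const blocked = clampStatusForProxy(999);
const ok = clampStatusForProxy(200);
console.log(blocked.status, blocked.wasRewritten, ok.status, ok.wasRewritten);
```

Keeping `originalStatus` around lets the caller log or expose what the server really sent, but it does not hide the rewrite from page scripts: an XHR made from the page would still observe the clamped value.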

GlenDC commented 2 years ago

Hmm, okay. I suppose we can close this one as a won't-fix in that case?

blakebyrnes commented 2 years ago

I think we still need to fix this... just not sure the best way as long as nodejs is our proxy. We could go down a level to write headers directly onto the socket (although this is more complex in http2), we could simply modify down to 599 for now, or we can hold off for the Chrome net proxy :)
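As a rough sketch of the "write headers directly onto the socket" option, the function below builds a raw HTTP/1.1 response head as a string, which sidesteps any status range check in a higher-level API. It covers HTTP/1.1 only; HTTP/2 uses binary frames with compressed headers, which is the extra complexity mentioned above. `buildRawResponseHead` is a hypothetical name, and the reason phrase and header used in the usage lines are placeholders, not values observed from LinkedIn.

```typescript
// Hypothetical helper, not part of Hero's or Node.js's API.
// Builds the status line and headers of an HTTP/1.1 response by hand.
function buildRawResponseHead(
  status: number,
  reason: string,
  headers: { [name: string]: string },
): string {
  // HTTP/1.1 only requires the status code to be three digits, so 999 is allowed here.
  if (!Number.isInteger(status) || status < 100 || status > 999) {
    throw new RangeError(`Status must be a three-digit integer, got ${status}`);
  }
  const lines: string[] = [`HTTP/1.1 ${status} ${reason}`];
  for (const name of Object.keys(headers)) {
    const value = headers[name];
    // Refuse CR/LF so a header value cannot inject extra lines.
    if (/[\r\n]/.test(name) || /[\r\n]/.test(value)) {
      throw new Error(`Invalid characters in header "${name}"`);
    }
    lines.push(`${name}: ${value}`);
  }
  // A blank line terminates the head; the body would follow it on the socket.
  return lines.join('\r\n') + '\r\n\r\n';
}

// Usage: the resulting string could be passed to socket.write() on a raw TCP/TLS socket.
const head = buildRawResponseHead(999, 'Placeholder Reason', { 'Content-Length': '0' });
console.log(JSON.stringify(head));
```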

GlenDC commented 2 years ago

Chrome net proxy

You mean a proxy based on the Chromium network stack? Is that a plan you already have in the works, or just a future dream?

blakebyrnes commented 2 years ago

Yes, I do mean using the Chromium network stack. That's what we ultimately want to do, but it's just a future plan at the moment.

GlenDC commented 2 years ago

Exciting prospects :) I would say it's probably better to hold off for now, but we can keep the issue open in that case. At least, that's my opinion.