ulixee / hero

The web browser built for scraping
MIT License

Improve Docs for Customizing Headers #219

Open rjbks opened 1 year ago

rjbks commented 1 year ago

Where can we specify or customize header selection/construction? I see the emulator-data folder, and I assume something uses it to construct the headers based on the custom user agent syntax, but I can't seem to find any references to that data.

blakebyrnes commented 1 year ago

All of the logic is in the unblocked project, but it's currently mostly just a recording of the headers sent for various resource types requested from various origin scenarios (same origin, cross origin, origin none, etc). You can customize headers in the unblocked plug-in hook "beforeHttpRequest". The default browser emulator has an example of doing so at unblocked/plugins/default-browser-emulator/index.ts.
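
As a rough illustration of that hook, here is a minimal sketch of a plugin-like class that overrides headers inside "beforeHttpRequest". Only the hook name comes from this thread; the `RequestDetails` shape, the `requestHeaders` field and the `CustomHeadersPlugin` class are simplified assumptions made for the example, so check the default browser emulator for the real types before relying on them.

```typescript
// ASSUMPTION: simplified stand-ins for the real unblocked types,
// which carry many more fields than shown here.
type Headers = Record<string, string | string[]>;

interface RequestDetails {
  url: string;
  method: string;
  requestHeaders: Headers;
}

// Hypothetical plugin exposing only the hook discussed in this thread.
class CustomHeadersPlugin {
  constructor(private readonly overrides: Record<string, string>) {}

  // Intended to run once per outgoing request; mutates the headers in place.
  beforeHttpRequest(request: RequestDetails): void {
    for (const [name, value] of Object.entries(this.overrides)) {
      // Header names are case-insensitive, so look for an existing entry
      // under any casing and reuse its key. That keeps the original
      // casing and position instead of appending a duplicate.
      const existing = Object.keys(request.requestHeaders).find(
        key => key.toLowerCase() === name.toLowerCase(),
      );
      request.requestHeaders[existing ?? name] = value;
    }
  }
}

// Usage: override one header on a fake request object.
const plugin = new CustomHeadersPlugin({ 'Accept-Language': 'en-GB,en;q=0.9' });
const request: RequestDetails = {
  url: 'https://example.org/',
  method: 'GET',
  requestHeaders: { 'accept-language': 'en-US', 'User-Agent': 'example' },
};
plugin.beforeHttpRequest(request);
```

Reusing the existing key rather than deleting and re-adding it is deliberate: the recorded data mentioned above reflects what a real browser sends, so a sketch like this tries to disturb header order and casing as little as possible.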

Let me know if this helps!

rjbks commented 1 year ago

Very helpful, thanks! Where can I find valid UA browser/version variants and their respective npm packages? For example, I know from the docs that if I want ~ chrome = 101 to work, I need to yarn add @ulixee/chrome-101-0. But if I want ~ safari, ~ edge or ~ firefox I can't seem to just run yarn add @ulixee/{browser}-{version}.
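
To make the Chrome case concrete, the pairing between the user agent selector and the npm package can be sketched as a small helper. The `chromeSetup` function is hypothetical, and the trailing `-0` in the package name is copied from the single `@ulixee/chrome-101-0` example above; nothing here guarantees that a package is published for every version.

```typescript
interface ChromeSetup {
  userAgent: string;      // selector string in the "~ chrome = 101" style
  packageName: string;    // npm package expected to provide that Chrome build
  installCommand: string; // command in the style quoted in this thread
}

// Hypothetical helper: derives the selector and package name for one
// Chrome major version, following the chrome-101-0 example only.
function chromeSetup(majorVersion: number): ChromeSetup {
  if (!Number.isInteger(majorVersion) || majorVersion <= 0) {
    throw new RangeError(`Invalid Chrome major version: ${majorVersion}`);
  }
  // ASSUMPTION: the "-0" suffix mirrors @ulixee/chrome-101-0.
  const packageName = `@ulixee/chrome-${majorVersion}-0`;
  return {
    userAgent: `~ chrome = ${majorVersion}`,
    packageName,
    installCommand: `yarn add ${packageName}`,
  };
}

const setup = chromeSetup(101);
console.log(setup.installCommand);
```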

blakebyrnes commented 1 year ago

Ok, np. We unfortunately don't support other browsers at the moment. We intended to, but even just supporting Safari proved to be very challenging, as the uiwebkit that you can automate diverges quite a bit from Safari in the DOM.

rjbks commented 1 year ago

I'd like to get a better understanding of all the moving parts, as the docs aren't very detailed. I would be willing to help add to the documentation, but... bit of a chicken/egg situation

blakebyrnes commented 1 year ago

We've very much had a moving target :) A lot of the underlying engine is in a project called ulixee/unblocked. It's separated mostly because there's a huge amount of code that goes into the cycle of emulation. You might find that the unblocked repo answers some of your questions. That said, documentation additions are welcome. We have a community on Discord who can also answer questions (I'm on there too).

blakebyrnes commented 1 year ago

Is there any "Todo" for this issue?

rjbks commented 1 year ago

> Is there any "Todo" for this issue?

For adding to the docs, what would you like to prioritize?

blakebyrnes commented 1 year ago

I'm not actually sure where to start. Were you hoping for an architecture page to dive into? Or more details on the "concepts"? I'm assuming you've poked around the docs at ulixee.org? Were there specific parts you found lacking?