philippta closed 9 months ago
@philippta can i work on this? thanks
btw the idea of structuring code in modules is quite interesting
may i know your references (e.g. other repos or articles) regarding that modules code structuring?
@rafiramadhana Yes, you can work on this. Few things to consider:
flyscrape.RequestBuilder would be the easiest way to support custom headers, however this hook is only called in the scraper and not for file downloads.
Instead you could use the flyscrape.TransportAdapter hook, which intercepts all requests, including file downloads.
Here is an example for a TransportAdapter: https://github.com/philippta/flyscrape/blob/94da9293f63e46712b0a890e1e0eab4153fdb3f9/modules/proxy/proxy.go#L48-L57
TransportAdapters are also a bit special though. They have to be applied in a specific order, which is specified here. The bottom most adapter is applied first, which is what you want for headers. https://github.com/philippta/flyscrape/blob/eae10426cd805ecc0a0459b61639e48e6cd913ad/module.go#L94-L100
may i know your references (e.g. other repos or articles) regarding that modules code structuring?
It is similar to how Caddy Modules work, but way less elaborate.
The bottom most adapter is applied first, which is what you want for headers.
sry, i'm a bit confused by this sentence because there are bottom most and first in one sentence
do you mean,
"The bottom most adapter (the AdaptTransport impl of headers module) is applied first (the headers module should be put at first in moduleOrder), which is what you want for headers."
moduleOrder = []string{
	// Transport adapters must be loaded in a specific order.
	// All other modules can be loaded in any order.
	"headers", // New `headers` module
	"proxy",
	"ratelimit",
	"cache",
}
TL;DR The "headers" module should be last in the moduleOrder list like so, but let me explain.
moduleOrder = []string{
	"proxy",
	"ratelimit",
	"cache",
	"headers", // New `headers` module
}
For reference:
type TransportAdapter interface {
	AdaptTransport(http.RoundTripper) http.RoundTripper
}
AdaptTransport takes an http.RoundTripper and returns a new (wrapped/adapted) http.RoundTripper, similar to how HTTP middlewares work in almost all Go routers/web-frameworks.
We can use http.DefaultTransport as a starting point and adapt it with more functionality like so:
myClient := http.Client{
	Transport: moduleA.AdaptTransport(http.DefaultTransport),
}
If we had another module, we can adapt the already adapted transport.
myClient := http.Client{
	Transport: moduleB.AdaptTransport(moduleA.AdaptTransport(http.DefaultTransport)),
}
We could keep doing this to add more and more adapters to the call chain.
Ultimately the request would then travel like this:
http.Client (sends request) -> moduleB -> moduleA -> http.DefaultTransport -> Internet
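To make that travel order concrete, here is a minimal sketch. The `traceAdapter` type, the `roundTripperFunc` helper and the fake `base` transport are made up for illustration and are not part of flyscrape; `base` stands in for http.DefaultTransport so nothing touches the network.

```go
package main

import (
	"fmt"
	"net/http"
)

// roundTripperFunc lets a plain function satisfy http.RoundTripper.
type roundTripperFunc func(*http.Request) (*http.Response, error)

func (f roundTripperFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// traceAdapter is a hypothetical stand-in for a module. It records its name
// when a request passes through and then calls the wrapped transport.
type traceAdapter struct {
	name string
	log  *[]string
}

func (a traceAdapter) AdaptTransport(next http.RoundTripper) http.RoundTripper {
	return roundTripperFunc(func(r *http.Request) (*http.Response, error) {
		*a.log = append(*a.log, a.name)
		return next.RoundTrip(r)
	})
}

// callOrder builds moduleB(moduleA(base)) and returns the order in which
// each layer saw the request.
func callOrder() []string {
	var log []string
	// base replaces http.DefaultTransport and answers without any network I/O.
	base := roundTripperFunc(func(r *http.Request) (*http.Response, error) {
		log = append(log, "base")
		return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody, Request: r}, nil
	})
	moduleA := traceAdapter{name: "moduleA", log: &log}
	moduleB := traceAdapter{name: "moduleB", log: &log}

	client := http.Client{Transport: moduleB.AdaptTransport(moduleA.AdaptTransport(base))}
	resp, err := client.Get("http://example.invalid/")
	if err != nil {
		return nil
	}
	resp.Body.Close()
	return log
}

func main() {
	fmt.Println(callOrder()) // [moduleB moduleA base]
}
```

The adapter that wraps last (moduleB) is the first one to see the request.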
Now, as to why the order is reversed:
finaltransport := http.DefaultTransport
for _, mod := range allModules { // 1. proxy, 2. ratelimit, 3. cache, 4. headers
	finaltransport = mod.AdaptTransport(finaltransport)
}
client := http.Client{
	Transport: finaltransport, // headers(cache(ratelimit(proxy(http.DefaultTransport))))
}
The last module in the list ends up as the outermost layer of the onion-like call chain, so it is the first one that can mangle the HTTP request.
Hope that makes sense 🙏
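For illustration, a rough sketch of what a headers adapter could look like and why it should wrap last. This is not the actual flyscrape implementation: `headersModule`, `spyModule`, `roundTripperFunc` and the header value are hypothetical, and the fake `base` transport stands in for http.DefaultTransport so the example runs offline.

```go
package main

import (
	"fmt"
	"net/http"
)

// roundTripperFunc lets a plain function satisfy http.RoundTripper.
type roundTripperFunc func(*http.Request) (*http.Response, error)

func (f roundTripperFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// TransportAdapter mirrors the interface quoted in the thread.
type TransportAdapter interface {
	AdaptTransport(http.RoundTripper) http.RoundTripper
}

// headersModule (hypothetical) sets a fixed set of headers on every request.
type headersModule struct{ headers map[string]string }

func (m headersModule) AdaptTransport(next http.RoundTripper) http.RoundTripper {
	return roundTripperFunc(func(r *http.Request) (*http.Response, error) {
		// A RoundTripper should not modify the caller's request, so clone it.
		r = r.Clone(r.Context())
		for k, v := range m.headers {
			r.Header.Set(k, v)
		}
		return next.RoundTrip(r)
	})
}

// spyModule (hypothetical) stands in for an inner module such as cache.
// It records the User-Agent header it observes.
type spyModule struct{ seen *string }

func (m spyModule) AdaptTransport(next http.RoundTripper) http.RoundTripper {
	return roundTripperFunc(func(r *http.Request) (*http.Response, error) {
		*m.seen = r.Header.Get("User-Agent")
		return next.RoundTrip(r)
	})
}

// seenByInner wires the modules with the same loop shape as in the thread
// and returns the User-Agent the inner module observed.
func seenByInner() string {
	var seen string
	base := roundTripperFunc(func(r *http.Request) (*http.Response, error) {
		return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody, Request: r}, nil
	})
	modules := []TransportAdapter{
		spyModule{seen: &seen},
		headersModule{headers: map[string]string{"User-Agent": "flyscrape-example"}}, // last, so outermost
	}
	var transport http.RoundTripper = base
	for _, mod := range modules {
		transport = mod.AdaptTransport(transport)
	}
	client := http.Client{Transport: transport}
	resp, err := client.Get("http://example.invalid/")
	if err != nil {
		return ""
	}
	resp.Body.Close()
	return seen
}

func main() {
	fmt.Println(seenByInner()) // flyscrape-example
}
```

Because the headers adapter is last in the list, every inner adapter already sees the custom headers when the request reaches it.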
i see, thanks for the pointers
Custom request headers should be supported as a headers config option. A new headers module should be created for this.
Proposed example:
Ref: