google / webpackager

URL doesn't match the fetch targets #95

Open juangodPerlego opened 3 years ago

juangodPerlego commented 3 years ago

Hello, I am getting the following issue when sending a request to the webpkgserver:

2021/09/17 16:29:24 Listening at [::]:80
2021/09/17 16:29:24 Successfully retrieved valid OCSP.
2021/09/17 16:29:26 processing https://www.perlego.com/book/1690290/criminal-law-pdf ...
2021/09/17 16:29:26 error with processing https://www.perlego.com/book/1690290/criminal-law-pdf: fetch: URL doesn't match the fetch targets

This is the webpkgserver.toml file I'm using:

[Listen]
Port = 80

[SXG.Cert]
PEMFile = '/www_perlego_com.pem'
KeyFile = '/server.key'
AllowTestCert = false

[SXG]
CertURLBase = 'https://perlego.com/'

[[Sign]]
Domain = 'perlego.com'

This is how I'm sending the request:

wget -v -d --header="Accept: application/signed-exchange;v=b3" localhost/priv/doc/https://www.perlego.com/book/1690290/criminal-law-pdf

I'm confused: from reading the source code, I gather that the error "URL doesn't match the fetch targets" is thrown when the Domain field of the server configuration does not match the actual host of the request, but in this case I believe it does.

Thanks in advance, Juan

twifkak commented 3 years ago

Hi! The Domain field needs to match the domain exactly -- subdomains (www.) will not match. If you want to match both perlego.com and www.perlego.com, you will need two [[Sign]] sections.

Here's the code that sets up a matcher, and here's an example Hostname() usage showing that it includes all components.
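To illustrate the exact-match behavior, here is a minimal sketch. It is not webpackager's actual matcher: `matchesSignDomain` is a hypothetical helper written only for this example, and the domain lists stand in for the `Domain` values of the `[[Sign]]` sections. The only real API used is Go's `net/url` package, whose `Hostname()` returns the full host (including `www.`) without the port.

```go
package main

import (
	"fmt"
	"net/url"
)

// matchesSignDomain is a hypothetical helper for illustration only; it is
// not taken from webpackager. It reports whether the hostname of rawURL is
// exactly equal to one of the given domains.
func matchesSignDomain(rawURL string, domains []string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	// Hostname() keeps every label of the host, e.g. "www.perlego.com",
	// and strips only the port if one is present.
	host := u.Hostname()
	for _, d := range domains {
		if host == d {
			return true
		}
	}
	return false
}

func main() {
	target := "https://www.perlego.com/book/1690290/criminal-law-pdf"

	// A single entry for the bare domain does not cover the www. subdomain.
	fmt.Println(matchesSignDomain(target, []string{"perlego.com"}))

	// Listing both hosts, as with two [[Sign]] sections, covers the URL.
	fmt.Println(matchesSignDomain(target, []string{"perlego.com", "www.perlego.com"}))
}
```

Under these assumptions, the first call prints `false` and the second prints `true`, which mirrors why the request for a `www.perlego.com` URL is rejected when only `perlego.com` is configured.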

Let me know if that works.

Also, if you have a use for wildcard subdomains (e.g. a domain like glitch.me that has a potentially infinite number of subdomains), we could add support.

aliafshany commented 2 years ago

Did you manage to fetch and extract PDF files from Perlego?

Best