ValueError if `citation_pdf_url` is blank

Page https://www.wjgnet.com/2220-3230/full/v11/i4/88.htm contains this snippet:

    <meta name="citation_pdf_url" content="" />

This builds an empty `httpx.URL`, which later raises a ValueError when the script tries to fetch /robots.txt for that empty URL. Temporary fix:

@@ -378,6 +394,9 @@ def _fulltext_urls_from_meta(data: bytes) -> tuple[httpx.URL, str] | None:
         if field not in meta_dict:
             continue
         for fulltext_url in meta_dict[field]:
+            # Skip blank URLs
+            if fulltext_url.strip() == "":
+                continue
             return httpx.URL(fulltext_url), filetype
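
For reference, here is the same check as a standalone sketch that runs without httpx. The function name `first_nonblank_url` and the `FIELDS` table are hypothetical and only illustrate the idea; the real `_fulltext_urls_from_meta` wraps the result in `httpx.URL` and may use a different field list.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping

# Hypothetical field-to-filetype table; the script's actual table may differ.
FIELDS = (
    ("citation_pdf_url", "pdf"),
    ("citation_fulltext_html_url", "html"),
)


def first_nonblank_url(
    meta_dict: Mapping[str, Iterable[str]],
) -> tuple[str, str] | None:
    """Return the first non-blank (url, filetype) pair, or None if there is none."""
    for field, filetype in FIELDS:
        for candidate in meta_dict.get(field, ()):
            url = candidate.strip()
            if url:  # skip empty and whitespace-only content attributes
                return url, filetype
    return None
```

With this, a page whose only tag is `<meta name="citation_pdf_url" content="" />` yields `None` instead of an empty URL, so no robots.txt lookup is attempted for it.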

For PDFs, skipping the blank URL seems like the right answer. For other file types such as HTML, the 'right' thing is probably to fall back to the URL of the page itself. That seems like a very rare case, though, and HTML is not our priority anyway.
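
If the HTML fallback ever turns out to be worth having, it could look roughly like this. This is only a sketch: `resolve_fulltext_url` and its `page_url` parameter are hypothetical, and nothing like it exists in the script today.

```python
from __future__ import annotations


def resolve_fulltext_url(candidate: str, filetype: str, page_url: str) -> str | None:
    """Pick the URL to fetch for one meta tag value (hypothetical helper)."""
    url = candidate.strip()
    if url:
        return url
    if filetype == "html":
        # Blank HTML link: assume the page we are already on is the full text.
        return page_url
    # Blank PDF (or other) link: nothing sensible to fetch, so skip it.
    return None
```

The caller would need to pass in the URL of the page being parsed, which `_fulltext_urls_from_meta` does not currently receive, so this would be a signature change rather than a drop-in fix.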

rafguns / doidownloader
