Add local_fetch_only Function to Restrict External Network Access in WeasyPrint
Description
This PR introduces a custom URL fetcher function, local_fetch_only, for WeasyPrint. Its purpose is to prevent any external network access while resources are fetched. It allows only local file paths, base64 encoded data, and relative URLs. All other URLs, including HTTP, HTTPS, and FTP URLs (whether they name a host or an IP address), are blocked. Previously, external calls were observed for resources such as CSS files; this change blocks them.
Implementation
The local_fetch_only function is designed to:
Allow Base64 Encoded Data: URLs with the data scheme.
Allow Local File Paths: URLs with the file scheme.
Allow Relative URLs: URLs without a scheme.
For all other URL schemes (e.g., http, https, ftp), the function returns an empty response, effectively blocking the request.
Code
from urllib.parse import urlparse

from weasyprint import default_url_fetcher


def local_fetch_only(url, *args, **kwargs):
    """
    Custom URL fetcher for WeasyPrint that prevents any external network access.

    This function allows only local file paths, base64 encoded data, and relative URLs. It blocks all other URLs,
    including HTTP, HTTPS, FTP, and IP addresses, ensuring that no external network access occurs during the fetching
    process.

    Args:
        url (str): The URL to fetch.
        *args: Additional positional arguments.
        **kwargs: Additional keyword arguments.

    Returns:
        dict: A dictionary containing an empty string for 'string', 'text/plain' for 'mime_type', and 'utf8' for 'encoding'
        if the URL is blocked. Otherwise, it uses the default fetcher for local resources.
    """
    parsed_url = urlparse(url)

    # Allow base64 encoded data, local file paths, or relative URLs
    if parsed_url.scheme in ('data', 'file', ''):
        return default_url_fetcher(url, *args, **kwargs)

    # Block all other URLs (http, https, ftp, IP addresses, etc.)
    return {
        'string': '',
        'mime_type': 'text/plain',
        'encoding': 'utf8'
    }
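To take effect, the fetcher has to be handed to WeasyPrint when the document is constructed. The following is a minimal sketch, assuming WeasyPrint's HTML class accepts a url_fetcher argument as it does in current releases. render_pdf_offline is a hypothetical helper name, not part of this PR, and the fetcher is passed in as a parameter so the sketch does not depend on where local_fetch_only lives.

```python
def render_pdf_offline(html_text, fetcher, base_url=None):
    """Render HTML to PDF bytes, routing every resource lookup through `fetcher`.

    Hypothetical helper for illustration only; pass local_fetch_only as
    `fetcher` to keep rendering off the network.
    """
    # Imported lazily so this module still loads where WeasyPrint is absent.
    from weasyprint import HTML

    document = HTML(string=html_text, base_url=base_url, url_fetcher=fetcher)
    # With no target given, write_pdf() returns the PDF as bytes.
    return document.write_pdf()
```

Called as render_pdf_offline(html, local_fetch_only), any stylesheet or image referenced over http or https would receive the empty response instead of triggering a request.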
Reasoning
The primary motivation for this implementation is security. By blocking external network requests, we ensure that WeasyPrint cannot inadvertently leak data or fetch resources from untrusted sources. Some resources will no longer load as a result, but the trade-off favors safety.
Control
Allowing only local file paths and base64 encoded data provides fine-grained control over the resources that can be accessed. Relative URLs are permitted to ensure that internal resources can still be referenced without specifying the full URL.
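One edge case is worth checking when an empty scheme is taken to mean "relative": urlparse also reports an empty scheme for scheme-relative URLs such as //example.com/style.css, which still name a remote host. The sketch below is a stricter variant that additionally requires an empty network location. is_local_url is a hypothetical helper name and not part of this PR, and whether WeasyPrint ever hands such a URL to the fetcher depends on how it resolves URLs against the base URL beforehand.

```python
from urllib.parse import urlparse


def is_local_url(url):
    """Return True for data:, file:, and truly relative URLs (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme in ("data", "file"):
        return True
    # An empty scheme alone is not enough: "//host/path" has no scheme
    # but still carries a network location.
    return parsed.scheme == "" and parsed.netloc == ""
```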
Describe testing procedures
An additional test using an eml file was added to exercise the retrieval of external (fake) resources. The test produces a thumbnail of an image without additional tags.
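The blocking behaviour can also be checked in isolation, without an eml fixture or a WeasyPrint install. The following is a minimal sketch, not the test added in this PR. It assumes a hypothetical factory, make_local_fetcher, that takes the default fetcher as a parameter so that a stub can stand in for default_url_fetcher.

```python
from urllib.parse import urlparse


def make_local_fetcher(default_fetcher):
    """Build a fetcher with the same allow-list as local_fetch_only (hypothetical)."""
    def fetch(url, *args, **kwargs):
        if urlparse(url).scheme in ("data", "file", ""):
            return default_fetcher(url, *args, **kwargs)
        return {"string": "", "mime_type": "text/plain", "encoding": "utf8"}
    return fetch


def check_fetcher():
    """Run one blocked and one allowed URL through a stubbed fetcher."""
    seen = []

    def stub(url, *args, **kwargs):
        # Record which URLs were delegated instead of touching disk or network.
        seen.append(url)
        return {"string": b"ok", "mime_type": "text/plain"}

    fetch = make_local_fetcher(stub)
    blocked = fetch("https://example.com/style.css")
    allowed = fetch("file:///tmp/style.css")
    return blocked, allowed, seen
```

The stub records every URL it receives, so the check can confirm that the blocked URL never reached the default fetcher.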
Sample output
If this change modifies Strelka's output, then please include a sample of the output here.
Checklist
[x] My code follows the style guidelines of this project
[x] I have performed a self-review of and tested my code
[x] I have commented my code, particularly in hard-to-understand areas
[x] I have made corresponding changes to the documentation
Use Cases
Base64 Data URL
Local File URL
Relative URL
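The three permitted URL forms can be illustrated as follows. This is a sketch of how urlparse classifies each one. The file path and the CSS payload are made-up examples, and example_urls is a hypothetical helper that is not part of this PR.

```python
import base64
from urllib.parse import urlparse


def example_urls():
    """Return one illustrative URL per permitted form (values are made up)."""
    payload = base64.b64encode(b"body { color: black; }").decode("ascii")
    return {
        "data": f"data:text/css;base64,{payload}",  # base64 data URL
        "file": "file:///tmp/report/style.css",     # local file URL
        "relative": "images/logo.png",              # relative URL, no scheme
    }


# Map each example to the scheme that the fetcher's allow-list inspects.
schemes = {name: urlparse(url).scheme for name, url in example_urls().items()}
```

Each of the resulting schemes falls within the ('data', 'file', '') allow-list, so all three URLs would be passed on to the default fetcher.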