Closed by cooperlees 1 year ago.
My suggestions here:

Write out 3 files: `index.html`, `index.v1_html`, and `index.v1_json`. These will map to:

| Ext | Content Type |
|---|---|
| `.html` | `text/html` |
| `.v1_html` | `application/vnd.pypi.simple.v1+html` |
| `.v1_json` | `application/vnd.pypi.simple.v1+json` |
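
A minimal Python sketch of writing the three files for one project follows. The helper names (`write_simple_indexes`, `render_html`, `render_json`) and the `(filename, url, sha256)` input shape are hypothetical, not bandersnatch's actual API. The JSON uses only the required keys of a PEP 691 project page (`meta.api-version`, `name`, `files`), and the same PEP 503-style HTML is written to both `.html` and `.v1_html`, since `text/html` is the legacy name for the v1 HTML format.

```python
import html
import json
import tempfile
from pathlib import Path

# Hypothetical input shape: (filename, url, sha256 hex digest) for each file.
FileEntry = tuple[str, str, str]


def render_html(project: str, files: list[FileEntry]) -> str:
    """PEP 503-style page; also used for the v1 HTML variant."""
    links = "\n".join(
        f'    <a href="{html.escape(url)}#sha256={digest}">{html.escape(name)}</a><br/>'
        for name, url, digest in files
    )
    title = html.escape(project)
    return (
        "<!DOCTYPE html>\n<html>\n  <head>\n"
        '    <meta name="pypi:repository-version" content="1.0">\n'
        f"    <title>Links for {title}</title>\n  </head>\n  <body>\n"
        f"    <h1>Links for {title}</h1>\n{links}\n  </body>\n</html>\n"
    )


def render_json(project: str, files: list[FileEntry]) -> str:
    """PEP 691 JSON project page with only the required keys."""
    return json.dumps({
        "meta": {"api-version": "1.0"},
        "name": project,  # assumed to already be the normalized project name
        "files": [
            {"filename": name, "url": url, "hashes": {"sha256": digest}}
            for name, url, digest in files
        ],
    })


def write_simple_indexes(project_dir: Path, project: str, files: list[FileEntry]) -> None:
    """Write index.html, index.v1_html and index.v1_json side by side."""
    project_dir.mkdir(parents=True, exist_ok=True)
    page = render_html(project, files)
    (project_dir / "index.html").write_text(page, encoding="utf-8")
    (project_dir / "index.v1_html").write_text(page, encoding="utf-8")
    (project_dir / "index.v1_json").write_text(render_json(project, files), encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "simple" / "example"
        write_simple_indexes(target, "example", [
            ("example-1.0.tar.gz", "https://example.com/packages/example-1.0.tar.gz", "0" * 64),
        ])
        print(sorted(p.name for p in target.iterdir()))
```

Writing all three variants from the same in-memory file list keeps them consistent, so whichever one the web server picks describes the same set of files.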

For Apache, if you have `mod_negotiation` enabled, you can use a `.htaccess` that looks like this inside of the `/simple/` directory:

```apache
Options -Indexes +MultiViews
DirectoryIndex index
AddType application/vnd.pypi.simple.v1+json v1_json
AddType application/vnd.pypi.simple.v1+html v1_html
```
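
This will:

- Disable automatic directory listings and enable `MultiViews`.
- Change the `DirectoryIndex` from `index.html` to just `index`, which will let `MultiViews` look up the correct file extension.
- Register the two custom content types for the `v1_json` and `v1_html` extensions.

You can use this in a Docker container based on the `httpd` image, but it requires modifying the built-in config to enable `mod_negotiation` and to let Apache read `.htaccess` files. A `Dockerfile` that implements that would look like: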
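
```dockerfile
FROM httpd

# Append to the stock config: load mod_negotiation and let .htaccess files
# in the document root override settings (needed for Options/AddType).
RUN printf '%s\n' \
        '' \
        'LoadModule negotiation_module modules/mod_negotiation.so' \
        '' \
        '<Directory "/usr/local/apache2/htdocs">' \
        '    AllowOverride All' \
        '</Directory>' \
    >> /usr/local/apache2/conf/httpd.conf
```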
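
Once the `.htaccess` has been added to the mirror's `simple/` directory, this can be run using `docker run --rm -dit -p 8080:80 -v PATHTOBANDERWEB:/usr/local/apache2/htdocs/ theimagebuiltabove`, where `PATHTOBANDERWEB` is the mirror's web directory and `theimagebuiltabove` is the image built from the `Dockerfile` above.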
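
Roughly, with `DirectoryIndex index` and `MultiViews`, a request for `/simple/pkgname/` makes Apache look for `index.*` files and treat each one as a variant whose content type comes from its extension. The Python sketch below shows that lookup for the three extensions in the table; `multiviews_variants` is a hypothetical helper, and it deliberately ignores the language and encoding extensions that real `MultiViews` also understands.

```python
import tempfile
from pathlib import Path

# Extension -> content type, mirroring the two AddType lines in the .htaccess
# plus the stock text/html mapping for .html files.
VARIANT_TYPES = {
    ".html": "text/html",
    ".v1_html": "application/vnd.pypi.simple.v1+html",
    ".v1_json": "application/vnd.pypi.simple.v1+json",
}


def multiviews_variants(directory: Path, basename: str = "index") -> dict[str, int]:
    """Map each content type to the size of the basename.<ext> file providing it.

    Simplified: only a single extension per file is considered.
    """
    variants: dict[str, int] = {}
    for path in sorted(directory.glob(f"{basename}.*")):
        content_type = VARIANT_TYPES.get(path.suffix)
        if content_type is not None and path.is_file():
            variants[content_type] = path.stat().st_size
    return variants


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp)
        (pkg / "index.html").write_text("<html></html>")
        (pkg / "index.v1_json").write_text("{}")
        print(multiviews_variants(pkg))
```

The file sizes are returned because Apache uses them as a tie-breaker when the client has no preference between variants.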
Alternatively, you can use nginx. The adapted banderx config looks something like this:

```nginx
daemon off;
user nginx;
worker_processes auto;
error_log /dev/stderr info;
pid /run/nginx.pid;

events {
    worker_connections 2048;
}

http {
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log /dev/stdout main;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 69;
    types_hash_max_size 2048;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Pick a file suffix from the Accept header. Regex entries are tested in
    # order and the first match wins; "default" is used when none match.
    map $http_accept $mirror_suffix {
        default ".html";
        "~*application/vnd\.pypi\.simple\.latest\+json" ".v1_json";
        "~*application/vnd\.pypi\.simple\.latest\+html" ".v1_html";
        "~*application/vnd\.pypi\.simple\.v1\+json" ".v1_json";
        "~*application/vnd\.pypi\.simple\.v1\+html" ".v1_html";
        "~*text/html" ".html";
    }

    # Optional ?format= override; with no default, unmatched values map to "".
    map $arg_format $mirror_suffix_via_url {
        "application/vnd.pypi.simple.latest+json" ".v1_json";
        "application/vnd.pypi.simple.latest+html" ".v1_html";
        "application/vnd.pypi.simple.v1+json" ".v1_json";
        "application/vnd.pypi.simple.v1+html" ".v1_html";
        "text/html" ".html";
    }

    server {
        listen 80 default_server;
        listen [::]:80 default_server;
        server_name banderx;
        root /data/pypi/web;
        autoindex on;
        charset utf-8;

        location /simple/ {
            # Uncomment to support hash_index = true bandersnatch mirrors
            # rewrite ^/simple/([^/])([^/]*)/$ /simple/$1/$1$2/ last;
            # rewrite ^/simple/([^/])([^/]*)/([^/]+)$ /simple/$1/$1$2/$3 last;

            # Try the ?format= file first, then the Accept-based one.
            index index$mirror_suffix_via_url index$mirror_suffix;

            types {
                application/vnd.pypi.simple.v1+json v1_json;
                application/vnd.pypi.simple.v1+html v1_html;
                text/html html;
            }

            # Uncomment to support conneg for files other than
            # index, so that /simple/foo will map to /simple/foo.html,
            # /simple/foo.v1_html, or /simple/foo.v1_json based on the
            # Accept header.
            # try_files $uri$mirror_suffix $uri $uri/ =404;
        }

        # Let us set the correct mime type for all the JSON
        location /json/ {
            default_type application/json;
        }
        location /pypi/ {
            default_type application/json;
        }

        error_page 404 /404.html;
        location = /40x.html {
        }
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
        }
    }
}
```
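
To make the two `map` blocks and the `index` directive concrete, here is a small Python emulation of how this config picks an index file. `mirror_suffix`, `arg_format`, `index_candidates` and `pick_index` are hypothetical names; the rule order mirrors the config. nginx's `$arg_format` is the raw, undecoded query argument, so the sketch avoids `urllib.parse.parse_qs`, which would decode `+` into a space.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Mirrors `map $http_accept $mirror_suffix`: regex entries are tried in
# order, the first match wins, and "~*" means case-insensitive.
ACCEPT_RULES = [
    (r"application/vnd\.pypi\.simple\.latest\+json", ".v1_json"),
    (r"application/vnd\.pypi\.simple\.latest\+html", ".v1_html"),
    (r"application/vnd\.pypi\.simple\.v1\+json", ".v1_json"),
    (r"application/vnd\.pypi\.simple\.v1\+html", ".v1_html"),
    (r"text/html", ".html"),
]
DEFAULT_SUFFIX = ".html"

# Mirrors `map $arg_format $mirror_suffix_via_url` (no default, so "").
FORMAT_RULES = {
    "application/vnd.pypi.simple.latest+json": ".v1_json",
    "application/vnd.pypi.simple.latest+html": ".v1_html",
    "application/vnd.pypi.simple.v1+json": ".v1_json",
    "application/vnd.pypi.simple.v1+html": ".v1_html",
    "text/html": ".html",
}


def mirror_suffix(accept: str) -> str:
    """Suffix chosen from the Accept header; q-values are not parsed."""
    for pattern, suffix in ACCEPT_RULES:
        if re.search(pattern, accept, re.IGNORECASE):
            return suffix
    return DEFAULT_SUFFIX


def arg_format(url: str) -> str:
    """Raw value of ?format=, left undecoded like nginx's $arg_format."""
    for pair in urlsplit(url).query.split("&"):
        name, _, value = pair.partition("=")
        if name == "format":
            return value
    return ""


def index_candidates(url: str, accept: str = "") -> list[str]:
    """Files named by `index index$mirror_suffix_via_url index$mirror_suffix;`."""
    via_url = FORMAT_RULES.get(arg_format(url), "")
    return [f"index{via_url}", f"index{mirror_suffix(accept)}"]


def pick_index(existing: set[str], url: str, accept: str = "") -> Optional[str]:
    """First candidate present in the directory, or None if none are."""
    for name in index_candidates(url, accept):
        if name in existing:
            return name
    return None
```

When `?format=` is absent or unrecognized, the first candidate is a bare `index`, which a bandersnatch mirror normally doesn't contain, so the `Accept`-based file is the one that gets served.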

The big differences between Apache and Nginx here are:

- Apache actually parses the `Accept` header and selects the correct content type based on it.
  - This includes honoring the `;q=N` parameter that indicates relative preference.
- Nginx approximates this by setting the `$mirror_suffix` variable through regex tests against the `Accept` header, with a default fallback to `.html`.
  - This means it ignores the `q=N` parameter that clients use to express which of the content types they support they prefer. This is allowed under conneg: servers are not required to return the content type the client most prefers, but it's nice if they do, since the client presumably has a reason to prefer it.
  - `;q=0` typically disables a content type, but since the nginx config doesn't actually parse the `Accept` header, it will ignore that qvalue as well. Using `q=0` is pretty rare, so I don't think it's a particularly big deal.
- Nginx supports the `latest` aliases for our custom content types; Apache does not, because Apache's conneg doesn't let us return a different content type than the one matched in the `Accept` header, while Nginx does.
  - It might be possible to support these in Apache with `mod_rewrite` or something, I'm not sure.
- Nginx lets you choose which content type is the default by changing `default` in the map (in the above case, it's `.html`).
  - Apache picks among the matching files on its own, defaulting to whichever is smallest. This mostly matters for clients that don't send a specific `Accept` header: old pip only asks for `text/html`, and the PEP 691-ified pip asks for all 3.
  - There might be something in `mod_rewrite` that would let you set a default for when there isn't an `Accept` header, I'm not sure.
- Nginx supports a `?format=` query parameter, which overrides the `Accept` header if it's been specified.
  - Apache might be able to do this with `mod_rewrite`, I'm not sure.

Personally, I would recommend sticking with nginx for `banderx`.

I don't think it will matter to anyone that Nginx's conneg support is really just some basic regex matching rather than actual conneg, unless they're purposely trying to do weird things. The ability to pick which version is the default is a really nice thing, as it lets a mirror operator decide what level of compatibility they want (my config above chooses max compatibility). The extra features supported by the nginx config (the `latest` versions and the `?format=` URL parameter) are nice to have as well.

On the other hand, Apache's behavior of defaulting to whatever response is smallest is nice for saving bandwidth, but it's kind of weird that different URLs under `/simple/` may end up with different defaults, depending on which file happens to be smallest for each project.
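
The q-value handling and default selection discussed above can be compared in a short Python sketch. `negotiate` is a simplified, hypothetical stand-in loosely modeled on the Apache behavior described here (the highest `q` wins and the smallest file breaks ties); it is not Apache's full algorithm, which also weighs languages, charsets and its own wildcard rules. `substring_suffix` mimics the nginx `map`, whose `~*` patterns here amount to case-insensitive substring tests checked in order.

```python
from typing import Optional

# Order matters for substring_suffix, just like the nginx map.
SUFFIX_FOR_TYPE = {
    "application/vnd.pypi.simple.v1+json": ".v1_json",
    "application/vnd.pypi.simple.v1+html": ".v1_html",
    "text/html": ".html",
}


def parse_accept(header: str) -> dict[str, float]:
    """Parse an Accept header into {media_range: q}; a missing q means 1.0."""
    prefs: dict[str, float] = {}
    for item in header.split(","):
        media_range, *params = [part.strip() for part in item.split(";")]
        if not media_range:
            continue
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media_range.lower()] = q
    return prefs


def quality(content_type: str, prefs: dict[str, float]) -> float:
    """q for a type, preferring exact matches over type/* over */*."""
    major = content_type.split("/", 1)[0]
    for key in (content_type, f"{major}/*", "*/*"):
        if key in prefs:
            return prefs[key]
    return 0.0


def negotiate(accept: str, variants: dict[str, int]) -> Optional[str]:
    """Pick the variant with the highest q; break ties by smallest size."""
    # A missing Accept header is treated as "anything is fine".
    prefs = parse_accept(accept) if accept.strip() else {"*/*": 1.0}
    scored = [(quality(ct, prefs), size, ct) for ct, size in variants.items()]
    acceptable = [entry for entry in scored if entry[0] > 0]
    if not acceptable:
        return None
    return min(acceptable, key=lambda entry: (-entry[0], entry[1]))[2]


def substring_suffix(accept: str) -> str:
    """nginx-style: the first listed type found anywhere wins; q is ignored."""
    lowered = accept.lower()
    for content_type, suffix in SUFFIX_FOR_TYPE.items():
        if content_type in lowered:
            return suffix
    return ".html"
```

With `application/vnd.pypi.simple.v1+json;q=0, text/html`, the substring approach still picks JSON, while `negotiate` returns `text/html`; with no `Accept` header at all, `negotiate` returns whichever variant is smallest.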

One additional thing: the above assumes that bandersnatch is going to switch from writing just `index.html` files to writing the 3 files mentioned above alongside each other, which makes a lot of sense for people who want a single URL to support all of the available content types.

Some people may not want to rely on conneg, and would rather have different URLs for different content types. I think bandersnatch could support this pretty easily using two options:

- An option to write each content type into its own directory: instead of `/data/pypi/web/simple/pkgname/`, if this option was turned on you would write to `/data/pypi/web/simple/html/pkgname/`, `/data/pypi/web/simple/v1+json/pkgname/`, etc.
  - Users could then select a content type by URL, e.g. `pip install -i https://example.com/simple/v1+json/`.
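
A sketch of the path mapping such an option might use, in Python. The `split_by_type` flag and `project_dir` helper are hypothetical; the `html` and `v1+json` directory names come from the examples above, and `v1+html` is an assumed name by analogy.

```python
from pathlib import PurePosixPath

# Hypothetical per-content-type directory names. "html" and "v1+json" come
# from the examples above; "v1+html" is an assumed name by analogy.
TYPE_DIRS = {
    "text/html": "html",
    "application/vnd.pypi.simple.v1+html": "v1+html",
    "application/vnd.pypi.simple.v1+json": "v1+json",
}


def project_dir(web_root: str, project: str, content_type: str, split_by_type: bool) -> PurePosixPath:
    """Directory that a project's page for `content_type` would be written to."""
    simple = PurePosixPath(web_root) / "simple"
    if split_by_type:
        # One tree per content type, e.g. .../simple/v1+json/pkgname
        return simple / TYPE_DIRS[content_type] / project
    # Current single-URL layout: all variants share one directory.
    return simple / project
```

Keeping the choice in one function means the conneg layout and the per-URL layout differ only in where files land, not in how their contents are generated.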

Add logic for bandersnatch to save both the HTML and JSON simple index files. This will allow people to serve both HTML and JSON from their mirrors.
We should also update the docs and give an example of how to serve content based on request headers (conneg), as outlined in PEP 691.