simple-repository / simple-repository-server

A tool for running a PEP-503 simple Python package repository, including features such as dist metadata (PEP-658) and JSON API (PEP-691)
MIT License
20 stars 1 forks source link

HttpRepository.__init__() got an unexpected keyword argument 'session' #1

Closed bazzargh closed 3 months ago

bazzargh commented 4 months ago

The last update to simple-repository changed the arguments for HttpRepository, and this library isn't synched up:

; docker run -it --rm  simple https://pypi.org/simple
INFO:     Started server process [1]
INFO:     Waiting for application startup.
ERROR:    Traceback (most recent call last):
  File "/home/simple/venv/lib/python3.11/site-packages/starlette/routing.py", line 732, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/local/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/simple/venv/lib/python3.11/site-packages/simple_repository_server/__main__.py", line 73, in lifespan
    repo = create_repository(repository_urls, session, database)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/simple/venv/lib/python3.11/site-packages/simple_repository_server/__main__.py", line 49, in create_repository
    repo = HttpRepository(
           ^^^^^^^^^^^^^^^
TypeError: HttpRepository.__init__() got an unexpected keyword argument 'session'

I'm running this with my own Dockerfile I threw together to test it:

FROM python:3.11 as base
RUN useradd --create-home simple
WORKDIR /home/simple/

FROM base as build

# create and activate virtual environment
# using final folder name to avoid path issues with packages
USER simple
RUN rm -rf /home/simple/venv; python3 -m venv /home/simple/venv --copies
ENV VIRTUAL_ENV=/home/simple/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
COPY --chown=simple . .

# install requirements
RUN --mount=type=cache,mode=0755,target=/root/.cache pip3 install uv
RUN --mount=type=cache,mode=0755,target=/root/.cache uv pip install .

FROM base as runtime
USER simple
COPY --from=build /home/simple/venv /home/simple/venv
EXPOSE 5000

# make sure all messages always reach console
ENV PYTHONUNBUFFERED=1

# activate virtual environment
ENV VIRTUAL_ENV=/home/simple/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

ENTRYPOINT ["simple-repository-server", "--port", "5000"]

And then just docker build -t simple .. I'm building this way because I wanted to play with patching the server a bit, like to add etag support on PEP-503 index files (this has been a problem for us with artifactory), and the warehouse json api (which dependabot uses instead of PEP-658, PEP-691 even when talking to private registries), and maybe http range header support (which uv uses)

Anyway, the solution if this bites you is in setup.py change "simple-repository", to "simple-repository<0.5.0",

bazzargh commented 4 months ago

Just some more notes about what I'm up to (since you asked in the README):

We currently use a private repo which proxies to pypi as well as holding our internal packages. That repo got noticably slower after an upgrade and we traced it back to https://github.com/pypa/pip/blob/038dc755061d6918288eae97fbec363df3607366/src/pip/_internal/index/collector.py#L141-L154 - pip will only ever use etags for caching index files for simple repositories, if you don't serve them the indexes are re-downloaded every time. With multiplatform builds and dozens of dependencies, this adds hundreds of seconds to a build; apparently our server stopped sending the etags. simple-repository-server doesn't send them either, but it looks like an easy fix.

Investigating this led us to other problems with our repo: the lack of PEP-658 metadata means that tools have to use fallback solutions to do dependency resolution. uv for example uses http range headers to grab the relevant metadata if it's not present as a separate path. Our server... supports neither PEP-658 or Http Range headers.

The final straw is dependabot: it uses the warehouse json api https://github.com/dependabot/dependabot-core/blob/main/python/lib/dependabot/python/update_checker.rb#L267-L276 to fetch metadata instead of PEP-658, no matter what, so fails on every private registry except warehouse (AFAIK). If I can't get them to fix that code, I need to fix my server.

The code here is nice and simple and it looks easy to add these features; it's unclear to me how best to set simple-repository-server up with a local repo that supports authenticated uploads, but it's trivial to set it up with multiple remote backends, and to serve a PEP-503 only simple repository straight out of s3 which this proxy can upgrade on the fly to support metadata and the other protocols.

The dockerfile is just a hack. Our instance would run in EKS, but I've not tried setting it up yet.

eddybl commented 3 months ago

Is there another fix coming/possible other than downgrading simple-repository to below 0.5.0?

bazzargh commented 3 months ago

You could try to implement whatever the change was that didn't get uploaded.

I've a good guess at what the missing code dump included; one possible reason for switching to httpx from aiohttp is this bug https://github.com/simple-repository/simple-repository-server/issues/4#issuecomment-2131213240 - that if you talk to an upstream with gzip enabled the response size won't match the content-length header. httpx contains iter_raw, so you could stream the original response. https://www.python-httpx.org/quickstart/#streaming-responses

I didn't pursue that as I don't think it's correct either - if the client didn't say it would accept gzip, we shouldn't stream that. Stripping out unwanted headers seemed easier. In the setup I use I have nginx caching wrapped around simple-repository-server configured to serve stale responses while fetching and to fetch in the background ignoring client disconnects, this gives me fast responses without streaming (at the expense of the cache not being up to date all the time; not a big concern as tagged wheels should be immutable)

eddybl commented 3 months ago

But to put it more simply: the current version of the simple-repository-server is just broken and cannot be used at all since simple-repository was updated 0.5.0?

It is not like I am doing something wrong?

bazzargh commented 3 months ago

you're correct.

pelson commented 3 months ago

There is some really interesting content in this issue/thread, thanks for the background @bazzargh. I've just released v0.5.0 which should address the actual issue at hand - please let me know if not though.

eddybl commented 3 months ago

So I can at least say that the server is now properly starting up ;)

pelson commented 3 months ago

Should be fully resolved by v0.6.0.

The feedback on the usage is highly valuable, so I've made a new label to track it.