pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

Add architecture explanation to application structure overview #2794

Open brainwane opened 6 years ago

brainwane commented 6 years ago

Let's update our application structure overview with a writeup like Zulip's architecture summary or a curated list of links to conference talks, blog posts, etc. that would get us 30% of the way towards a history and application overview like this MediaWiki overview. We'd mention frameworks and components we use, like:

and engineering approaches we recommend people know about as they learn Warehouse.

Reasoning: Developers who are new to a codebase need to know the design rationale of confusing bits -- why it was made this way, what decisions are embedded in particular choices, whether particular components are the result of a feature request, a quick fix after an outage, an experiment, etc. (This is based on research summarized in Making Software.)

Discussed a bit on the pypa-dev mailing list.

brainwane commented 6 years ago

@lgh2 just chatted with Ernest and got some notes that I'll be turning into a PR. Here are those notes for reference -- they are very rough because I requested very quick notes, so that's my fault, not hers:

The Warehouse codebase

Warehouse uses the Pyramid web framework, the SQLAlchemy ORM, and Postgres for its database. Warehouse's front end uses Jinja2 templates.

The application runs in two Docker containers: one contains the static files for the website, and the other contains the database and the Python web application code, which runs in a virtual environment. In the development environment, Docker Compose manages running the containers and the connections between them.

The top-level directory of the Warehouse repo contains a number of files. Among them are the license file, contributing.rst, and the readme. The requirements.txt file is for the Warehouse virtual environment. The Dockerfile creates the Docker containers that Warehouse runs in, and the docker-compose yml file configures Docker Compose. Test configuration is in setup.cfg. Heroku uses runtime.txt. The Makefile contains commands to spin up Docker Compose and the Docker containers. There are also some files associated with Warehouse's front end.

# add files

Since Warehouse was built on top of a pre-existing database, some of the ORM code had to be written to fit the existing tables, so it may not look like the code in SQLAlchemy's documentation. There are some places where joins are done using logic instead of a foreign key.
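
As a rough illustration of what "a join done using logic instead of a foreign key" can look like, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than SQLAlchemy and Postgres so that it runs on its own, and the table names, column names, and the `normalize` helper are all hypothetical; none of them are taken from Warehouse's schema.

```python
import sqlite3


def normalize(name):
    # Hypothetical normalization rule, invented for this example.
    return name.lower().replace("_", "-")


conn = sqlite3.connect(":memory:")
# Expose the Python helper to SQL so the join condition can call it.
conn.create_function("normalize", 1, normalize)

conn.executescript(
    """
    CREATE TABLE projects (name TEXT PRIMARY KEY);
    -- Legacy-style table: project_name is plain text, not a FOREIGN KEY.
    CREATE TABLE files (filename TEXT, project_name TEXT);
    INSERT INTO projects VALUES ('my-package');
    INSERT INTO files VALUES ('My_Package-1.0.tar.gz', 'My_Package');
    """
)

# The two tables are related by an expression, not by a key constraint.
rows = conn.execute(
    """
    SELECT p.name, f.filename
    FROM projects AS p
    JOIN files AS f ON normalize(f.project_name) = p.name
    """
).fetchall()

print(rows)
```

In SQLAlchemy, a relationship of this kind would typically be declared with an explicit join condition rather than inferred from a foreign key, which is one reason such models can look different from the examples in the documentation.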

Warehouse also uses Pyramid's hybrid of URL traversal and URL dispatch. Factory classes populate the view's context from the URL before the view is called.
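
To make the idea of a factory populating things before the view runs more concrete, here is a small pure-Python sketch of the pattern. It does not import Pyramid, and every name in it (`Project`, `ProjectFactory`, `project_detail`, `handle`, the `/project/` prefix) is a hypothetical stand-in rather than Warehouse's actual code. The only thing it borrows from Pyramid is the convention that traversal looks up URL segments with `__getitem__` and treats a `KeyError` as "not found".

```python
class Project:
    def __init__(self, name):
        self.name = name


# Stand-in for a database table keyed by project name (assumption).
PROJECTS = {"requests": Project("requests"), "pyramid": Project("pyramid")}


class ProjectFactory:
    """Resolves one URL segment to a Project, like a traversal root."""

    def __getitem__(self, segment):
        # A KeyError here plays the role of a 404; the view is never called.
        return PROJECTS[segment]


def project_detail(context):
    """The 'view': it receives an already-loaded Project as its context."""
    return f"Project page for {context.name}"


def handle(path):
    # Dispatch step: a fixed prefix selects the factory and the view.
    prefix = "/project/"
    if not path.startswith(prefix):
        return "404 Not Found"
    segment = path[len(prefix):].strip("/")
    # Traversal step: the factory turns the remaining segment into a context.
    try:
        context = ProjectFactory()[segment]
    except KeyError:
        return "404 Not Found"
    return project_detail(context)


print(handle("/project/requests/"))
print(handle("/project/missing/"))
```

The point of the pattern is that the view never has to look anything up or handle the "no such project" case itself; by the time it runs, the object named in the URL already exists.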

bin/ - high-level scripts for Docker
dev/ - assets for dev env
tests/ - tests
warehouse/ - code in modules
    legacy/ - most of the implementation
    forklift/ - APIs for upload
    accounts/ - user accounts
    admin/ - administrator-specific
    cache/ - Warehouse - more goes out than goes in - cache as much as possible
    classifiers/ - trove classifiers
    cli/ - entry scripts
    i18n/ - internationalization
    locales/ - internationalization
    manage/ - DB
    migrations/ - DB
    packaging/ - models
                - rate limiting to prevent abuse
                - RSS feeds
                - site maps
    utils/

Pyramid hybrid URL Traversal and Dispatch:

https://docs.pylonsproject.org/projects/pyramid/en/latest/narr/hybrid.html

Pyramid: https://docs.pylonsproject.org/projects/pyramid/en/latest/index.html

SQLAlchemy: https://docs.sqlalchemy.org/en/latest/

Postgres: https://www.postgresql.org/docs/

Docker: https://docs.docker.com/

Docker Compose: https://docs.docker.com/compose/overview/

brainwane commented 6 years ago

It would be great if this documentation also explained what files/directories/libraries Warehouse uses to produce its various APIs.

di commented 6 years ago

@brainwane Could you outline what was missing from https://github.com/pypa/warehouse/pull/2937 that would fully resolve this issue?

brainwane commented 6 years ago

Thanks for asking @di. I'd like the Warehouse developer documentation to include:

brainwane commented 6 years ago

In today's Warehouse developers' meeting we decided to pare down our near-future milestones on our development roadmap so they really only contain the essential bugfixes and features we need to launch, replace legacy PyPI, and shut down the old site. So I'm moving this issue into a milestone further in the future.

alanbato commented 6 years ago

While talking with @brainwane on IRC, I came up with two ideas:

I think a Glossary could help clear up confusion between similar concepts and synonyms found in both the codebase and the docs, e.g. project, distribution, package, version, author, maintainer, etc.

Also, I think it would be valuable to cover architecture beyond the codebase: design preferences for tests, how the Docker containers are set up right now, detailed descriptions of what each make command does, and other "development" parts of the workflow, for completeness. That is, things besides the code layout that are also part of the system. :)

We should be careful (I almost made the mistake myself) not to mix contribution guidelines with the system architecture, design choices, and codebase information.

ewdurbin commented 6 years ago

I'm reconsidering the directory layout section that specifies what each subdirectory concerns itself with, as that is almost guaranteed to change over time and become out of date.

The Glossary might provide enough context on what the module names mean, and a basic primer on Pyramid app/module layout would probably suffice.

rixx commented 6 years ago

Just my unqualified 2 cents as a first-time user of Warehouse: For me, the directory structure and the "assumptions and concepts" block were the most helpful parts of the documentation once I was set up and trying to get my bearings, because they helped me figure out where to start exploring.