simonrob / email-oauth2-proxy

An IMAP/POP/SMTP proxy that transparently adds OAuth 2.0 authentication for email clients that don't support this method.
Apache License 2.0

configure Python packaging #199

Closed terencehonles closed 10 months ago

terencehonles commented 11 months ago

This change adds a pyproject.toml so that the proxy can be pushed to package registries such as PyPI, allowing it to be installed via pip or another Python package manager.

terencehonles commented 11 months ago

I'm opening this as a draft because there are a few open issues with this PR:

Thoughts? If going in this direction is acceptable I can continue with the PR; otherwise, I figured I'd at least generate a minimal package config and ask via a PR.

simonrob commented 11 months ago

Thanks for submitting this draft. Releasing as a package is something I've considered a few times before, but ultimately decided against because I don't currently see that the benefits outweigh the drawbacks. But obviously you see differently or you wouldn't have put the work into this, so maybe you can convince me?

From what I see, there is one benefit from packaging: installation is much easier (i.e., pip install emailproxy or similar).

However, it seems like there are more than a few challenges:

All this being said, I initially thought setting up PyInstaller builds would be a similar endeavour, but this turned out to be relatively workable after a few teething problems, and now I just make it clear that these builds aren't officially supported. So, I guess the question I'm interested in is: what are the key benefits from your perspective that make this worth doing?

terencehonles commented 11 months ago

Thanks for submitting this draft. Releasing as a package is something I've considered a few times before, but ultimately decided against because I don't currently see that the benefits outweigh the drawbacks. But obviously you see differently or you wouldn't have put the work into this, so maybe you can convince me?

Well I don't have a very strong opinion one way or another for this package, but I can reason with you.

Before providing counterpoints to the points you listed, I'd start by saying that Python tools are generally packaged and distributed through PyPI, so it's effectively the de facto standard. I also don't see this as an either-or situation: one can download the Git repository and hack on the tool that way, or install from PyPI otherwise. To keep the interfaces between the two consistent, you can document the two supported install methods:

# PyPI
pip install $PACKAGE_NAME[gui]

# Git
git clone $REPO && cd $REPO
pip install -e .[gui]

With the Git option someone does not need to use pip's editable install option, but if they are planning on making changes to the files they should (and if they are going down the Git route they probably want that). As long as the script name is the same as the executable, and the script is effectively python -m $PACKAGE_NAME, then someone can still just run python -m $PACKAGE_NAME or python $PACKAGE_NAME after installing the dependencies manually (if they really don't want to use one of the forms above).

That's where it gets "a bit harder", since it doesn't look like you can say pip install --only-deps .[gui]. However, you could easily include a script which reads pyproject.toml and generates the requirements.txt files for each specific combination. If you wanted to avoid relying on Python being able to read TOML (the stdlib tomllib module requires Python 3.11+), you could run that script ahead of time and include the generated requirements.txt files in the repo (this still gives you a single source of truth). I'm not sure how committed you are to "this should be a standalone script and they should not have to install this package to either their system or a venv", but these are the workarounds I see.
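
To make that concrete, here's a minimal sketch of the kind of generator script I mean (the file names are hypothetical, and it assumes dependencies are declared under [project] and [project.optional-dependencies] as usual):

# gen_requirements.py - regenerate requirements files from pyproject.toml
import tomllib  # stdlib in Python 3.11+

with open('pyproject.toml', 'rb') as f:
    project = tomllib.load(f)['project']

# core dependencies -> requirements-core.txt
with open('requirements-core.txt', 'w') as f:
    f.write('\n'.join(project.get('dependencies', [])) + '\n')

# each extra (e.g., gui) -> its own requirements-<extra>.txt
for extra, deps in project.get('optional-dependencies', {}).items():
    with open('requirements-%s.txt' % extra, 'w') as f:
        f.write('\n'.join(deps) + '\n')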

From what I see, there is one benefit from packaging: installation is much easier (i.e., pip install emailproxy or similar).

Agreed, and mentioned above.

However, it seems like there are more than a few challenges:

  • Configuration is less easy/obvious. There are all sorts of potential issues here, such as how to get the starter configuration, where to put it, and potential macOS file access challenges. You do make a good suggestion about having a command to generate an example configuration file, though.

The benefit of having it generated with a command, over the Git version that exists now, is that you can generate as many "starter" configurations as you want. Right now, while it's not hard, you have to move the edited config to a new filename and then "re-checkout" the original file. For experienced Git users I'm sure this is obvious, but for newer users it may not be. This also leads to a risk: if you edit the existing starter config, you're always in danger of accidentally pushing your config in a PR. Since you included encryption 💪 this will help protect many of the secrets, but it still makes it easier to fumble a PR.

I think the "where to put it" you have is fine, but you could also check/support ~/.config/<package-name>.config for the installed package case (I actually thought it was current working directory, but I see not). I think the question for that would just be how many proxies do you expect someone to start. Since you can handle multiple SMTP/POP/IMAP servers per proxy I'd expect the default user only starts one proxy, and if they really want to start more than one they can specify the config via the flag (rather than trying to support a different flag like --profile and loading ~/.config/<package-name/<profile>.config). If you're starting with a command, you can just print out where the file is being written to (or make the user specify). To easily support a new config location you could just rename the existing config to "example.config" in the repo and if "emailproxy.config" exists then you know it's an existing instance or someone already asked and configured the proxy (or at least tried). If that file doesn't exist at the install path and it's not a frozen install you use the ~/.config location otherwise use the frozen location you have now. If you'd like to deprecate the frozen install location you could still check for it, but print out a warning, but never write a new config to that location. If you'd like to allow non package installed versions to use the current path, but installed packages to use the ~/.config location you could detect if the package is installed or not and switch between the two (I wouldn't suggest writing changes into the package so you'd need to at least detect if it's installed if you wanted to keep the current path. Looking at __package__ is not None will allow you to do that).

Do you mind expanding on the macOS file access challenges? I'm not sure what those might be, or what you've encountered in the past with this project.

  • Debugging is harder, because the proxy operates in a much more opaque manner, and users obtain the proxy as a packaged item with no real visibility into its workings, and a bit more distance from the source repository. I don't know whether macOS permissions would be an issue here too, but it's a possibility.

I'm not sure this is true. I mean, yes, you're not cloning from the repo, but the PyPI page would have the repo listed, and I still expect people might find the repo first and install via PyPI because those are the directions they chose to follow. If you want to make changes to the package, you'll want to re-install it in editable mode, but Python stack traces will show the path to wherever the package is installed, and nothing is stopping you from editing the files even in the installed state. Sure, updating files where they are installed isn't great practice, but I've done it "under duress" (when I couldn't, or didn't want to, re-install the specific tool from a source package).

I think this is a weak argument; it prioritizes hackers and Python developers over someone who is just trying to use the tool. Experienced users will know how to put their system into a state that prioritizes development and hacking, and if you provide instructions on how to install in editable mode, I think you also cover the case where the tool's user is a newer developer who wants to learn to become that experienced user or hacker.

I would agree with this argument in a packaged state, but I also believe the packaged state (depending on how you've done it) just means the installer has its own Python and the proxy is installed there. So the same argument about stack traces printing the right location remains, and you can still edit things in place, but moving to an editable install is harder here, and you're possibly discouraging commits back to the repo.

  • If the proxy were intended to be used as a dependency for other packages then releasing it in this way would make a lot of sense. Although PyPI can be used to distribute standalone applications (e.g., maestral), something like Homebrew seems to fit a bit more closely. But I suppose there's room for both.

Yes, I agree, but I'm not sure how likely that is to happen; and, as you mention, PyPI can be used to distribute standalone applications (or even non-Python tools to Python developers).

  • It's one more thing to deal with when making a release, though here again it could use an action just like the PyInstaller builds. I wouldn't want to have to change release naming, so would need some mapping to numbered versioning.

Yes, I would expect you'd want a GitHub Action or some other release automation to push the change to PyPI so you didn't have to deal with it. Packaging the project as a module does not require you to release on PyPI, since I could also run pip install emailproxy@git+https://github.com/simonrob/email-oauth2-proxy, but not publishing on PyPI seems like a missed opportunity if you have everything as a package already.

Those are my thoughts in response to yours.

I'll add one thing to the list of things you're thinking about (I've tried to hint at it whenever I have shown install commands): Python package extras can't remove requirements from a package, so the default would need to be the minimal install. This means that you'd need to use the [gui] extra "by default" if you expected people to be running in GUI mode by default. I don't see adding [gui] to the install instructions as a deal-breaker, but I wanted to make that clear.

Finally, I'd like to say thanks for developing this tool. It's been helpful with some changes my web host sprang on me (no longer offering mail hosting, and switching to a trial of Office 365). We were already looking at switching hosts and this is pushing us to do so, so I'll likely not be using this tool in the long term, but thanks are in order nonetheless!

simonrob commented 11 months ago

Thanks for the detailed follow-up here – I really appreciate this. And I'm certainly not fundamentally against packaging in any way – I maintain a few other things that are already on PyPI, for example. I'm also fine with switching the installation instructions to point to just using pip with either emailproxy or emailproxy[gui] for the two variants.

In terms of commitment to a standalone script, I am pretty committed to the proxy being able to be used as a portable, standalone single-file utility – I find this really useful in several scenarios (though definitely nothing against using a venv at the same time). But I don't think that this rules out packaging as an extra option.

Re: macOS, I've no idea whether there would actually be file permission issues, but the proxy can run into this sort of thing at the moment when being run as a service (see the troubleshooting section of the readme). Perhaps this is a non-issue because it's Python that requires permission, and that is going to be the case regardless.

I think there are a few key issues to resolve before going forward:

Thanks for the kind words by the way – I'm really glad this tool is useful for other people.

terencehonles commented 11 months ago

In terms of commitment to a standalone script, I am pretty committed to the proxy being able to be used as a portable, standalone single-file utility – I find this really useful in several scenarios (though definitely nothing against using a venv at the same time). But I don't think that this rules out packaging as an extra option.

Yeah, that's reasonable, and they don't have to be mutually exclusive.

Re: macOS, I've no idea whether there would actually be file permission issues, but the proxy can run into this sort of thing at the moment when being run as a service (see the troubleshooting section of the readme). Perhaps this is a non-issue because it's Python that requires permission, and that is going to be the case regardless.

I see; you've definitely done a good job documenting these things 👍. It does look like this would not change anything, since this is a permission the OS requires you to grant to the executable.

I think there are a few key issues to resolve before going forward:

  • Deciding on how to pick the configuration file location. I'd want to make sure any changes here didn't break existing installations on upgrade. Probably the easiest approach is to simply do nothing and keep the existing behaviour: default to the current working directory, but allow a config file location to be specified using a parameter (which lets you keep config files safely outside of the working directory when developing).

So I think a Git rename would probably track and move an existing config (but I've not tested it), which probably removes the option of renaming emailproxy.config to example.config. However, data files do not need to be packaged side by side with the script (or at least I'm pretty sure you can change the directory they are packaged in, if not the name itself). So we can either look at whether the package is installed and/or whether there's an emailproxy.config side by side with the main script. If there is one side by side, we can assume the end user is either on one of the existing use cases or using an editable install; in both of these cases it's fine to update the config. If the config file does not exist then the app should not start up, and should suggest that the config needs to be generated. That puts us at the final decision: should the installed app write the config to the current working directory or to ~/.config (or that location's equivalent for the OS)?

Does this sound acceptable? Any thoughts? I can put this in its own PR if you want to iterate on it.

  • Settling on a good way to provide a new parameter that generates the sample configuration file. If the example can easily be packaged with the script itself then it's just a case of duplicating it. If not, then it could perhaps be downloaded directly from this repository.

I think it would be best and easiest to have it packaged, and then effectively copy it out of the package when requested. I can open a separate PR for this.
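
As a sketch of what that copy step might look like (assuming the sample ships as package data named example.config; all names here are hypothetical):

# hypothetical --generate-config handler
import shutil
from importlib import resources  # this API needs Python 3.9+

def generate_config(destination='emailproxy.config'):
    sample = resources.files('emailproxy_data').joinpath('example.config')
    with sample.open('rb') as src, open(destination, 'xb') as dst:  # 'x' refuses to overwrite
        shutil.copyfileobj(src, dst)
    print('Wrote sample configuration to', destination)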

Yup, I can configure the action if you'd like, and you can just set up the secrets in order to publish. One thing I noticed is that your packaged app version is on its own branch. Did you want to keep it that way? I would think it might be best to have a single branch, and then run the release actions on tag/release creation. I didn't look at the workflow file to see if that's the trigger you used, but that's where I'd suggest this goes.

  • I would want a single source for dependencies. Perhaps a GitHub action could generate requirements.txt files from pyproject.toml on each release and commit these if anything has changed? I've no appetite to increase the minimum required Python version for the proxy, but that wouldn't be an issue here because it'd only be required in the build process. In fact, could pyproject.toml just reference the two existing requirements.txt files, rather than duplicating their contents?

I'm not sure if pyproject.toml can reference a requirements file (it may very well be able to), but it sounds like you want to keep the requirements files? Are you trying to support not installing with pip install -e .[gui] (i.e., pip install -r requirements.txt && python emailproxy.py), or is pip install -e .[gui] && python emailproxy.py OK?

  • How to solve the version name/number issue. I have nothing against semantic versioning, but just don't feel it fits this particular use-case. The proxy is not going to be radically updated at any point because it is essentially complete – there's just not much left to actually do. So: how can ISO 8601 dates be used in PyPI packages?

Unfortunately it's the hyphens that are the problem. The requirements are here: https://peps.python.org/pep-0440/#public-version-identifiers. However, the __version__ you have defined in the file is not a standard and can be whatever you want it to be; it would just complicate the packaging config, but the value can be translated from the hyphenated form to the dotted form by the packaging system.

simonrob commented 11 months ago

I had a look at other examples, and it turns out that several well-known projects use dated versions on PyPI (e.g., 2023.7.22, etc). Even pip itself has versioning based at least partly on date. If pyproject.toml can just convert directly from - to . (and drop leading zeros) then that sounds fine to me.

Since I think we're getting towards a way to make this work, I tried out your pyproject.toml. python -m build doesn't work for me for some reason (Python 3.11; No module named build.__main__; 'build' is a package and cannot be directly executed). Using python -m pip wheel --no-deps . instead does produce a wheel file. However, when I install this wheel in a fresh venv, I get No module named emailproxy when running it (python -m emailproxy). The same happens with pip install -e /local/dev/path. I've only used the setup.py build process previously, so (as you can probably tell) I'm not very familiar with pyproject.toml. But I checked the wheel file, and it doesn't actually contain the emailproxy.py script. What's going wrong here?

terencehonles commented 11 months ago

I had a look at other examples, and it turns out that several well-known projects use dated versions on PyPI (e.g., 2023.7.22, etc). Even pip itself has versioning based at least partly on date. If pyproject.toml can just convert directly from - to . (and drop leading zeros) then that sounds fine to me.

It's non-trivial, but it can be done. Those projects store the version in that format. A "hack" would be to store an attribute in the module that does the transformation; since it looks like setuptools first tries to statically read the attribute, but will actually load the module if it has to, that's going to be the easiest way to expose the version in a dotted form.
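
Concretely, the hack could look something like this in the module (a sketch only; the version string is illustrative):

# in emailproxy.py
__version__ = '2023-10-15'  # the existing ISO 8601 release naming, unchanged

# derived PEP 440 form for packaging; pyproject.toml would then point its
# dynamic version at this attribute, e.g.:
#   [tool.setuptools.dynamic]
#   version = {attr = "emailproxy.__pep440_version__"}
# (PEP 440 normalization also drops leading zeros, so 2023-07-22 would
# end up as 2023.7.22)
__pep440_version__ = __version__.replace('-', '.')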

Since I think we're getting towards a way to make this work, I tried out your pyproject.toml. python -m build doesn't work for me for some reason (Python 3.11; No module named build.__main__; 'build' is a package and cannot be directly executed).

If you want to make sure some legacy install methods work, this PR would need to include a minimal setup.py: https://setuptools.pypa.io/en/latest/userguide/pyproject_config.html#id1. I'm not familiar with executing a build module (historically I used python setup.py sdist), but I see the module in https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/#source-distributions, and it looks like you're missing the build dependency as mentioned in https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/#packaging-your-project (and you may not need the setup.py).

Using python -m pip wheel --no-deps . instead does produce a wheel file. However, when I install this wheel in a fresh venv, I get No module named emailproxy when running it (python -m emailproxy). The same happens with pip install -e /local/dev/path. I've only used the setup.py build process previously, so (as you can probably tell) I'm not very familiar with pyproject.toml. But I checked the wheel file, and it doesn't actually contain the emailproxy.py script. What's going wrong here?

I wonder if it has anything to do with the version of pip you're using? The wheel is a zip, as you probably know, and when I build the wheel the same way you did I see the script itself:

unzip -l emailproxy-7.dev7+gbb24962-py3-none-any.whl 
Archive:  emailproxy-7.dev7+gbb24962-py3-none-any.whl
  Length      Date    Time    Name
---------  ---------- -----   ----
   174132  2023-10-15 10:36   emailproxy.py
    11357  2023-10-15 11:36   emailproxy-7.dev7+gbb24962.dist-info/LICENSE
    46152  2023-10-15 11:36   emailproxy-7.dev7+gbb24962.dist-info/METADATA
       92  2023-10-15 11:36   emailproxy-7.dev7+gbb24962.dist-info/WHEEL
       46  2023-10-15 11:36   emailproxy-7.dev7+gbb24962.dist-info/entry_points.txt
       11  2023-10-15 11:36   emailproxy-7.dev7+gbb24962.dist-info/top_level.txt
      633  2023-10-15 11:36   emailproxy-7.dev7+gbb24962.dist-info/RECORD
---------                     -------
   232423                     7 files

For me the script was the first entry, and the entry point is included in emailproxy-7.dev7+gbb24962.dist-info/entry_points.txt, which should install emailproxy in the venv's "bin" directory; the module should be installed in "lib/<python version>/site-packages".

Any chance you were not running in the venv? For an editable install there should be very little installed in site-packages other than a pointer back to the install directory, so that local modifications are picked up immediately. I believe the entry points are written once, and any updates to pyproject.toml would require re-installing the package.

terencehonles commented 11 months ago

It's non-trivial, but it can be done. Those projects store the version in that format. A "hack" would be to store an attribute in the module that does the transformation; since it looks like setuptools first tries to statically read the attribute, but will actually load the module if it has to, that's going to be the easiest way to expose the version in a dotted form.

I implemented the "non-trivial" solution for setup.py, and then there's the suggestion above showing what the hack would look like. Let me know which you'd prefer.

simonrob commented 11 months ago

I found out why the script wasn't being included: I'd added [tool.setuptools.packages.find] and used its exclude capability to ignore an existing folder in my local development branch. For some strange reason, this also excluded the script, despite no use of wildcards and no similarity in name. Anyway – removing this solved the issue, so there was nothing wrong on your side. Sorry for the confusion!

Thanks also for your thoroughness here – it is really appreciated. I think I prefer keeping the version in the main script. Both approaches feel a bit hacky to me for something that would have been straightforward with the old setup.py approach, but it looks like pyproject.toml is the future, so we'll stick with that.

The only other thing I spotted is that this approach means the script's standard dependencies are required when building. I adapted an existing mode check to detect when being built, so only the three key additional requirements are needed. I'm fine with this tradeoff.

I've just pushed this change along with autoformatting the pyproject.toml file to match the rest of the project's style. If this works from your end, I think that this PR is probably complete, and the next step is to configure auto-building, then update the readme, and finally remove the requirements.txt files.

To answer your question about why building is handled in a separate branch – I know this is unconventional, but I just found it easier to keep this non-core feature separate and initiate a build by merging into that branch on release. I'm happy to rename that branch (e.g., build) and just add a new GitHub action to create and release the package to PyPI.

simonrob commented 11 months ago

One more thing I just thought of: how should the plugins branch be handled with this new approach? I suppose the two options are to either:

a) Merge it into the main branch. This feature was originally experimental, hence being kept separate, but is now stable. It's perhaps not of interest to very many users, but conversely it probably gets overlooked in the current setup.
b) Point to installing via pip install -e, as while the script is different, the requirements are the same.

simonrob commented 11 months ago

What do you think about how to deal with the plugins branch?

terencehonles commented 11 months ago

What do you think about how to deal with the plugins branch?

I saw the comment; I just hadn't had a chance to look at the plugins yet. Are they a namespace module or just neighbors to the script? My thought, without looking at how you did things, would be to declare a submodule as a namespace package and allow plugins to install themselves there. Plugins could also mark themselves as plugins if we were to copy how pytest does its plugin registration (it has both automatic plugins and plugins you need to register yourself), but that's quite possibly overkill.
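
As a very rough sketch of the discovery side (all names hypothetical, since I haven't looked at how the branch does things):

# emailproxy.plugins would be a namespace package (i.e., no __init__.py),
# so separately installed distributions could add modules to it
import importlib
import pkgutil

import emailproxy.plugins  # hypothetical namespace package

def load_plugins():
    plugins = {}
    for info in pkgutil.iter_modules(emailproxy.plugins.__path__,
                                     prefix='emailproxy.plugins.'):
        plugins[info.name] = importlib.import_module(info.name)
    return plugins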

I think we can continue to think about the plugins before this PR is merged, but we can address things like #201, or making sure there's a good way to generate a new config, before moving to answer that question.

simonrob commented 11 months ago

Currently there's a module folder of example plugins. They are referenced by name when adding to an account, and imported automatically.

If we assume nobody ever writes their own plugins then this branch (which is now very stable) could just be merged and packaged as-is. I don't know whether that's the case, though. Otherwise it's probably best to keep the existing requirements.txt installation approach here because plugins are a pretty advanced use-case.

simonrob commented 10 months ago

After a little more investigation I've decided to resolve the plugins branch issue by keeping things as-is with the installation in that configuration, which means it's best to simply refer to the existing requirements.txt files in the new pyproject.toml (added in 742b97c).

Things I think still remain to do:

simonrob commented 10 months ago

I think this is now ready to go – I've created the PyPI version, and will add auto-building when I next get a chance.