Clarify use cases with examples

ehmatthes commented 1 year ago

Hi, I was at the lightning talk you gave at PyCon last week. I was the one helping pick up the handouts at the end of the day. :)

I just read through 5069; there's way more history and backstory to this project than I imagined. Maybe you mentioned it, but I think you should have shown that issue in the lightning talk!

The problem this project tries to solve is not one I currently face, but I've taught Python for a long time now, and I can relate to the issues you're trying to deal with in general. Teaching people how to install Python packages when you don't control their development environment is not straightforward.

My first thought really echoes what I read here. Watching your talk, it was interesting to see a pip install happen from inside a Python terminal session. I tried it as well, and it worked for me. But then I was left with a whole bunch of questions and thoughts that don't seem to be clearly answered here. They all come down to this one:

What are the intended bounds of this project?
- You can already carry out an install in a terminal session, and it nicely reports that something has already been installed if that's the case.
- If that's the entire intended use case, I think you should state that clearly.
- I was left wondering what happens if someone then jumps to writing .py files. Are imports supposed to work there as well? If they aren't familiar with using pip, does this project teach people to open a REPL, use pipster, then go to their .py file and use import?
- Are you intending for people to use pipster in a .py file? If so, I would want a much clearer writeup of that usage. Yes, it's clear I can use install() in a .py file. But do you then need an import statement? TBH, if that's the case, that seems like a pretty cluttered and unpythonic workflow. It seems like the effort spent teaching people how to use that workflow would be better spent helping them make sense of the Python environment they have on their own system.

I'm really careful about giving critical feedback on a project that someone has clearly put a lot of work and thought into. But, you also deserve honest feedback. If this project is going to be useful, I would suggest it needs a much clearer articulation of the use cases and problems it solves, and clear examples of the intended usage. I don't think it needs more code at the moment.

This may be a project where digging in this far ultimately shows that the existing usage is difficult for a reason. I've written wrappers before, and it's always tempting to expose more of the underlying tool's API, and reimplement more and more of the tool's functionality. That's a sign you're either trying to do too much, or haven't scoped the project clearly enough.

If it's not clear from what I said in person, I appreciate the work you have done, and even more so after reading the pip issue thread.

webknjaz commented 1 year ago

I know I told @reynoldsnlp about pip-api on the day of the Education Summit, but I'll also link it here https://github.com/di/pip-api so that it's not lost. The pip-api project seems to have a more generic/wider scope, though, and it looks like it doesn't expose the installation API. It's also packaged for conda-forge and has a more comprehensive test matrix. pip-api started about 9 months prior to pipster: Mar 20, 2018 (https://github.com/di/pip-api/commit/296a965a8542d105a79272f2be892ea31afe90f0) vs. Dec 28, 2018 (https://github.com/reynoldsnlp/pipster/commit/35f3cf010c30c1e5650ebe2eb5d5d64291762409). So they were being developed almost in parallel, with a focus on similar but slightly different aspects of the pip API.

I'll tag @di to see if he'd be open to collaboration — no need to duplicate the efforts anymore...

di commented 1 year ago

On first glance it looks like these two projects, while similar, have different goals. The goal of pip-api is to provide a drop-in replacement for existing usages of pip's internal API, and this project doesn't seem to be focused on that goal.

It's good to know this exists though! Many people come to pip-api looking for something more like this project.

reynoldsnlp commented 1 year ago

@ehmatthes I'm not sure how I missed this issue when you posted. Thanks for the thoughtful response! It is very helpful to have a new perspective to help clarify both the goals and the presentation of the project. I feel like the project still has merit, but I haven't done a good job of presenting HOW to use it.

What are the intended bounds of this project?

I think @ncoghlan's original intent was to give access to all of pip's functionality from inside a REPL or script. The install command is the most problematic in this context (and the most likely use case), so I started with that. My interest in the project is to give learners a Python-centric solution when they do not have any/enough experience with shells to effectively troubleshoot pip. When your shell skills are inadequate to troubleshoot problems, pipster is foolproof and sufficient.

Are you intending for people to use pipster in a .py file?

That is one intended use case. For example, I could imagine giving students a separate install_dependencies.py or a script header like this:

# The following two pipster lines only need to be run once. Comment them out after running the script the first time.
import pipster
pipster.install('pandas')

import pandas as pd
...

If this project is going to be useful, I would suggest it needs a much clearer articulation of the use cases and problems it solves, and clear examples of the intended usage.

This is my main takeaway from your feedback, and I'm glad you articulated it so well. I'm going to change the name of the issue to match.

ehmatthes commented 1 year ago

I'm not sure how I missed this issue when you posted.

I'm glad to know you just missed it; I wasn't sure if I had said something the wrong way.

# The following two pipster lines only need to be run once.
# Comment them out after running the script the first time.

You have described this project as "Pythonic". Having people include lines once and then remove them does not strike me as Pythonic. For one thing, it makes the code less portable. From the user's perspective, as soon as they're in a different environment, their code will no longer work. This really gets at the need to understand the packaging system to some degree. It very quickly becomes a circular issue.

I would still prefer to see some variation of the following:

from pipster import pipster_version_of_import
pipster_version_of_import pandas as pd

Behind the scenes, pipster_version_of_import would import the package if it's already installed, and install it if not. That seems much more Pythonic to me.
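
Since import is a statement, a helper can't literally take its place, but a function can get close. Here is a minimal sketch of the idea, not anything pipster provides: import_or_install and its installer parameter are hypothetical names, and the default installer assumes pip can be run as "python -m pip".

```python
import importlib
import subprocess
import sys


def _pip_install(package):
    # Run pip with the same interpreter that is running this script.
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])


def import_or_install(module_name, package_name=None, installer=_pip_install):
    """Import module_name, installing it first if the import fails.

    Hypothetical helper, not part of pipster. package_name covers cases
    where the name on PyPI differs from the import name.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        installer(package_name or module_name)
        # Make sure the import system notices newly installed files.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
```

With something like this, the script header would become pd = import_or_install("pandas"), and the same line would work unchanged in a fresh environment.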

reynoldsnlp commented 1 year ago

@ehmatthes I agree that the header solution with commented code is the least pythonic way to do it. Perhaps a simple modification would fix that though: make install() check to see if the package is already installed before attempting to install it (unless upgrade=True is given).
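
A rough sketch of what that check could look like, under a few assumptions: install_if_missing and run_pip are hypothetical names (this is not pipster's current install()), the argument is a plain distribution name with no version specifier, and importlib.metadata (Python 3.8+) is used to detect what is already installed.

```python
import subprocess
import sys
from importlib import metadata


def _run_pip(args):
    # Invoke pip through the running interpreter so the right environment is used.
    subprocess.check_call([sys.executable, "-m", "pip", *args])


def install_if_missing(package, upgrade=False, run_pip=_run_pip):
    """Install package unless a distribution of that name is already present.

    Returns True if pip was invoked, False if the install was skipped.
    Hypothetical helper; run_pip is injectable so it can be tested offline.
    """
    if not upgrade:
        try:
            metadata.version(package)
            return False  # already installed, nothing to do
        except metadata.PackageNotFoundError:
            pass
    args = ["install", package]
    if upgrade:
        args.append("--upgrade")
    run_pip(args)
    return True
```

The lookup is by distribution name (for example beautifulsoup4), not by import name (bs4), so it sidesteps the name mismatch as long as the caller passes the name they would give to pip.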

I have played around with other ways of triggering the install. So far, the only stable approach is using pipster._experimental.autoinstall.autoinstall(). It works by reading __file__ to identify imports and checks to see if they are already installed. If not, it installs them. For modules whose name is different from the package name, a comment beginning with # install is used to specify the package name to install:

from pipster._experimental.autoinstall import autoinstall
autoinstall()

from bs4 import BeautifulSoup  # install beautifulsoup4
import requests
import sklearn  # install scikit-learn
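
To make the mechanism concrete, here is a simplified sketch of the scanning step. It is not the actual pipster._experimental.autoinstall code: find_missing_packages is a hypothetical name, it only reads the comment on the first line of each import statement, and it only reports what would need installing rather than calling pip.

```python
import ast
import importlib.util
import re

# Matches a trailing comment such as "# install beautifulsoup4".
INSTALL_COMMENT = re.compile(r"#\s*install\s+(\S+)")


def find_missing_packages(source):
    """Return package names to install for imports that cannot be found.

    Hypothetical helper: parses the source text, looks up each imported
    top-level module, and uses an "# install NAME" comment on the import
    line when the package name differs from the module name.
    """
    lines = source.splitlines()
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules = [node.module]
        else:
            continue
        comment = INSTALL_COMMENT.search(lines[node.lineno - 1])
        for module in modules:
            top_level = module.split(".")[0]
            if importlib.util.find_spec(top_level) is None:
                package = comment.group(1) if comment else top_level
                if package not in missing:
                    missing.append(package)
    return missing
```

A script could pass its own text, read from __file__, to this function and hand each returned name to an installer before the real imports run.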

I avoided adding this to the README because I am playing the political game of wanting PyPA to adopt this into pip, and this felt a little too clever. However, I think it is more pythonic and user-friendly. How do you like this approach compared to the one you suggested?

reynoldsnlp commented 1 year ago

autoinstall() also still needs a little polishing, e.g. #13.

ehmatthes commented 1 year ago

I'm working on a project called django-simple-deploy, which automates deployment of Django projects to the platform of your choice. One of the steps in configuring a project for deployment is adding packages that might not be needed locally, but are required on the target platform. It was interesting to work through the question of "How do I detect which packages a user has already installed?" So I get where you're coming from in trying to come up with a way to specify installation package names vs import names.

I am playing the political game of wanting PyPA to adopt this into pip

That's a good goal, and I think you're right to be open about that if that's a possible end goal. There are others who can speak more directly to this, but I would guess there's a lot that has to take place as an external tool before people will consider adopting this tool into pip. I think that's true of any comparable tool as well; it's nothing specific to this project.

Pip is one of the most actively used and installed of all Python packages. Maybe it's the most installed and used Python package? So I don't think there are any design decisions you could make that would cause the pip maintainers to just bring this project into pip. It would be pretty wild to bring an untested tool into a package that will instantly see millions of downloads on a daily basis. I think you need to make some usage decisions, state them clearly with examples, implement those decisions, and share your tool. I can't imagine any tool that attempts to solve these issues would be adopted into pip until it's already being used widely, and has been tested by (probably millions of) end users.

webknjaz commented 1 year ago

PyPA to adopt this into pip

PyPA doesn't directly influence what pip does or doesn't do. It's entirely up to the pip maintainers.

reynoldsnlp / pipster

Clarify use cases with examples #45