py-econometrics / pyfixest

Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax
https://py-econometrics.github.io/pyfixest/pyfixest.html
MIT License

docs: improve installation section #489

Closed baggiponte closed 3 weeks ago

baggiponte commented 3 weeks ago

Ciao! Just discovered this project and it seems so cool! A huge throwback to my econometrics days (though I had to use Stata 🥶).

I propose changing the installation section to be more explicit about installing the package in a virtual environment. Furthermore, as Python core developer Brett Cannon recommends, I prepended python -m to pip, for reasons better explained in his post here.

Feedback is appreciated!

codecov[bot] commented 3 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

see 29 files with indirect coverage changes

s3alfisc commented 3 weeks ago

Thanks for the PR @baggiponte! I have to admit that the inner workings of python environments are still a little bit of a mystery to me. Though this looks good to me and I am happy to merge it. =) Btw, having first learned R, I have successfully refused using Stata until I had no other choice 😄 Still quite good software, actually.

s3alfisc commented 3 weeks ago

@all-contributors please add @baggiponte for doc

allcontributors[bot] commented 3 weeks ago

@s3alfisc

I've put up a pull request to add @baggiponte! :tada:

baggiponte commented 3 weeks ago

Thanks for the PR @baggiponte! I have to admit that the inner workings of python environments are still a little bit of a mystery to me.

Happy to help whenever you need! 😊

Btw, having first learned R, I have successfully refused using Stata until I had no other choice 😄 Still quite good software, actually.

Me too! Fortunately I just had to go through Stata tutorials (not for my MSc thesis, which was on ML). But indeed, it's a good piece of software, and in hindsight it made sense for its use case.

Anyway, thanks! And great work on the lib.

leostimpfle commented 3 weeks ago

Agree with @baggiponte that it's best to work in virtual environments but to me the current docs are not super clear as to why python -m should be used and how it relates to the actual installation with pip. An inexperienced user may simply type python -m pip install without actually activating an appropriate virtual environment.
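To make that failure mode concrete: whether a virtual environment is actually active can be checked from Python itself. The following is a minimal sketch, relying on the standard-library behavior that sys.prefix differs from sys.base_prefix inside an environment created with venv. The function name in_virtual_environment is purely illustrative and not part of pyfixest or pip.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True if the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; installs would target the base Python.")
```

Running this with the same python that would later be used for python -m pip install shows which environment the install would land in.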

I'm wondering if separating installation via pip and managing virtual environments should be highlighted as two separate issues. At the very least, it would be nice to link to the concept of virtual environment (e.g., the official docs).

For example, the docs for pandas clearly state that installation from PyPI is one option and that users who choose it should consider using pip inside a virtual environment: https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html#installing-from-pypi

I might be overthinking this but Python's dependency management is notoriously messy, so it can't hurt to be as clear as possible. Let me know your thoughts @s3alfisc @baggiponte .

s3alfisc commented 3 weeks ago

Thanks for chiming in on this @leostimpfle. I had also wondered if we should try to be more explicit, but then checked the statsmodels readme, and thought that following their example should be good enough =) But admittedly, I am also worried that inexperienced users might struggle with installation (e.g. econ students like me who've never gotten an intro to python package management 😅).

Should we maybe adjust the installation instructions to

You can install the release version from PyPI by running

pip install pyfixest
# inside an active virtual environment (recommended)
python -m pip install pyfixest

? This would be very specific and close in spirit to pandas.

s3alfisc commented 3 weeks ago

For example, I like what they are doing here. Active encouragement to create a virtual environment with venv + they show how to do it + installation instructions for installing in the virtualenv.

baggiponte commented 3 weeks ago

Ciao @leostimpfle, and thanks for the comment. I am not sure I understand your point 100%.

I think we are all on the same page about this, but for the sake of clarity let me paraphrase once again what Brett wrote. python -m pip executes the pip that "comes with" the Python interpreter (aka version) you specified as python. If you run pip install, the shell looks for the first command named pip on your PATH and executes it, which might not be the pip executable you wish for. python -m pip works (and should be used) regardless of whether you are in an active venv.
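The difference can be made visible with a short script. This is only a sketch using the standard library: shutil.which("pip") reports the first pip the shell would find on PATH, while sys.executable identifies the interpreter that python -m pip would use. The helper names pip_on_path and pip_command are hypothetical, chosen for illustration.

```python
import shutil
import sys


def pip_on_path():
    """Return the path of the first `pip` executable found on PATH, or None."""
    return shutil.which("pip")


def pip_command(*args):
    """Build a command that runs pip under the current interpreter."""
    # sys.executable is the interpreter running this script, so "-m pip"
    # resolves to the pip installed for exactly that interpreter.
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    print("Interpreter in use: ", sys.executable)
    print("First `pip` on PATH:", pip_on_path())
    print("Unambiguous command:", " ".join(pip_command("install", "pyfixest")))
```

If the two paths printed first do not live in the same environment, a bare pip install would install into a different Python than the one you are running.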

Are you suggesting we should add this explanation in the docs/README? Sure, we can leave the link so that someone interested in the explanation can dig deeper.

Are you suggesting the docs should say to install pyfixest in a venv? Also +1 on my side.

s3alfisc commented 3 weeks ago

Hi - sorry for the broken link above. Here's a corrected one =)

My suggestion would be to have something like this

You can install the release version from PyPI by running

pip install pyfixest

We recommend setting up a dedicated virtual environment for your project via venv and installing pyfixest into it

python3 -m venv pyfixest_venv       # create a virtual environment via venv
source pyfixest_venv/bin/activate   # activate the virtual environment
pip install pyfixest                # install pyfixest into the active environment

What do you both think? @baggiponte @leostimpfle

baggiponte commented 3 weeks ago

I would go with this:

python -m pip install pyfixest

We recommend setting up a dedicated virtual environment for your project via venv and installing pyfixest into it

python -m venv .venv                # create a virtual environment via venv
source .venv/bin/activate           # activate the virtual environment
python -m pip install pyfixest

Because of the following reasons:

  1. python -m blabla is recommended regardless of whether you are in a venv
  2. You are free to name the venv however you prefer, but it's common practice to name it .venv so it's hidden in your project directory. Most tools (including gitignore template generators) are configured to handle this pattern by default. Technically, you can set the venv's "display name" with python -m venv --prompt "whatever" .venv.

Just a recommendation though 😊
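For completeness, the same .venv convention can be reproduced from Python with the standard-library venv module. This is a sketch under stated assumptions, not a proposed change to the docs: create_venv is a hypothetical helper, and with_pip=False is chosen here only to keep the example fast (the command-line python -m venv .venv bootstraps pip by default).

```python
import tempfile
import venv
from pathlib import Path


def create_venv(project_dir, prompt=None):
    """Create a `.venv` directory inside project_dir and return its path."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=False skips bootstrapping pip; pass with_pip=True to get an
    # environment that is ready for `python -m pip install ...`.
    venv.create(str(env_dir), with_pip=False, prompt=prompt)
    return env_dir


if __name__ == "__main__":
    # A temporary directory stands in for a real project directory.
    with tempfile.TemporaryDirectory() as project_dir:
        env_dir = create_venv(project_dir, prompt="whatever")
        print("Created:", env_dir)
        print("Has pyvenv.cfg:", (env_dir / "pyvenv.cfg").is_file())
```

The prompt argument plays the same role as the --prompt flag mentioned above: it changes the name shown in the shell once the environment is activated, while the directory itself stays .venv.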