posit-dev / rsconnect-python

Command line interface for publishing to Posit Connect
https://docs.posit.co/rsconnect-python/
GNU General Public License v2.0

Unable to specify the installation of the current directory, which causes a ModuleNotFoundError #456

Status: Open. timotk opened this issue 1 year ago.

timotk commented 1 year ago

I am trying to deploy a Streamlit dashboard to Connect, but I get an error.

Setup

I have the following directory structure:

dashboard/
`-- app.py           # imports from my_awesome_package
src/
`-- my_awesome_package/
    |-- data.py
    `-- views.py
requirements.txt     # Specify requirements for my_awesome_package
pyproject.toml       # Specify installation of my_awesome_package

Deployment

Deployment happens in two steps:

  1. rsconnect write-manifest streamlit --entrypoint=dashboard/app.py .
  2. rsconnect deploy manifest manifest.json --title "My Dashboard"

The Error

Once deployed, I open the application and get the following error:

ModuleNotFoundError: No module named 'my_awesome_package'

The issue

I believe the issue is as follows: the Connect server does not install packages from the local directory (through setup.py or pyproject.toml); it only installs what is listed in requirements.txt. The files from my_awesome_package are listed in manifest.json, but the module cannot be imported because the package itself has never been installed.
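The failure mode described above can be reproduced locally with a small, self-contained sketch. It assumes a plain src/ layout that is never installed; the helper names (make_src_layout, try_import), the __init__.py file, and the temporary directory are hypothetical, and the sys.path tweak only illustrates what an install would provide. It is not something Connect does.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_src_layout(root: Path) -> Path:
    """Create a hypothetical src/ layout mirroring the issue; return the src/ path."""
    pkg = root / "src" / "my_awesome_package"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")  # assumption: a regular package
    (pkg / "data.py").write_text("VALUE = 42\n")
    return root / "src"


def try_import(name: str) -> bool:
    """Return True if `name` imports, False on ModuleNotFoundError."""
    try:
        importlib.import_module(name)
        return True
    except ModuleNotFoundError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    src = make_src_layout(Path(tmp))
    # The files exist on disk, but nothing is installed and src/ is not on sys.path.
    before = try_import("my_awesome_package.data")
    # Installing the package (approximated here by adding src/ to sys.path)
    # is what makes the import succeed.
    sys.path.insert(0, str(src))
    importlib.invalidate_caches()
    after = try_import("my_awesome_package.data")
    sys.path.remove(str(src))

print(before, after)  # False True
```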

Question

How can I specify through rsconnect that the current directory should also be installed? Could this be an update to the manifest, or is there another way of doing it?

timotk commented 1 year ago

Specify . in requirements.txt

One option is to add . to requirements.txt, which installs the current directory:

.
pandas==2.0.3
streamlit==1.25.0

However, this conflicts with my pyproject.toml, which reads its dependencies dynamically from requirements.txt:

[project]
name = "my-awesome-package"
version = "0.0.0"
dynamic = ["dependencies"]

[tool.setuptools.dynamic]
dependencies = { file = ["requirements.txt"] }

The error now becomes:

setuptools.extern.packaging.requirements.InvalidRequirement: Expected package name at the start of dependency specifier
.
^
[end of output]
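The error arises because the same file now has two readers. pip accepts . (and -e .) as install targets in requirements.txt, but the dynamic dependencies setting hands each line to setuptools, whose packaging-based parser expects a dependency specifier that starts with a project name. The sketch below is a rough, simplified version of that rule (a PEP 508 name at the start of each non-comment line); the helper invalid_dependency_lines is hypothetical and is not setuptools' own code.

```python
import re

# Simplified PEP 508 project-name rule, anchored at the start of a line.
# The real parser in `packaging` checks much more than this.
NAME_AT_START = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])", re.IGNORECASE)


def invalid_dependency_lines(requirements_text: str) -> list[str]:
    """Return lines that do not start with a project name."""
    bad = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if not NAME_AT_START.match(line):
            bad.append(line)
    return bad


requirements_txt = """\
.
pandas==2.0.3
streamlit==1.25.0
"""

print(invalid_dependency_lines(requirements_txt))  # ['.']
```

The same check would also reject -e ., so the editable variant of this requirements.txt runs into the same conflict with dynamic dependencies.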

Workaround 1

One workaround is to remove the dynamic dependencies from pyproject.toml and keep . in requirements.txt:

[project]
name = "my-awesome-package"
version = "0.0.0"

Now the installation works. However, when you change your code and redeploy to Connect, the cached environment is reused, so the changes to my_awesome_package are not picked up.
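The caching behaviour, which the maintainer confirms below, can be pictured with a toy model in which environments are looked up by a key derived only from requirements.txt. This is a hypothetical illustration: environment_key, deploy, and the in-memory cache are invented, and Connect's real cache key and storage are not described in this thread.

```python
import hashlib

# Toy model (assumption): environments cached by a hash of requirements.txt alone.
_environment_cache: dict[str, str] = {}


def environment_key(requirements_text: str) -> str:
    """Hypothetical cache key derived only from requirements.txt content."""
    return hashlib.sha256(requirements_text.encode("utf-8")).hexdigest()


def deploy(requirements_text: str, package_source: str) -> str:
    """Return the package source the running app would see after a deploy.

    A non-editable install copies the package into the environment, so a
    cache hit keeps serving whatever was copied the first time.
    """
    key = environment_key(requirements_text)
    if key not in _environment_cache:
        _environment_cache[key] = package_source  # "pip install ." on a cache miss
    return _environment_cache[key]


reqs = ".\npandas==2.0.3\nstreamlit==1.25.0\n"
first = deploy(reqs, "views v1")
second = deploy(reqs, "views v2")  # code changed, requirements did not
third = deploy(reqs + "# touch\n", "views v2")  # requirements changed, so rebuild

print(first, "|", second, "|", third)  # views v1 | views v1 | views v2
```

In this model, an editable install (-e .) would store a pointer to the deployed source directory instead of a copy, which is why code changes can show up without a rebuild while the cached environment is still reused.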

Workaround 2

This builds on the workaround above: specify -e . in requirements.txt. This makes the installation of the current directory editable, so code changes reach the Connect server. Cached environments can still be reused as long as your requirements don't change.

Drawback

The major drawback is that you can no longer install your application, including its dependencies, with:

pip install .
# or
pip install -e .

It now requires a different (and less clear) installation:

pip install -r requirements.txt

Conclusion

I found a workaround. It works, but it isn't ideal. It would be awesome if Connect Server supported pyproject.toml, not just requirements.txt, but I guess this is not up to rsconnect-python!

mmarchetti commented 1 year ago

Thank you for the detailed writeup! I'm glad you were able to find a workaround. It can be tricky when the project includes both a package and a consumer of that package. As you've noted, Connect's environment caching will not build a new environment when the package changes unless the reference to it in requirements.txt changes. Using an editable installation might cause side effects if another project uses the same environment (because its requirements.txt has the same content).

Another option would be to put the package directory in the same directory as the app and import it from there, rather than listing it as a dependency. Or, if you have access to a local or private PyPI-compatible repository, you can publish the package there and reference it as a dependency.
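A sketch of the first suggestion, assuming the app is started as a script: Python puts the directory of the script being run at the front of sys.path, so a package that sits beside app.py is importable without being installed. The example runs plain python app.py through subprocess in a hypothetical temporary layout; that streamlit run on Connect resolves imports the same way is an assumption worth verifying on your server.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    dashboard = Path(tmp) / "dashboard"
    # Hypothetical layout: the package lives next to app.py instead of under src/.
    pkg = dashboard / "my_awesome_package"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "views.py").write_text("def title():\n    return 'My Dashboard'\n")
    (dashboard / "app.py").write_text(
        "from my_awesome_package.views import title\n"
        "print(title())\n"
    )
    # Run from a different working directory: it is the script's own folder,
    # not the CWD, that makes the import work.
    result = subprocess.run(
        [sys.executable, str(dashboard / "app.py")],
        cwd=tmp,
        capture_output=True,
        text=True,
        check=True,
    )

print(result.stdout.strip())  # My Dashboard
```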