Dedicated group_data directory breaks the relative paths and imports

Describe the bug

At first, my directory structure was as follows:

.
├── deploys
│   ├── deploy1.py
│   ├── deploy2.py
│   ├── deploy3.py
│   └── templates
│       └── template.j2
├── inventory
│   └── main.py
└── libs
    └── __init__.py
. . .
# Example usage
pyinfra inventory/main.py --limit test.local deploys/deploy1.py

I didn't use group_data at all since there was no need for it. Later, a new inventory was added in the same directory. A bit of context: an inventory is an abstraction that sits above groups of hosts; logically enough, it maps perfectly to a datacenter. It could also map to an "environment", like production, dev, infra or whatever, but for now we use groups at that level of abstraction. We will probably move to dedicated inventories for every environment in the future, though. Consequently, we realized we could use the group_data feature, because some data obviously differs between datacenters. But since we have groups with the same names in both inventories (like "postgres", "monitoring", etc.), we can't use a single top-level group_data folder. So the layout morphed into this:

.
├── deploys
│   ├── deploy1.py
│   ├── deploy2.py
│   ├── deploy3.py
│   └── templates
│       └── template.j2
├── inventories
│   ├── dc1
│   │   ├── group_data
│   │   │   └── all.py
│   │   └── main.py
│   └── dc2
│       ├── group_data
│       │   └── all.py
│       └── main.py
└── libs
    └── __init__.py
# Example usage
pyinfra inventories/dc1/main.py --limit test.dc1 deploys/deploy1.py

Two more things to mention:

deploys in the deploys/ directory import classes declared in the libs/ module, like this: from libs import my_class
deploys in the deploys/ directory reference templates in deploys/templates using relative paths, like this: src='deploys/templates/template.j2'

Everything works fine until you create a dedicated group_data directory for an inventory. After that, pyinfra changes the value of state.deploy_dir to the path of the inventory directory here: https://github.com/Fizzadar/pyinfra/blob/current/pyinfra_cli/main.py#L319 . This changes the path used for relative imports here: https://github.com/Fizzadar/pyinfra/blob/current/pyinfra_cli/main.py#L325 , as well as relative paths for templates and other files (I didn't research where exactly this happens). So:

if I want to keep relying on relative paths for templates, I have to move deploys/templates into the directory of whichever inventory I'm currently using so pyinfra can find them; the same goes for local.include('tasks/my_task.py'), and probably other operations
I have to move the libs directory into the directory of whichever inventory I'm currently using so Python can find my custom classes

Both are unacceptable.

For templates and other file paths, I simply hammered everything with os.getcwd(), so there are no relative paths left in my project. This is very awkward, but I can live with it. The way it broke Python imports, however, is just too much.
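A minimal sketch of that workaround, assuming a hypothetical helper named cwd_path (the report only says os.getcwd() was used everywhere). It only resolves correctly when pyinfra is always invoked from the project root.

```python
import os


def cwd_path(*parts):
    """Return an absolute path rooted at the directory pyinfra was started from.

    Hypothetical helper: it simply joins os.getcwd() with the given parts.
    """
    return os.path.join(os.getcwd(), *parts)


# In a deploy this would be passed to an operation, for example:
#   files.template(src=cwd_path('deploys', 'templates', 'template.j2'), ...)
template_src = cwd_path('deploys', 'templates', 'template.j2')
print(template_src)
```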

I understand the probable reasoning behind this logic. Still, it covers only one of the many ways a project layout can be structured, and end users shouldn't be forced to move everything into their inventory folder.

To Reproduce

Move your inventory file into a subfolder
Create a group_data directory alongside your inventory file
Try to run a deploy that references a file by a relative path (like files.template(src='deploy/templates/template.j2'), local.include('tasks/my_task.py'), or from custom_libs import custom_class)

Expected behavior

At minimum, the current working directory (from which pyinfra is called) should be in sys.path, so Python can find modules in the top-level directory. It would be great to also find a way to fix the relative file paths, but I haven't yet properly researched where exactly they are resolved.
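A rough sketch of that expected behaviour in plain Python, assuming a hypothetical function ensure_cwd_on_path. It is not taken from pyinfra's code; it only illustrates prepending the invocation directory to sys.path so top-level packages such as libs/ stay importable.

```python
import os
import sys


def ensure_cwd_on_path():
    """Prepend the current working directory to sys.path unless already present.

    Hypothetical sketch of the expected behaviour, not pyinfra code.
    """
    cwd = os.getcwd()
    if cwd not in sys.path:
        sys.path.insert(0, cwd)
    return cwd


cwd = ensure_cwd_on_path()
# Run from the project root, `from libs import my_class` can now find libs/
# no matter which directory pyinfra treats as the deploy directory.
```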

Meta

Include output of pyinfra --support.

--> Support information:

If you are having issues with pyinfra or wish to make feature requests, please
check out the GitHub issues at https://github.com/Fizzadar/pyinfra/issues .
When adding an issue, be sure to include the following:

System: Linux
  Platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
  Release: 5.4.0-89-generic
  Machine: x86_64
pyinfra: v1.4.18
Executable: /opt/.venv/bin/pyinfra
Python: 3.8.10 (CPython, GCC 9.3.0)

How was pyinfra installed (source/pip)? Virtualenv/pip.
Include pyinfra-debug.log (if one was created). None was created.

Consider including output with -vv and --debug.

--> An unexpected exception occurred in: deploys/deploy1.py:

  File "/opt/.venv/lib/python3.8/site-packages/pyinfra_cli/util.py", line 80, in exec_file
    exec(PYTHON_CODES[filename], data)
  File "deploys/deploy1.py", line 3, in <module>
    from libs import custom_class
ModuleNotFoundError: No module named 'libs'

It looks like this can also be reproduced with a config.py file instead of a group_data directory.

I originally opened this issue only so I could reference it in the commit message for https://github.com/Fizzadar/pyinfra/pull/699 , but it looks like further discussion is needed.

Thank you for writing this up @glassbeads - I totally agree that this is currently both confusing and inflexible; I'd like to completely rework how the paths work. I think the whole idea of picking a single deploy directory is unrealistic. There are three things to resolve here:

Given an inventory filename, where do we search for group data?
Given a deploy filename, where do we search for files/templates/imports?
Given both an inventory+deploy filenames, where do we search for the config file (default filename config.py)? And what path do we add to the Python import path?

IMO there's no reason these need to live in the same directory; each should be evaluated independently. Current WIP thoughts below.

Inventory <> group data

For a given inventory path (eg /some/random/path/inventory.py), look in:

/some/random/path/group_data/*.py - this would fix the issue above
$DEPLOY_FILENAME/group_data/*.py - this would behave most like the current behaviour

Additionally, add a --group-data-folder flag to override this logic. Is it a good idea to also look in the CWD? Or is this just more confusing?

Deploy <> files/templates/imports - this is more complex; there are a few variants here:

files.template(src='templates/my-template.j2', ...)

No indicator - is this relative to this file, to the CWD, or something else? This could also be an included file, in which case maybe it should be relative to the parent file? Currently I'm thinking this should be looked up in:

$DEPLOY_FILENAME/templates/my-template.j2 - filename = the top level deploy file as passed in via CLI

Then there's more explicit variants, which I think are simpler to define:

# Relative to *this* file where the operation is called
files.template(src='./templates/my-template.j2', ...)
files.template(src='../templates/my-template.j2', ...)

# Absolute path, left as-is
files.template(src='/templates/my-template.j2', ...)
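The three variants above can be sketched as a single resolver. The name resolve_src and its parameters are hypothetical, and the sketch assumes that "no indicator" paths resolve against the directory of the top-level deploy file and that paths are POSIX-style.

```python
import os


def resolve_src(src, calling_file, deploy_file):
    """Resolve a local `src` argument following the rules proposed above.

    calling_file: the file in which the operation is called (may be an include).
    deploy_file: the top-level deploy file passed on the CLI.
    Hypothetical function; names are illustrative only.
    """
    if os.path.isabs(src):
        # Absolute path, left as-is
        return src
    if src.startswith(('./', '../')):
        # Explicit relative path: relative to *this* file
        base = os.path.dirname(os.path.abspath(calling_file))
    else:
        # No indicator: relative to the top-level deploy file's directory
        base = os.path.dirname(os.path.abspath(deploy_file))
    return os.path.normpath(os.path.join(base, src))
```

For example, with the deploy file /opt/deploy.py and an operation called from /opt/tasks/install.py, 'templates/my-template.j2' resolves to /opt/templates/my-template.j2, while './templates/my-template.j2' resolves to /opt/tasks/templates/my-template.j2.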

Config file & Python path

Currently this uses the "magic" (mess) where we look at both the deploy and inventory files. But we can have multiple deploy files, and inventory files needn't live alongside any of the deploy code, so it's not possible to cleanly pick a path here. Suggestion - just use the CWD, so:

Lookup config at $DEPLOY_FILENAME/config.py
Fallback config from $CWD/config.py
Add $CWD to the Python sys.path
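A sketch of the proposed config lookup, assuming a hypothetical find_config function that reads "$DEPLOY_FILENAME/config.py" as "config.py in the deploy file's directory". The cwd parameter exists only so the fallback can be exercised without actually changing directories.

```python
import os
import tempfile


def find_config(deploy_file, cwd=None, config_name='config.py'):
    """Return the first config file found, or None.

    Order: next to the deploy file first, then the current working directory.
    Hypothetical sketch of the proposal, not pyinfra's implementation.
    """
    cwd = os.getcwd() if cwd is None else cwd
    candidates = [
        os.path.join(os.path.dirname(os.path.abspath(deploy_file)), config_name),
        os.path.join(cwd, config_name),
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


# Demo: config.py exists only in the "CWD", not next to the deploy file.
with tempfile.TemporaryDirectory() as deploy_dir, tempfile.TemporaryDirectory() as cwd_dir:
    deploy_file = os.path.join(deploy_dir, 'deploy.py')
    open(deploy_file, 'w').close()
    open(os.path.join(cwd_dir, 'config.py'), 'w').close()

    found = find_config(deploy_file, cwd=cwd_dir)       # falls back to cwd_dir/config.py
    missing = find_config(deploy_file, cwd=deploy_dir)  # no config anywhere, so None
    expected = os.path.join(cwd_dir, 'config.py')
```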

These are rough thoughts; I want to get this to a point where it's super easy to explain why things do and don't import, and where the rules can easily be added to the documentation. I want to avoid things like the Ansible documentation, where there's a confusing set of rules and combinations.

I've edited the above post a few times but think I've condensed things into a few simple (or not? :)) rules:

The "deploy directory" is always set to the directory of the deploy file currently being executed
- ie /opt if pyinfra INVENTORY /opt/deploy.py
- if we're executing an operation or fact directly, use CWD
"Magic paths" (not starting with ./, ../ or /) are relative to the deploy directory
- non-magic/explicit paths are always relative to the file they are defined in
Config is loaded relative to the deploy directory
The group_data will be looked up relative to both inventory file and deploy directory; first one found "wins"
- ie /opt/group_data if pyinfra INVENTORY /opt/deploy.py
- or /opt/inventories/group_data OR /opt/group_data if pyinfra /opt/inventories/production.py /opt/deploy.py
Both the deploy directory and the CWD are always added to the Python path
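A sketch of the "first one found wins" group_data rule, assuming a hypothetical find_group_data_dir function. The exists parameter is injected so the /opt example from the list can be checked against a fake set of directories instead of a real filesystem.

```python
import os


def find_group_data_dir(inventory_file, deploy_dir, exists=os.path.isdir):
    """Return the first existing group_data directory, or None.

    Inventory-adjacent group_data is checked before the deploy directory's,
    so the first one found wins. Hypothetical sketch of the proposal.
    """
    candidates = [
        os.path.join(os.path.dirname(os.path.abspath(inventory_file)), 'group_data'),
        os.path.join(os.path.abspath(deploy_dir), 'group_data'),
    ]
    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None


# Fake filesystems for the example: sets of directories that "exist".
only_top = {'/opt/group_data'}
both = {'/opt/group_data', '/opt/inventories/group_data'}
```

With only /opt/group_data present, pyinfra /opt/inventories/production.py /opt/deploy.py would pick /opt/group_data; if /opt/inventories/group_data also exists, that one wins.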

As far as I can tell, this fully solves the original problem whilst being mostly compatible with the current way things work. There is one more situation that this doesn't account for but the current (confusing) logic does: executing a task file that is normally imported, ie:

pyinfra INVENTORY /opt/tasks/install.py
Currently, as long as a /opt/config.py exists, the deploy directory = /opt
With the above setup, deploy directory = /opt/tasks, so things like config get missed

Simplified again:

No "deploy dir", everything relative to the CWD
Operations with local path arguments:
- always relative to the CWD (so files/blah = CWD/files/blah)
- absolute paths left as-is (/somewhere/else)
group_data is discovered in the following order; multiple occurrences are merged, with existing keys overwritten by the last one found:
- alongside the inventory file ./group_data
- in the CWD/group_data
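The merge rule can be sketched with the standard library's runpy.run_path, which executes a Python file and returns its globals. The load_group_data name and the sample variables are hypothetical, and this is not how pyinfra itself loads group data; note that only underscore-prefixed names are filtered, so module-level imports would still end up in the result.

```python
import os
import runpy
import tempfile


def load_group_data(group, directories):
    """Merge <dir>/<group>.py from each directory; later directories win."""
    data = {}
    for directory in directories:
        path = os.path.join(directory, f'{group}.py')
        if not os.path.isfile(path):
            continue
        module_globals = runpy.run_path(path)
        # Drop dunder/private names such as __name__, __file__, __builtins__
        data.update({k: v for k, v in module_globals.items() if not k.startswith('_')})
    return data


# Demo: inventory-adjacent group_data first, then CWD/group_data.
with tempfile.TemporaryDirectory() as inv_gd, tempfile.TemporaryDirectory() as cwd_gd:
    with open(os.path.join(inv_gd, 'all.py'), 'w') as f:
        f.write("region = 'dc1'\npostgres_version = 12\n")
    with open(os.path.join(cwd_gd, 'all.py'), 'w') as f:
        f.write("postgres_version = 14\n")
    merged = load_group_data('all', [inv_gd, cwd_gd])
```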

In addition:

add a --chdir argument to switch the CWD before executing

This keeps things MUCH simpler and should be mostly compatible with existing pyinfra code (that I'm aware of), in addition to solving the original problem. The only major difference would be when calling pyinfra from outside the current "deploy directory", for which the --chdir flag would be the replacement.
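A sketch of the simplified path rule together with a --chdir-style directory switch. Both helpers are hypothetical and not part of pyinfra; on Python 3.11+ the standard library's contextlib.chdir provides the same context manager.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def chdir(path):
    """Temporarily switch the working directory, as a --chdir flag would.

    Hypothetical helper; Python 3.11+ ships contextlib.chdir for this.
    """
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def local_path(path):
    """Simplified rule: absolute paths as-is, anything else relative to the CWD."""
    if os.path.isabs(path):
        return path
    return os.path.join(os.getcwd(), path)


start = os.getcwd()
with tempfile.TemporaryDirectory() as project_dir:
    with chdir(project_dir):
        cwd_inside = os.getcwd()
        resolved = local_path('files/blah')        # CWD/files/blah
        untouched = local_path('/somewhere/else')  # absolute, left as-is
```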

v2 is now live; it loads any group_data directory next to the inventory file, which fixes this issue!

It also adds a --group-data flag for specifying additional directories.
