pyinfra-dev / pyinfra

pyinfra turns Python code into shell commands and runs them on your servers. Execute ad-hoc commands and write declarative operations. Target SSH servers, local machine and Docker containers. Fast and scales from one server to thousands.
https://pyinfra.com
MIT License
3.85k stars 374 forks source link

Dedicated group_data directory breaks the relative paths and imports #698

Closed glassbeads closed 2 years ago

glassbeads commented 2 years ago

Describe the bug

At first, my directory structure was as follows:

.
├── deploys
│   ├── deploy1.py
│   ├── deploy2.py
│   └── deploy3.py
│   └── templates
│       └── template.j2
├── inventory
│   └── main.py
└── libs
    └── __init__.py
. . .
# Example usage
pyinfra inventory/main.py --limit test.local deploys/deploy1.py

I didn't utilize group_data at all since there was no need in that. Later, a new inventory was added, in the same directory. A bit of context: an inventory is an abstraction that is above groups of hosts; logically enough, it maps perfectly with a datacenter. It could also map with an "environment", like production, dev, infra or whatever, but we utilize groups at that level of abstraction for now. Will probably go with dedicated inventories for every environment in the future though. So, consequentially we realized that we could use group_data feature, because obviously there was some data that is different between datacenters. But since we have groups with the same names in both inventories (like "postgres", "monitoring", etc), we can't go with one top-level group_data folder. So the layout has morphed into this:

.
├── deploys
│   ├── deploy1.py
│   ├── deploy2.py
│   └── deploy3.py
│   └── templates
│       └── template.j2
├── inventories
│   ├── dc1
│   │   ├── group_data
│   │   │   └── all.py
│   │   └── main.py
│   └── dc2
│       ├── group_data
│       │   └── all.py
│       └── main.py
└── libs
    └── __init__.py
# Example usage
pyinfra inventories/dc1/main.py --limit test.dc1 deploys/deploy1.py

Two more things to mention:

Everything works fine until you create dedicated group_data directory for an inventory. After that, pyinfra decides to change a value of state.deploy_dir to the path of the inventory directory here - https://github.com/Fizzadar/pyinfra/blob/current/pyinfra_cli/main.py#L319 This changes paths for relative imports here - https://github.com/Fizzadar/pyinfra/blob/current/pyinfra_cli/main.py#L325 as well as relative paths for templates and other stuff (I didn't research where this happens in particular). So:

For templates and other paths to files, I simply hammered everything with os.getcwd(), so there's no relative paths anymore in my project. This is very awkward, but I can live with that. But the way it broke Python imports is just too much.

I understand the probable reasoning behind this logic. Still, it represents one of the many ways the layout can be constructed, and the end users shouldn't be forced to move everything inside their inventory folder.

To Reproduce

  1. Move your inventory file into a subfolder
  2. Create a group_data directory alongside your inventory file
  3. Try to run a deploy that references a file by a relative path (like files.template(src='deploy/templates/template.j2', local.include('tasks/my_task.py'), or from custom_libs import custom_class)

Expected behavior

Current working directory (from which pyinfra gets called) is at least in sys.path, so Python could find modules from the top level directory. Would be great to also find way to fix relative paths to files situation, but I didn't yet do a proper research where exactly they are implemented.

Meta

glassbeads commented 2 years ago

Looks like it can be reproduced with config.py file instead of group_data directory, too.

glassbeads commented 2 years ago

The sole purpose of this issue was to use it in the commit message https://github.com/Fizzadar/pyinfra/pull/699 But it looks like a further discussion is needed.

Fizzadar commented 2 years ago

Thank you for writing this up @glassbeads - I totally agree that this is both confusing and inflexible currently; would like to completely rework how the paths work. I think the whole idea of picking a single deploy directory is unrealistic. I think there's three things to resolve here:

  1. Given an inventory filename, where do we search for group data?
  2. Given a deploy filename, where do we search for files/templates/imports?
  3. Given both an inventory+deploy filenames, where do we search for the config file (default filename config.py)? And what path do we add to the Python import path?

IMO there's no reason this need to live in the same directory and should be evaluated independently. Current WIP thoughts below.

Inventory <> group data

For a given inventory path (eg /some/random/path/inventory.py), look in:

Additionally, add a --group-data-folder flag to override this logic. Is it a good idea to also look in the CWD? Or is this just more confusing?

Deploy <> files/templates/imports - more complex, there's a few variants here:

files.template(src='templates/my-template.j2', ...)

No indicator - is this relative to this file or to CWD or something else? This could also be an included file, in which case maybe this is relative to the parent file? Currently thinking this should lookup in:

Then there's more explicit variants, which I think are simpler to define:

# Relative to *this* file where the operation is called
files.template(src='./templates/my-template.j2', ...)
files.template(src='../templates/my-template.j2', ...)

# Absolute path, left as-is
files.template(src='/templates/my-template.j2', ...)

Config file & Python path

Currently uses the "magic"(mess) where we look at deploy and inventory files. But we can have multiple deploy files and inventory files needn't live alongside any of the deploy code, so it's not possible to cleanly pick a path here. Suggestion - just use CWD so:


Some rough thoughts, I want to get this to a point where it's super easy to explain why things do and don't import and this can be easily added to the documentation. I want to avoid things like the Ansible documentation where there's a confusing set of rules and combinations.

Fizzadar commented 2 years ago

I've edited the above post a few times but think I've condensed things into a few simple (or not? :)) rules:

As far as I can tell this satisfies fully the original problem whilst being mostly compatible with the current way things work. There is one more situation that this doesn't account for, that the current (confusing) logic does: executing a task file normally imported, ie:

Fizzadar commented 2 years ago

Simplified again:

In addition:

This keeps things MUCH simpler and should be mostly compatible with existing pyinfra code (that I'm aware of), in addition to solving the original question. The only major difference would be calling pyinfra outside of the current "deploy directory", which the --chdir flag would be a replacement for.

Fizzadar commented 2 years ago

v2 is now live which loads up any group_data directory next to the inventory file, which fixes this issue!

Also adds a --group-data flag to add additional directories.