pyinfra-dev / pyinfra

pyinfra turns Python code into shell commands and runs them on your servers. Execute ad-hoc commands and write declarative operations. Target SSH servers, local machine and Docker containers. Fast and scales from one server to thousands.
https://pyinfra.com
MIT License
3.91k stars 382 forks source link

fact cache write-back not working for files.directory #862

Closed morrison12 closed 2 years ago

morrison12 commented 2 years ago

Describe the bug

It appears different keys into the host.facts cache are (sometimes ?) used depending on whether a fact is being retrieved or an operation is trying to update the fact cache with host.create_fact.

The example below uses permissions on a directory but the behaviour seems to be more general.

From what I can tell, there are couple of items that contribute to the cache keys being different between a 'get' and a 'set':

  1. When host.create_fact(...) is called from inside an operation, the kwargs dict has an entry for each 'global argument' (e.g _su) whereas when host.get_fact(...) is called from the deploy, the kwargs dict has only what is passed by the caller (e.g. path in the case of files.directory).

  2. Host.create_fact does not have an *args parameter whereas Host.get_fact does. While this might not matter as as kwargs comes after args and it appears only the values are hashed, but there would seem to be a possible ordering problem once (1) is resolved.

To Reproduce

The code below checks the permissions on a directory, changes them and then reads back what is expected to be the new value It was run as rmdir foo && mkdir foo && pyinfra @local ./test.py

Code

test.py

from pathlib import Path
from pprint import pformat

from pyinfra import host, logger
from pyinfra.api import deploy
from pyinfra.facts.files import Directory
from pyinfra.operations import files

BEFORE = 755
AFTER = 644

@deploy("test deploy")
def test() -> None:
    test_dir = Path(".") / "foo"
    assert test_dir.exists()

    print("fact cache before any code executed")
    print(pformat(host.facts) + "\n")

    if host.get_fact(Directory, test_dir.name)["mode"] != BEFORE:
        logger.error(f"failed before change, expect mode of {BEFORE} to start")
    print("fact cache after 1st mode check")
    print(pformat(host.facts) + "\n")

    if host.get_fact(Directory, test_dir.name)["mode"] != BEFORE:
        logger.error(f"failed before change, expect mode of {BEFORE} to start")
    print("fact cache after 2nd mode check")
    print(pformat(host.facts) + "\n")

    files.directory(test_dir.name, mode=AFTER, present=True)
    print("fact cache after mode change")
    print(pformat(host.facts) + "\n")

    if host.get_fact(Directory, test_dir.name)["mode"] != AFTER:
        logger.error(f"failed after change, expected mode of {AFTER} after change")
    print("fact cache after 2nd mode check")
    print(pformat(host.facts) + "\n")

test()

Output

--> Loading config...
--> Loading inventory...

--> Connecting to hosts...
    [@local] Connected

--> Preparing operations...
    Loading: ./test.py
fact cache before any code executed
{}

fact cache after 1st mode check
{'2c74fd9a65f89776b451e19964302d186ce4870c': {'atime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'ctime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'group': 'staff',
                                              'mode': 755,
                                              'mtime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'size': 64,
                                              'user': 'james'}}

fact cache after 2nd mode check
{'2c74fd9a65f89776b451e19964302d186ce4870c': {'atime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'ctime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'group': 'staff',
                                              'mode': 755,
                                              'mtime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'size': 64,
                                              'user': 'james'}}

fact cache after mode change
{'2c74fd9a65f89776b451e19964302d186ce4870c': {'atime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'ctime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'group': 'staff',
                                              'mode': 755,
                                              'mtime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'size': 64,
                                              'user': 'james'},
 'ec262d81820a25c8da29b49a6f4850c79101c427': {'atime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'ctime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'group': 'staff',
                                              'mode': 644,
                                              'mtime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'size': 64,
                                              'user': 'james'}}

  failed after change, expected mode of 644 after change
fact cache after 2nd mode check
{'2c74fd9a65f89776b451e19964302d186ce4870c': {'atime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'ctime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'group': 'staff',
                                              'mode': 755,
                                              'mtime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'size': 64,
                                              'user': 'james'},
 'ec262d81820a25c8da29b49a6f4850c79101c427': {'atime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'ctime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'group': 'staff',
                                              'mode': 644,
                                              'mtime': datetime.datetime(2022, 8, 2, 22, 6, 32),
                                              'size': 64,
                                              'user': 'james'}}

    [@local] Ready: ./test.py

--> Proposed changes:
    Groups: @local
    [@local]   Operations: 1   Change: 1   No change: 0   

--> Beginning operation run...
--> Starting operation: test deploy | Files/Directory (foo, mode=644, present=True)
    [@local] Success

--> Results:
    Groups: @local
    [@local]   Changed: 1   No change: 0   Errors: 0 

Expected behavior

I expected: a) only one entry in the fact cache at each output (vs. the two that are seen) and b) the permissions to be expected values at each step

Meta

pyinfra was installed with git clone ..., cd pyinfra and pip install -e '.[dev]'

git describe --tags gives: v2.3-25-g5fedec46

pyinfra --support gives:

    System: Darwin
      Platform: macOS-11.6.8-x86_64-i386-64bit
      Release: 20.6.0
      Machine: x86_64
    pyinfra: v2.3
    Executable: /Users/james/.virtualenvs/pyinfra/bin/pyinfra
    Python: 3.10.2 (CPython, Clang 13.0.0 (clang-1300.0.29.30))
Fizzadar commented 2 years ago

So the reason the global arguments are part of the hash is to account for fact results being different depending on them (eg sudo/not sudo). But as you point out this essentially means facts are only good within the context of an operation, they won't correctly invalidate other facts for the same thing called without those arguments.

Perhaps a good fix would be to always use the default set of global arguments in facts, which would address the example above.

Regarding point 2 the args are converted to kwargs, they're only supported for backwards compatibility:

https://github.com/Fizzadar/pyinfra/blob/7cc0e02149106d561df3f6acba390e20b7f7c1c5/pyinfra/api/facts.py#L206-L209

Fizzadar commented 2 years ago

Spotted the kwargs/args here - calling with args/kwargs does indeed generate a different hash because the normalisation I referenced above happens after the hash generation. Fixed in https://github.com/Fizzadar/pyinfra/commit/bc06372d58c5178dd13402f196a3b6b8e84d6e2e.

I have fixed the default arguments thing in https://github.com/Fizzadar/pyinfra/commit/d6c3db64383b0a8f81c3b649d72aa8bcba8999ce, and now the test script above executes as expected.

Fizzadar commented 2 years ago

Fixes are now live in v2.5.2 :)