opsmill / infrahub

Infrahub - A new approach to Infrastructure Management
https://opsmill.com/
GNU Affero General Public License v3.0
212 stars 18 forks source link

bug: Exception in git-agent after merging repositories #1185

Open ogenstad opened 1 year ago

ogenstad commented 1 year ago

Component

Git Integration

Current Behavior

After merging a branch that contains an external repository that has changes there is an exception from the git-agent.

Traceback (most recent call last):
  File "/Users/patrick/.virtualenvs/infrahub/bin/infrahub", line 6, in <module>
    sys.exit(app())
  File "/Users/patrick/.virtualenvs/infrahub/lib/python3.10/site-packages/typer/main.py", line 328, in __call__
    raise e
  File "/Users/patrick/.virtualenvs/infrahub/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/Users/patrick/.virtualenvs/infrahub/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/patrick/.virtualenvs/infrahub/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/Users/patrick/.virtualenvs/infrahub/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/Users/patrick/.virtualenvs/infrahub/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/patrick/.virtualenvs/infrahub/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/patrick/.virtualenvs/infrahub/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/patrick/.virtualenvs/infrahub/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/patrick/.virtualenvs/infrahub/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/Users/patrick/Code/opsmill/infrahub/backend/infrahub/cli/git_agent.py", line 189, in start
    aiorun(_start(interval=interval, debug=debug, port=port))
  File "/nix/store/zmk12p33784bpcan0nfnji3agxy2z8wg-python3-3.10.12/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/nix/store/zmk12p33784bpcan0nfnji3agxy2z8wg-python3-3.10.12/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/Users/patrick/Code/opsmill/infrahub/backend/infrahub/cli/git_agent.py", line 165, in _start
    await asyncio.gather(*tasks)
  File "/Users/patrick/Code/opsmill/infrahub/backend/infrahub/cli/git_agent.py", line 135, in monitor_remote_activity
    await sync_remote_repositories(client=client)
  File "/Users/patrick/Code/opsmill/infrahub/backend/infrahub/git/actions.py", line 37, in sync_remote_repositories
    await repo.sync()
  File "/Users/patrick/Code/opsmill/infrahub/backend/infrahub/git/repository.py", line 855, in sync
    await self.import_objects_from_files(branch_name=branch_name, commit=commit_after)
  File "/Users/patrick/Code/opsmill/infrahub/backend/infrahub/git/repository.py", line 1013, in import_objects_from_files
    await self.import_all_graphql_query(branch_name=branch_name, commit=commit)
  File "/Users/patrick/Code/opsmill/infrahub/backend/infrahub/git/repository.py", line 1202, in import_all_graphql_query
    local_queries = {query.name: query for query in await self.find_graphql_queries(commit=commit)}
  File "/Users/patrick/Code/opsmill/infrahub/backend/infrahub/git/repository.py", line 1594, in find_graphql_queries
    query_files = await self.find_files(extension=["gql"], commit=commit)
  File "/Users/patrick/Code/opsmill/infrahub/backend/infrahub/git/repository.py", line 1587, in find_files
    files.extend(glob.glob(f"{branch_wt.directory}/**/*.{ext}", recursive=recursive))
AttributeError: 'NoneType' object has no attribute 'directory'

Expected Behavior

No response

Steps to Reproduce

No response

Additional Information

Possibly related to the changes when we changed the worktree to be based on the branch_id instead of name.

ogenstad commented 1 year ago

The problem here is that the find_files() method below gets called without a branch_name and the commit parameter set to True (another case where mypy could have saved us).

    async def find_files(
        self,
        extension: Union[str, List[str]],
        branch_name: Optional[str] = None,
        commit: Optional[str] = None,
        recursive: bool = True,
    ) -> List[str]:
        """Return the absolute path of all files matching a specific extension in a given Branch or Commit."""
        if not branch_name and not commit:
            raise ValueError("Either branch_name or commit must be provided.")
        branch_wt = self.get_worktree(identifier=commit or branch_name)

        files = []

        if isinstance(extension, str):
            files.extend(glob.glob(f"{branch_wt.directory}/**/*.{extension}", recursive=recursive))
            files.extend(glob.glob(f"{branch_wt.directory}/**/.*.{extension}", recursive=recursive))
        elif isinstance(extension, list):
            for ext in extension:
                files.extend(glob.glob(f"{branch_wt.directory}/**/*.{ext}", recursive=recursive))
                files.extend(glob.glob(f"{branch_wt.directory}/**/.*.{ext}", recursive=recursive))
        return files

Within the .sync() method of the Repo class we have these lines:

            commit_after = await self.pull(branch_name=branch_name)
            await self.import_objects_from_files(branch_name=branch_name, commit=commit_after)

The problem here is that self.pull returns -> Union[bool, str]: where as import_objects_from_files expects commit be be a string or None.

    async def import_objects_from_files(self, branch_name: str, commit: Optional[str] = None):
        if not self.client:
            LOGGER.warning("Unable to import the objects from the files because a valid client hasn't been provided.")
            return

        if not commit:
            commit = self.get_commit_value(branch_name=branch_name)

The self.pull method returns True if the commit_before is the same as the commit_after.

Not sure if there are any other side effects of this, but a solution could be to raise an error if self.has_origin isn't true and instead of having the return True if both commits are the same we just return that commit early as a string.

    async def pull(self, branch_name: str) -> Union[bool, str]:
        """Pull the latest update from the remote repository on a given branch."""

        if not self.has_origin:
            return False

        repo = self.get_git_repo_worktree(identifier=branch_name)
        if not repo:
            raise ValueError(f"Unable to identify the worktree for the branch : {branch_name}")

        try:
            commit_before = str(repo.head.commit)
            repo.remotes.origin.pull(branch_name)
        except GitCommandError as exc:
            if "Need to specify how to reconcile" in exc.stderr:
                raise RepositoryError(
                    identifier=self.name,
                    message=f"Unable to pull the branch {branch_name} for repository {self.name}, there is a conflict that must be resolved.",
                ) from exc
            raise RepositoryError(identifier=self.name, message=exc.stderr) from exc

        commit_after = str(repo.head.commit)

        if commit_after == commit_before:
            return True

        self.create_commit_worktree(commit=commit_after)
        await self.update_commit_value(branch_name=branch_name, commit=commit_after)

        return commit_after
dgarros commented 1 year ago

Not sure if there are any other side effects of this, but a solution could be to raise an error if self.has_origin isn't true and instead of having the return True if both commits are the same we just return that commit early as a string.

I think this is a good plan, indeed it's not good to return either a commit or a boolean

ogenstad commented 1 year ago

I have to go back on what I said about this, it could be that it never worked and the problem is that I typically use repositories from a local folder. That does not work at the moment, I think it might have worked in the past but I couldn't swear on it. When I test this with an external repository at Github it works as intended.

Damien do you know if this used to work?

I'll keep this open as the type hints are incorrect regardless and can be a bit confusing now. Also it would be if it worked locally too.

A problem seems to be when pushing the branch the error is silently ignored, if captured it reads:

'[remote rejected] (branch is currently checked out)\n

I'll leave this in the milestone for now but probably remove it at a later point if we have to focus on other things.

BRBCoffeeDebuff commented 6 months ago

Does this bug still occur? I think it is worth ironing out if we still have the issue.

ogenstad commented 6 months ago

I haven't done any work in this area in quite some time so I'm not sure. However I don't think the code has been updated in a while either. At any rate I think this specific bug was something that should only happen during local development and not in production. There are some other known issues that we have to look at with regards to the git integration and I think when we get to those we'll be able to revisit this one.