Add Client.create_repository() [RHELDST-22483]

rbikar commented 8 months ago

Client is now capable of creating a repository on pulp server. The repository is initialized with proper Importer in order to enable sync/upload of content to new repositories.

Another notable changes:

409 status is not retried in requests as it is used as indicator of existing repository.
Added create_repository() method to FakeClient as well.
Small changes in the repository.yaml schema as some of the fields don't have to be necessarily initialized when creating repo but can be updated later.
Added Importer pulp object mapping class and related subclasses that encapsulate minimal required fields for this object with valid defaults.
added "importers": True to search options for repos
fixed converter of notes.signatures on Repository - empty string is now converted to empty list, not to list with empty string

codecov[bot] commented 8 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 100.00%. Comparing base (dfe6dd8) to head (fd915a0).

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #218 +/- ## ========================================= Coverage 100.00% 100.00% ========================================= Files 47 47 Lines 3162 3225 +63 ========================================= + Hits 3162 3225 +63 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

rbikar commented 8 months ago

I have addressed suggestions but for some reason the coverage checks started to time out.

rbikar commented 8 months ago

@rohanpm I've tried to play with the tests and timeout they hit, I was able to simulate it also on local env. but I'm still still unsure why the tests sometimes hang there and sometimes they do not. It seems to hang on waiter.acquire() I believe that that code with chaining futures in the create_repository() is OK and I think that there might be some problem in pytest that leads to deadlock. Do you have any idea why this happens?

...
  File "/home/runner/work/pubtools-pulplib/pubtools-pulplib/.tox/cov/lib/python3.8/site-packages/more_executors/_impl/map.py", line 72, in _delegate_resolved
    result = self._map_fn(result)
  File "/home/runner/work/pubtools-pulplib/pubtools-pulplib/pubtools/pulplib/_impl/client/client.py", line 1100, in get_and_check_repo
    repo_on_server.result() == repo
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    self._condition.wait(timeout)
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/threading.py", line 302, in wait
    waiter.acquire()

rohanpm commented 8 months ago

@rohanpm I've tried to play with the tests and timeout they hit, I was able to simulate it also on local env. but I'm still still unsure why the tests sometimes hang there and sometimes they do not. It seems to hang on waiter.acquire() I believe that that code with chaining futures in the create_repository() is OK and I think that there might be some problem in pytest that leads to deadlock. Do you have any idea why this happens?

I believe there is a bug in the change actually. I was able to reproduce it too, but I couldn't reproduce it after a change like below:

diff --git a/pubtools/pulplib/_impl/client/client.py b/pubtools/pulplib/_impl/client/client.py
index 8a0bc27..5e24a85 100644
--- a/pubtools/pulplib/_impl/client/client.py
+++ b/pubtools/pulplib/_impl/client/client.py
@@ -1093,11 +1093,10 @@ class Client(object):

             raise exception

-        def get_and_check_repo(_):
-            repo_on_server = self.get_repository(repo_id)
+        def check_repo(repo_on_server):
             try:
                 assert (
-                    repo_on_server.result() == repo
+                    repo_on_server == repo
                 ), "Repo exists on server with unexpected values"
             except AssertionError:
                 if importer:
@@ -1120,7 +1119,7 @@ class Client(object):
                 )
                 raise

-            return repo_on_server
+            return f_return(repo_on_server)

         LOG.debug("Creating repository %s", repo_id)
         out = self._request_executor.submit(
@@ -1128,6 +1127,9 @@ class Client(object):
         )

         out = f_map(out, error_fn=log_existing_repo)
-        out = f_flat_map(out, get_and_check_repo)
+        # After supposedly creating the repo, get it...
+        out = f_flat_map(out, lambda _: self.get_repository(repo_id))
+        # ...and check it matches what's needed
+        out = f_flat_map(out, check_repo)

         return f_proxy(out)

...i.e. changing it into a fully non-blocking style rather than a mixture of blocking and non-blocking. Do you want to give that a try?

I think the problem with the other approach is:

when you call repo_on_server.result(), this needs to block on completion of the get_repository call.
get_repository needs to go through the RetryExecutor's thread.
but: that thread might be the one currently in the middle of invoking get_and_check_repo. If so: deadlock!

It can happen randomly, or not, depending on how fast each future manages to complete. Because when you do out = f_flat_map(out, get_and_check_repo), if the future has already completed by that time, then get_and_check_repo will be immediately called in the current thread (no deadlock), but if it hasn't completed yet then get_and_check_repo will later be called from the RetryExecutor's thread (deadlock).

release-engineering / pubtools-pulplib

Add Client.create_repository() [RHELDST-22483] #218

Codecov Report