pulp / pulp_deb

Debian repository plugin for Pulp (pulpproject.org)
GNU General Public License v2.0
61 stars 76 forks source link

Retrieve functionality breaks semver, requires task monitoring #1038

Closed daviddavis closed 1 month ago

daviddavis commented 4 months ago

A recent change was introduced that made a backward incompatible change to the API. I would think this violates semver as it says you must "increment MAJOR version when you make incompatible API changes". The only exception it calls out are bug fixes.

That aside, the change is problematic for us. In our system, users make a single web request to add a new release to a repo which in turn finds or creates a Release along with the necessary ReleaseArchitectures and ReleaseComponents, and then adds all the necessary objects to the repo. We don't have a background job system nor a way to monitor Pulp tasks. So getting back tasks to monitor is problematic for us.

I wonder if it would be possible to solve both these problems by either:

  1. Introducing a new parameter to the create actions that would switch between the old functionality or the new retrieve functionality. Or maybe only return a task if repository is set and otherwise return the new architecture/component.
  2. Creating new endpoints to handle this new retrieve functionality
quba42 commented 4 months ago

Ok, I need to split my initial response into three parts:

First, regarding semver: Rightly or wrongly, I have never considered pulpcore's semver promise to apply to pulp_deb. We even documented our own pulp_deb version semantics and those are not equivalent to semver. That being said, looking at the location and the pulpcore docs now, I don't think this is a communication that anyone is likely to see. As a result, I have started a forum discussion. Either pulp_deb needs to start adhearing to semver, or the documentation needs to become much clearer that not all plugins are doing so.

Second, regarding the fact that the API now returns a task: This was always a complete oversight/accident/bug that these API endpoints did not return a task. Especially considering that when supplying the repository parameter, the API endpoints in question will create a new repository version. When doing this outside of a task, this does not put any resource locks on the repository, which (I have been told) is not data safe. None of the Content API endpoints in pulp_rpm return anything other than a task for a POST request. The package API endpoint has (to my knowledge) always returned a task. For these reasons, I am pretty hesitant to moving back the other way. How are you currently handling the Package API endpoint (which does return a task)? Would it be possible for you to use the Package API endpoint plus the optional distribution, component, and repository parameters to have the package, ReleaseComponent, and PackageReleaseComponent added to the repo in one single task? I just realized the Release content endpoint still does not use a task even though it may create a repository version so whatever direction we move in here, there is still work to be done...

Third, regarding the retrieve logic: Actually I am not sure from your post if the retrieve logic is problematic for you separately from the task thing?

I presume for now, you have simply gone back to the 3.1.* version? We are currently quite busy on our end with not pulp_deb stuff, so there are unlikely to be any (non-bugfix) releases for a while. Hopefully that gives us some time to chart a constructive path forwards.

daviddavis commented 4 months ago

Thanks for the response. I was under the impression that pulp_deb used semver so that's good to know that you don't and have documented how you version pulp_deb. I also appreciate the forum discussion around plugin version specifications. I guess we can wait until that discussion is resolved before we talk more about pulp_deb's version specification. Even if you ultimately decide not to use semver, maybe we can contribute feedback because I think the pulp_dev version spec could be more friendly for users.

I totally agree that adding content to a repo requires returning a task and not using a task was an oversight. For users like us though that aren't using the repository field, I wonder what reason there is to return a task? In other words, if we're just creating a ReleaseComponent could the API just return the object and then when a user specifies the repository, return a task? I think this should be possible and would make much more sense for users that aren't using the repository field.

For the package creation API endpoint, we just call it, return the resulting Pulp task to the user, and they can query its status. The same thing for adding a package to a repo: we look up the package id and PackageReleaseComponent id and then call the repository modify endpoint with both and return the resulting modify task to the user.

Creating releases is much more complicated and we've tried to simplify it by having a single endpoint creates a release and adds it to a repo. This creates the release, the release components, and the release architectures, and then adds all of these objects to a repo. We then return the repository modify task to our users and we can do this because only the last step launches a task. Getting back tasks for each object would mean our users would have to create the individual pieces (components, architectures, releases) and then add all these pieces to a repo. We'd like to avoid that if possible.

The retrieve logic is not a problem for us aside from the API change where these endpoints now return tasks. I agree with you that rolling forward makes sense. I've mentioned a couple of options and I think there's maybe even a third option: given that the release endpoint needs to return a task, perhaps this change could also add fields components and architectures? This would allow us to greatly simplify our code if we could make a single request and return the task to our users:

http POST <pulp>/api/v3/content/deb/releases/ distribution=test codename=test suite=stable architectures:='["amd64", "arm64"]' components:='["main"]' repository=<repository pulp_href>

But I am guessing this would be a major piece of work and probably not as simple as some of the other options I mentioned. For right now, we're on 3.2 but we have the commit in question reverted as a patch. This isn't ideal though as it makes upgrading harder and we just went through the trouble of getting rid of our local fork once the package source work was merged.

daviddavis commented 4 months ago

For the third option I mentioned, I have a small proof of concept here that would allow users to create releases with architectures and components:

https://github.com/daviddavis/pulp_deb/commit/15ecc2d0f4963732eb74a54a64b5e06a4872a4b2

quba42 commented 4 months ago

In other words, if we're just creating a ReleaseComponent could the API just return the object and then when a user specifies the repository, return a task? I think this should be possible and would make much more sense for users that aren't using the repository field.

I think that this is certainly technically possible, and the only reasons I can come up with not to do it, are long the lines of: "No other plugins has API endpoints that sometimes return objects and sometimes tasks. This is perhaps a little counter intuitive." But then again, other plugins don't have "structure content" so maybe there is no reason to be different here.

Regarding your proposal for new parameters on the /content/deb/releases/ API endpoint: I think this goes into the same direction as what we have been doing on the /content/deb/packages/ endpoint with the distribution and component parameters there (though that currently only works in combination with repository, but that could be extended/is an implementation detail). So I am certainly interested in creating that.

I am imagining a future release where all the ReleaseArchitectures, ReleaseComponents, PackageReleaseComponents one would ever need would be created (or retrieved) by supplying extra arguments to just the /content/deb/releases/ or /content/deb/packages/ endpoints. These would then always return a task that lists all relevant objects in it's created_resources field (whether created or retrieved).

That way the /content/deb/release_architectures/, /content/deb/release_components/, and /content/deb/package_release_components/ endpoints, could essentially be deprecated and everyone steered towards the new workflows, but they could still be kept around for backwards compatibility. If they are only there for backwards compatibility/are deprecated, then there really is no reason whatsoever to worry about them being "a little weird" (i.e. return a task if repository is specified, but an object if repository is not specified).

In conclusion: I think I am starting to see things your way. That being said, bandwidth being what it is, I don't know when we might work on this, but I can promise to revisit by the next Y release (whenever that will be). Actually I believe the next Y release may be a "declare compatibility against a new breaking change pulpcore version release", in which case I would revisit for the next (proper) Y release after that. :wink:

mdellweg commented 4 months ago

In other words, if we're just creating a ReleaseComponent could the API just return the object and then when a user specifies the repository, return a task? I think this should be possible and would make much more sense for users that aren't using the repository field.

I think that this is certainly technically possible, and the only reasons I can come up with not to do it, are long the lines of: "No other plugins has API endpoints that sometimes return objects and sometimes tasks. This is perhaps a little counter intuitive." But then again, other plugins don't have "structure content" so maybe there is no reason to be different here.

I'm not so confident this is easily possible. For proper operation with the existing client tooling, the apispec must accurately describe the api response. And I'm not sure whether drf-spectacular or openapi-generator can deal with polymorphism nicely. Sure it would be interesting to figure this out.

OTOH, and this is much more philosophical, having designed pulp to be about content AND repositories instead of content IN repositories was IMHO a bad decision. And it really didn't work out when we added RBAC support. My take is: Content creation should always happen inside a repository. And not being in a repository, content is subject to be cleaned up without further warning (Yes, we have ORPHAN_PROTECTION_TIME and "touch" but those are simply band-aids for the same design flaw.).

daviddavis commented 4 months ago

The fact that responses accepts a dictionary seems to indicate that maybe it does allow for different response data structures but I suppose we'd have to test it out to know for sure.

That said, if we could specify architectures and components when creating Releases, that solution would work best for us as we'd no longer have to call the ReleaseArchitecture or ReleaseComponent create endpoints.