Open pulpbot opened 2 years ago
From: @dralley (dalley) Date: 2021-02-25T04:07:03Z
We should probably introduce an actual constraint that enforces this, so it blows up immediately if it ever occurs.
From: @dralley (dalley) Date: 2021-03-02T17:16:04Z
Recreating the remote and re-syncing does fix the content artifacts, but if you don't notice the problem immediately, or if you copied the content around between repos, it would be practically impossible to know how to resolve the issue.
There's some discussion on solutions here: https://hackmd.io/_3gsUVdyQwy50Nc5pMXN-g
From: @mdellweg (mdellweg) Date: 2021-03-09T14:18:58Z
Idea to solve the problem:
Add a force
flag to the DELETE call.
If force
is not specified, and the remote is referenced by either a repository or a remote artifact, the call will fail.
From: @ggainey (ggainey) Date: 2021-07-13T15:57:30Z
I'd go for 1c) from the associated hackmd. I wouldn't even give the option of --force - the state your pulp-instance gets left in is pretty horrific, even if you meant to do it. Having some way to list the content/repo-versions/artifacts, or a list of "do an immediate sync on the following repos before attempting" would be great.
From: @ipanova (ipanova@redhat.com) Date: 2021-07-21T15:37:16Z
during review of a PR there has been raised an idea about what to do with the content that does not have RA not Artifact
Do you think it would be reasonable to prevent content with no Artifact and no RemoteArtifact from being added to new repository versions via one of the validation hooks? We want to keep the historical records around, but the content is no longer useful or functional, so it doesn't make sense to let it spread into new repository versions.
EDIT: add a --force flag which can be specified when new repo-version is being created to allow such content
From: @dralley (dalley) Date: 2021-07-21T19:38:04Z
More discussion from the PR
I have not manually tested rpm plugin yet, but skimmed through pulpcore code - plugins that use directly pulp's content app handler, should be able to gracefully handle this. Uploaded content will have artifact set to none as well it won't have any remoteartifacts so a 404 will just be raised https://github.com/pulp/pulpcore/blob/master/pulpcore/content/handler.py#L681. Since pulp-container has a subclassed version of handler, i needed to modify couple of lines https://gist.github.com/ipanova/bd5821b55a1e01245fe7556dc3791ddd which led to this output:
$ podman pull localhost:24817/test/repo --tls-verify=false
Trying to pull localhost:24817/test/repo:latest...
Error: Error initializing source docker://localhost:24817/test/repo:latest: Error reading manifest latest in localhost:24817/test/repo: StatusCode: 404, 404: Not Found
(pulp) [vagrant@pulp3-source-fedora34 ~]$
tldr, i think we're fine just need to audit plugins that subclass the Handler.
EDIT: some plugins contain content that is expected to always have artifact. We should not touch those content types during reclaim disk space. For example: rpm modules and defaults, container tags, manifests and config blobs.
For the rest of the content types for which disk space was supposed to correctly reclaim the artifact, the code needs to be adjusted so it takes into account situation when ca.artifact is None content._artifacts.get()
returns ObjectDoesNotExist
[ipanova@fluffy pulp_rpm]$ git grep '\._artifacts'
pulp_rpm/app/migrations/0003_DATA_incorrect_json.py: modulemd_index.update_from_string(module._artifacts.first().file.read().decode(), True)
pulp_rpm/app/tasks/publishing.py: mod_yml.write(modulemd._artifacts.get().file.read())
pulp_rpm/app/tasks/publishing.py: mod_yml.write(default._artifacts.get().file.read())
[ipanova@fluffy pulp_rpm]$
[ipanova@fluffy pulp_rpm]$
[ipanova@fluffy pulp_rpm]$ cd ..
[ipanova@fluffy pulp3]$ cd pulp_container/
[ipanova@fluffy pulp_container]$ git grep '\._artifacts'
pulp_container/app/migrations/0007_clear_tags_artifacts_refs.py: tag._artifacts.clear()
pulp_container/app/migrations/0007_clear_tags_artifacts_refs.py: tag._artifacts.add(tag.tagged_manifest._artifacts.get())
pulp_container/app/redirects.py: artifact = manifest._artifacts.get()
pulp_container/app/redirects.py: artifact = blob._artifacts.get()
pulp_container/app/registry.py: artifact = tag.tagged_manifest._artifacts.get()
pulp_container/app/registry_api.py: artifact = manifest._artifacts.get()
pulp_container/app/registry_api.py: artifact = blob._artifacts.get()
pulp_container/app/schema_convert.py: config_artifact = manifest.config_blob._artifacts.get()
pulp_container/app/schema_convert.py: manifest_artifact = manifest._artifacts.get()
pulp_container/app/tasks/sync_stages.py: with man._artifacts.get().file.open() as content_file:
pulp_container/app/tasks/tag.py: artifact = manifest._artifacts.all()[0]
[ipanova@fluffy pulp_container]$
This issue has been marked 'stale' due to lack of recent activity. If there is no further activity, the issue will be closed in another 30 days. Thank you for your contribution!
This issue is no longer marked for closure.
I am proceeding with option 1c from the attached hackmd document.
I have tested the removal of a remote across multiple plugins (pulp_file, pulp_rpm, pulp_python); and, only in pulp_file was a runtime error raised when a publication was getting created without remote artifacts. In the pulp_file repository, I will adjust the code to gracefully handle these kinds of scenarios (just in case):
pulpcore-worker[157321]: result = func(*args, **kwargs)
pulpcore-worker[157321]: File "/home/vagrant/devel/pulp_file/pulp_file/app/tasks/publishing.py", line 43, in publi>
pulpcore-worker[157321]: manifest.write(populate(publication))
pulpcore-worker[157321]: File "/home/vagrant/devel/pulp_file/pulp_file/manifest.py", line 133, in write
pulpcore-worker[157321]: for entry in entries:
pulpcore-worker[157321]: File "/home/vagrant/devel/pulp_file/pulp_file/app/tasks/publishing.py", line 79, in popul>
pulpcore-worker[157321]: digest=artifact.sha256,
@ipanova, thanks for sharing the reference to another issue. It seems like one change can close both the issues.
This work is put on hold until we resolve a more significant issue: https://github.com/pulp/pulpcore/issues/3212
Add a section to the documentation about deleting remotes and hazards of moving on-demand content around between repositories.
Author: @dralley (dalley)
Redmine Issue: 8305, https://pulp.plan.io/issues/8305
The deletion of the Remote deletes the RemoteArtifacts, leaving behind ContentArtifact attached to neither Artifacts nor Remotes, making them effectively corrupted and unpublishable.
This is more pernicious because content units can move throughout repositories, and if the remote is ever deleted, every repo can be broken at once with no safeguards.