From: wilful (wilful) Date: 2021-06-24T15:11:18Z
I thought the artifact would be reused for de-duplication, but I got a conflict =(
From: @dralley (dalley) Date: 2021-07-01T13:44:39Z
Hi wilful,
Could you provide a little more information? Which versions of Pulp are you running, and what steps did you take that led you to that error?
From: @dralley (dalley) Date: 2021-07-06T04:31:08Z
This can be reproduced if you sync the same url into two repos at the same time, or by syncing two different urls with the same repo content at the same time. It's a race condition in the sync pipeline.
@wilful, does this match your experience? Or did you experience this while syncing the repos one after another, independently and not in parallel?
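To make the "check, then create" race concrete, here is a minimal sketch. It does not use Pulp code: the `artifact` table, the `exists` helper, and the digest value are all made up for illustration, and the two syncs are interleaved by hand rather than run in parallel, so the outcome is deterministic.

```python
import sqlite3

# Hypothetical stand-in for a content table with a uniqueness constraint.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE artifact (sha256 TEXT PRIMARY KEY)")


def exists(sha256):
    """Return True if an artifact with this digest is already stored."""
    row = db.execute("SELECT 1 FROM artifact WHERE sha256 = ?", (sha256,)).fetchone()
    return row is not None


def simulate_race(sha256):
    """Interleave two 'syncs' by hand: both check first, then both insert."""
    sync_a_sees_it = exists(sha256)  # sync A: not there yet
    sync_b_sees_it = exists(sha256)  # sync B: not there yet either
    outcomes = []
    for name, seen in (("A", sync_a_sees_it), ("B", sync_b_sees_it)):
        if seen:
            outcomes.append((name, "reused"))
            continue
        try:
            db.execute("INSERT INTO artifact (sha256) VALUES (?)", (sha256,))
            db.commit()
            outcomes.append((name, "created"))
        except sqlite3.IntegrityError:
            # The second insert hits the uniqueness constraint.
            db.rollback()
            outcomes.append((name, "conflict"))
    return outcomes


outcomes = simulate_race("abc123")
print(outcomes)
```

Both syncs pass the existence check before either has inserted, so the second insert fails instead of reusing the row. Whether this is the exact path the sync pipeline takes is an assumption.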
From: @dralley (dalley) Date: 2021-07-07T13:35:47Z
I can't seem to reproduce it on newer versions though. @wilful what version are you on?
From: @dralley (dalley) Date: 2021-07-28T15:05:43Z
The duplicate 7828 mentions
the Oracle Linux repositories "Oracle Linux 7 (x86_64) Latest" (http://yum.oracle.com/repo/OracleLinux/OL7/latest/x86_64) and "Oracle Linux 7 (x86_64) Optional Latest" (http://yum.oracle.com/repo/OracleLinux/OL7/optional/latest/x86_64)
So we should try that out
From: @ggainey (ggainey) Date: 2021-07-29T13:49:51Z
I experimented with the mentioned OLE repos on current-master and was unable to reproduce. Used this script:
pulp rpm remote create --name ol7 --url http://yum.oracle.com/repo/OracleLinux/OL7/latest/x86_64 --policy on_demand
pulp rpm remote create --name ol7opt --url http://yum.oracle.com/repo/OracleLinux/OL7/optional/latest/x86_64 --policy on_demand
for i in {1..4}
do
echo "RUN $i"
pulp rpm repository create --name ol7 --remote ol7 --autopublish
pulp rpm repository create --name ol7opt --remote ol7opt --autopublish
pulp -b rpm repository sync --name ol7; pulp -b rpm repository sync --name ol7opt
while true
do
running=`pulp task list --state running | jq length`
echo -n "."
sleep 5
if [ ${running} -eq 0 ]
then
echo "DONE"
break
fi
done
failed=`pulp task list --state failed | jq length`
echo "FAILURES : ${failed}"
echo "CLEANING UP..."
pulp rpm repository destroy --name ol7
pulp rpm repository destroy --name ol7opt
pulp orphans delete
done
(Note: 4 cycles took something over an hour on my system)
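For anyone who would rather drive the wait loop from Python, the polling part of the script can be sketched as below. `wait_until_idle` and `count_running` are hypothetical names, not part of pulp-cli; `count_running` stands in for whatever returns the number of running tasks (the script above gets it from `pulp task list --state running | jq length`).

```python
import time


def wait_until_idle(count_running, interval=5.0, sleep=time.sleep, max_polls=None):
    """Poll count_running() until it returns 0; return the number of polls made.

    count_running is a hypothetical callable that reports how many tasks
    are still running. max_polls optionally bounds the wait.
    """
    polls = 0
    while True:
        polls += 1
        if count_running() == 0:
            return polls
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"tasks still running after {polls} polls")
        sleep(interval)


# Usage with canned counts instead of a real Pulp instance.
fake_counts = iter([2, 1, 0])
polls = wait_until_idle(lambda: next(fake_counts), sleep=lambda _s: None)
print(polls)
```

Unlike the shell loop, this checks the count before sleeping and can give up after `max_polls` attempts, which is handy when a run hangs.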
From: @dkliban (dkliban@redhat.com) Date: 2021-08-03T14:54:50Z
Based on the previous comment, I am closing.
From: @dralley (dalley) Date: 2021-08-05T12:54:18Z
I was able to reproduce this with a different traceback 3 times in a row - script attached
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: pulp [2d30219697a640b2a927644cbdc7f892]: pulpcore.tasking.pulpcore_worker:INFO: Task 7d27d63b-43c2-4a0e-b9f7-c1c68bc17836 failed (insert or update on table "core_repositorycontent" violates foreign key constraint "core_repositoryconte_version_added_id_d5113f18_fk_core_repo"
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: DETAIL: Key (version_added_id)=(a5c43989-e695-4f07-9bdb-0f879b9cdd31) is not present in table "core_repositoryversion".
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: )
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: pulp [2d30219697a640b2a927644cbdc7f892]: pulpcore.tasking.pulpcore_worker:INFO: File "/home/vagrant/devel/pulpcore/pulpcore/tasking/pulpcore_worker.py", line 297, in _perform_task
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: result = func(*args, **kwargs)
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: File "/home/vagrant/devel/pulp_rpm/pulp_rpm/app/tasks/synchronizing.py", line 426, in synchronize
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: subrepo_version = dv.create()
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: File "/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/declarative_version.py", line 151, in create
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: loop.run_until_complete(pipeline)
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: File "/usr/lib64/python3.9/asyncio/base_events.py", line 642, in run_until_complete
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: return future.result()
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: File "/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/api.py", line 225, in create_pipeline
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: await asyncio.gather(*futures)
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: File "/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/api.py", line 43, in __call__
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: await self.run()
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: File "/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/content_stages.py", line 246, in run
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: self.new_version.add_content(Content.objects.filter(pk__in=to_add))
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: File "/home/vagrant/devel/pulpcore/pulpcore/app/models/repository.py", line 763, in add_content
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: RepositoryContent.objects.bulk_create(repo_content)
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: return getattr(self.get_queryset(), name)(*args, **kwargs)
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/models/query.py", line 523, in bulk_create
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: obj_without_pk._state.db = self.db
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/transaction.py", line 246, in __exit__
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: connection.commit()
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: return func(*args, **kwargs)
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 266, in commit
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: self._commit()
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 242, in _commit
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: return self.connection.commit()
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: raise dj_exc_value.with_traceback(traceback) from exc_value
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 242, in _commit
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: return self.connection.commit()
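The traceback shows the foreign-key error surfacing at commit time rather than at the insert itself. A rough sketch of how that can happen, using SQLite and a made-up two-table schema (`repo_version`, `repo_content`) rather than the real Pulp tables: one task creates a version, another removes it, and the first task then adds content that still points at it. Which task deletes the version, and why, is an assumption here, not a diagnosis.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
# Simplified, hypothetical schema: content rows point at the version that added them.
db.execute("CREATE TABLE repo_version (id TEXT PRIMARY KEY)")
db.execute(
    "CREATE TABLE repo_content ("
    " content_id TEXT,"
    " version_added_id TEXT REFERENCES repo_version(id)"
    " DEFERRABLE INITIALLY DEFERRED)"
)

# Task A creates a new version of the shared repository.
db.execute("INSERT INTO repo_version (id) VALUES ('v1')")
db.commit()

# Task B removes that version (assumed here; e.g. while finishing its own run).
db.execute("DELETE FROM repo_version WHERE id = 'v1'")
db.commit()

# Task A still holds 'v1' in memory and adds content to it.
db.execute("INSERT INTO repo_content VALUES ('pkg-1', 'v1')")
try:
    db.commit()  # the deferred foreign key is checked here, not at the INSERT
    error = None
except sqlite3.IntegrityError as exc:
    db.rollback()
    error = str(exc)

print(error)
```

Because the constraint is deferred, the insert itself succeeds and the failure only shows up inside `commit()`, which matches the shape of the traceback above.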
We have the same problem with the RPM plugin.
From: @bmbouter (bmbouter) Date: 2021-11-16T16:36:31Z
I closed my PR because I don't see a change in pulpcore that can be made to fix this. I've summarized my findings here: https://github.com/pulp/pulpcore/pull/1717#issuecomment-965695356
Per convo in matrix I am moving to pulp_rpm to get some input there. If there is something pulpcore can do to resolve please share the idea.
Can no longer reproduce - we've fixed a lot of concurrency bugs though, I bet this is one of them.
I am able to reproduce this issue and have created a Bugzilla for it. For more information, please refer to the Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=2077363
@dralley, @ggainey Can we reopen this issue? It seems I don't have permission to reopen it.
Done! Great catch on the reproducer, Hao, thank you.
I'm not sure it is the best solution, but making the sub repo name unique for each main repo appears to solve the issue, since it avoids updating the same sub repo at the same time.
--- a/pulp_rpm/app/tasks/synchronizing.py 2022-01-29 22:31:17.000000000 +1000
+++ b/pulp_rpm/app/tasks/synchronizing.py 2022-04-21 22:34:49.590000000 +1000
@@ -442,7 +442,7 @@
if repodata == DIST_TREE_MAIN_REPO_PATH:
treeinfo["repositories"].update({repodata: None})
continue
- name = f"{repodata}-{treeinfo['hash']}"
+ name = f"{repodata}-{treeinfo['hash']}-{repository_pk}"
sub_repo, created = RpmRepository.objects.get_or_create(name=name, sub_repo=True)
if created:
sub_repo.save()
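To see what the one-line change does to sharing, here is a small stand-alone sketch. The `sub_repo_name` and `get_or_create` helpers, the dict-backed registry, and the example values (`AppStream`, `deadbeef`, `repo-pk-1`) are hypothetical stand-ins, not the pulp_rpm implementation; only the two name formats come from the diff.

```python
def sub_repo_name(repodata, treeinfo_hash, repository_pk=None):
    """Build a sub-repo name; repository_pk=None mimics the pre-patch scheme."""
    name = f"{repodata}-{treeinfo_hash}"
    if repository_pk is not None:
        name = f"{name}-{repository_pk}"
    return name


def get_or_create(registry, name):
    """Dict-backed stand-in for RpmRepository.objects.get_or_create(name=...)."""
    created = name not in registry
    sub_repo = registry.setdefault(name, {"name": name})
    return sub_repo, created


# Two hypothetical parent repositories syncing the same kickstart tree.
parents = ["repo-pk-1", "repo-pk-2"]
shared, per_parent = {}, {}
for pk in parents:
    get_or_create(shared, sub_repo_name("AppStream", "deadbeef"))
    get_or_create(per_parent, sub_repo_name("AppStream", "deadbeef", pk))

print(len(shared), len(per_parent))
```

With the old name, both parents resolve to the same sub repo, so two concurrent syncs end up writing to one repository. With the parent's pk in the name, each parent gets its own.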
I have tested and collected a lot of information on this issue (I think) in the Foreman community: https://community.theforeman.org/t/sync-errors-on-all-syncs-including-the-initial-sync-between-new-katello-server-and-content-proxy/29577/13?u=gvde
For me, the issue is only with EL8 BaseOS repositories, with some mixup of AppStream at least in the naming. I can reliably reproduce the issue when syncing my environments the first time to my content proxy...
There are possibly some ties between this and https://github.com/pulp/pulp_rpm/issues/2775. In any event, it would be good to look at both at the same time.
unassigning, I have a new top priority
This pulp-cli/jq script follows @hao-yu 's observations from BZ# 2077363 to reproduce the problem when run against a 'clean' system:
#!/bin/bash
URLS=(\
https://cdn.redhat.com/content/dist/rhel/server/6/6.10/x86_64/kickstart/ \
)
NAMES=(\
r6-10-ks \
)
# Make sure we're concurrent enough
num_workers=`sudo systemctl status pulpcore-worker* | grep "service - Pulp Worker" | wc -l`
echo "Current num-workers ${num_workers}"
if [ ${num_workers} -lt 10 ]
then
for (( i=${num_workers}+1; i<=10; i++ ))
do
echo "Starting worker ${i}"
sudo systemctl start pulpcore-worker@${i}
done
fi
echo "CLEANUP"
for n in ${!NAMES[@]}
do
for i in {1..5}
do
pulp rpm remote destroy --name ${NAMES[$n]}-${i}
pulp rpm repository destroy --name ${NAMES[$n]}-${i}
done
done
pulp orphan cleanup --protection-time 0
echo "SETUP URLS AND REMOTES"
for n in ${!NAMES[@]}
do
for i in {1..5}
do
pulp rpm remote create --name ${NAMES[$n]}-${i} \
--url ${URLS[$n]} --policy on_demand \
--ca-cert @/home/vagrant/devel/pulp_startup/CDN_cert/redhat-uep.pem \
--client-key @/home/vagrant/devel/pulp_startup/CDN_cert/cdn.key \
--client-cert @/home/vagrant/devel/pulp_startup/CDN_cert/cdn.pem | jq .pulp_href
pulp rpm repository create --name ${NAMES[$n]}-${i} --remote ${NAMES[$n]}-${i} | jq .pulp_href
done
done
starting_failed=`pulp task list --limit 10000 --state failed | jq length`
echo "SYNCING..."
for i in {1..5}
do
for n in ${!NAMES[@]}
do
pulp -b rpm repository sync --name ${NAMES[$n]}-${i}
done
done
sleep 5
echo "WAIT FOR COMPLETION...."
while true
do
running=`pulp task list --limit 10000 --state running | jq length`
echo -n "."
sleep 5
if [ ${running} -eq 0 ]
then
echo "DONE"
break
fi
done
failed=`pulp task list --limit 10000 --state failed | jq length`
echo "FAILURES : ${failed}"
if [ ${failed} -gt ${starting_failed} ]
then
echo "FAILED: " ${failed} - ${starting_failed}
exit
fi
The suggestion at https://github.com/pulp/pulp_rpm/issues/2278#issuecomment-1105159150 definitely makes the problem go away, resulting in a copy of a given subrepo being created for each repo syncing that content. This connects the sub-repos to their parent repos, whereas the current behavior results in a subrepo with a given name/treeinfo-hash being shared by all repos that specify that name/treeinfo tuple. That sharing doesn't buy much for the Pulp instance (since the content is de-duplicated), and it feels like a potential source of other subtly wrong behavior that we haven't noticed yet.
The remaining question is, "What (if anything) do we need to do to fix existing systems that have already sync'd using the current behavior?" This will need some investigation and thinking.
@goosemania has a great description of Why This Approach Won't Work, here : https://github.com/pulp/pulp_rpm/issues/2304#issuecomment-1019297646
Author: wilful (wilful)
Redmine Issue: 8967, https://pulp.plan.io/issues/8967
The original issue is difficult to reproduce any longer, but there are similar issues which can be. See https://pulp.plan.io/issues/8967#note-16
========================
Hi all!
I need to add two repositories to my Pulp server:
http://downloads.linux.hpe.com/SDR/repo/spp/redhat/7/x86_64/current/
http://downloads.linux.hpe.com/SDR/repo/mcp/CentOS/7/x86_64/current/
But I can't do it, because:
How can I find out which repository this package is in?