pulp / pulp_rpm

RPM support for Pulp Platform
https://pulpproject.org/pulp_rpm/
GNU General Public License v2.0

"duplicate key value violates unique constraint" when syncing two repositories with identical sub-repositories in parallel #2278

Closed pulpbot closed 1 year ago

pulpbot commented 2 years ago

Author: wilful (wilful)

Redmine Issue: 8967, https://pulp.plan.io/issues/8967


The original issue is now difficult to reproduce, but similar issues can still be reproduced. See https://pulp.plan.io/issues/8967#note-16

========================

Hi all!

I need to add two repositories to my Pulp server:

http://downloads.linux.hpe.com/SDR/repo/spp/redhat/7/x86_64/current/

http://downloads.linux.hpe.com/SDR/repo/mcp/CentOS/7/x86_64/current/

But I can't, because:

    "description": "duplicate key value violates unique constraint \"rpm_package_pkgId_key\"\nDETAIL:  Key (\"pkgId\")=(ebf96fb31b880280a25d07c596bde204df50d140) already exists.\n"

How can I find out in which repository this package is?

pulpbot commented 2 years ago

From: wilful (wilful) Date: 2021-06-24T15:11:18Z


I thought the artifact would be reused for de-duplication, but I got a conflict instead =(

pulpbot commented 2 years ago

From: @dralley (dalley) Date: 2021-07-01T13:44:39Z


Hi wilful,

Could you provide a little more information? Which version of Pulp are you running, and what steps did you take that led to that error?

pulpbot commented 2 years ago

From: @dralley (dalley) Date: 2021-07-06T04:31:08Z


This can be reproduced if you sync the same url into two repos at the same time, or by syncing two different urls with the same repo content at the same time. It's a race condition in the sync pipeline.

@wilful, does this match your experience? Or did you experience this while syncing the repos one after another, independently and not in parallel?
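
To make the failure mode concrete, here is a minimal, self-contained sketch of a check-then-insert race against a unique column. It uses an in-memory SQLite table rather than Pulp's PostgreSQL schema. The table name `package` and the helpers `exists`, `simulate_race`, and `insert_or_get` are illustrative assumptions, not pulpcore or pulp_rpm code; only the column name `pkgId` is taken from the error above. The interleaving of two syncs is simulated deterministically on a single connection.

```python
import sqlite3


def make_db():
    """Create an in-memory table with a unique pkgId column (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE package (id INTEGER PRIMARY KEY, pkgId TEXT UNIQUE NOT NULL)"
    )
    return conn


def exists(conn, pkg_id):
    """Return True if a row with this pkgId is already stored."""
    row = conn.execute("SELECT 1 FROM package WHERE pkgId = ?", (pkg_id,)).fetchone()
    return row is not None


def simulate_race(pkg_id):
    """Two syncs both check first, then both insert: the second insert fails."""
    conn = make_db()
    sync_a_missing = not exists(conn, pkg_id)  # sync A: "not there yet"
    sync_b_missing = not exists(conn, pkg_id)  # sync B: "not there yet" as well
    error = None
    for missing in (sync_a_missing, sync_b_missing):
        if missing:
            try:
                conn.execute("INSERT INTO package (pkgId) VALUES (?)", (pkg_id,))
            except sqlite3.IntegrityError as exc:
                # The stale check let the second insert through to the constraint.
                error = str(exc)
    return error


def insert_or_get(conn, pkg_id):
    """Insert first; on a unique violation, fall back to reading the existing row.

    Returns (row id, created flag).
    """
    try:
        cur = conn.execute("INSERT INTO package (pkgId) VALUES (?)", (pkg_id,))
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        row = conn.execute(
            "SELECT id FROM package WHERE pkgId = ?", (pkg_id,)
        ).fetchone()
        return row[0], False


if __name__ == "__main__":
    print(simulate_race("ebf96fb3"))
    conn = make_db()
    print(insert_or_get(conn, "ebf96fb3"))
    print(insert_or_get(conn, "ebf96fb3"))
```

The second helper shows one common way to tolerate such a race: let the database constraint arbitrate, and treat a violation as "someone else created it". Whether that is appropriate for the sync pipeline is not established in this thread.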

pulpbot commented 2 years ago

From: @dralley (dalley) Date: 2021-07-07T13:35:47Z


I can't seem to reproduce it on newer versions though. @wilful what version are you on?

pulpbot commented 2 years ago

From: @dralley (dalley) Date: 2021-07-28T15:05:43Z


The duplicate issue (7828) mentions:

the Oracle Linux repositories "Oracle Linux 7 (x86_64) Latest" (http://yum.oracle.com/repo/OracleLinux/OL7/latest/x86_64) and "Oracle Linux 7 (x86_64) Optional Latest" (http://yum.oracle.com/repo/OracleLinux/OL7/optional/latest/x86_64)

So we should try that out.

pulpbot commented 2 years ago

From: @ggainey (ggainey) Date: 2021-07-29T13:49:51Z


I experimented with the Oracle Linux repos mentioned above on current-master and was unable to reproduce. I used this script:

pulp rpm remote create --name ol7 --url http://yum.oracle.com/repo/OracleLinux/OL7/latest/x86_64 --policy on_demand
pulp rpm remote create --name ol7opt --url http://yum.oracle.com/repo/OracleLinux/OL7/optional/latest/x86_64 --policy on_demand
for i in {1..4}
do
    echo "RUN $i"
    pulp rpm repository create --name ol7 --remote ol7 --autopublish
    pulp rpm repository create --name ol7opt --remote ol7opt --autopublish
    # Start both syncs in the background (-b) so they run in parallel
    pulp -b rpm repository sync --name ol7; pulp -b rpm repository sync --name ol7opt
    # Poll every 5 seconds until no tasks are reported as running
    while true
    do
        running=$(pulp task list --state running | jq length)
        echo -n "."
        sleep 5
        if [ "${running}" -eq 0 ]
        then
            echo "DONE"
            break
        fi
    done
    failed=$(pulp task list --state failed | jq length)
    echo "FAILURES : ${failed}"
    echo "CLEANING UP..."
    pulp rpm repository destroy --name ol7
    pulp rpm repository destroy --name ol7opt
    pulp orphans delete
done

(Note: 4 cycles took something over an hour on my system)

pulpbot commented 2 years ago

From: @dkliban (dkliban@redhat.com) Date: 2021-08-03T14:54:50Z


Based on the previous comment, I am closing.

pulpbot commented 2 years ago

From: @dralley (dalley) Date: 2021-08-05T12:54:18Z


I was able to reproduce this with a different traceback 3 times in a row - script attached

Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: pulp [2d30219697a640b2a927644cbdc7f892]: pulpcore.tasking.pulpcore_worker:INFO: Task 7d27d63b-43c2-4a0e-b9f7-c1c68bc17836 failed (insert or update on table "core_repositorycontent" violates foreign key constraint "core_repositoryconte_version_added_id_d5113f18_fk_core_repo"
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: DETAIL:  Key (version_added_id)=(a5c43989-e695-4f07-9bdb-0f879b9cdd31) is not present in table "core_repositoryversion".
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: )
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]: pulp [2d30219697a640b2a927644cbdc7f892]: pulpcore.tasking.pulpcore_worker:INFO:   File "/home/vagrant/devel/pulpcore/pulpcore/tasking/pulpcore_worker.py", line 297, in _perform_task
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:     result = func(*args, **kwargs)
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:   File "/home/vagrant/devel/pulp_rpm/pulp_rpm/app/tasks/synchronizing.py", line 426, in synchronize
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:     subrepo_version = dv.create()
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:   File "/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/declarative_version.py", line 151, in create
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:     loop.run_until_complete(pipeline)
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:   File "/usr/lib64/python3.9/asyncio/base_events.py", line 642, in run_until_complete
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:     return future.result()
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:   File "/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/api.py", line 225, in create_pipeline
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:     await asyncio.gather(*futures)
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:   File "/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/api.py", line 43, in __call__
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:     await self.run()
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:   File "/home/vagrant/devel/pulpcore/pulpcore/plugin/stages/content_stages.py", line 246, in run
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:     self.new_version.add_content(Content.objects.filter(pk__in=to_add))
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:   File "/home/vagrant/devel/pulpcore/pulpcore/app/models/repository.py", line 763, in add_content
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:     RepositoryContent.objects.bulk_create(repo_content)
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:   File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:     return getattr(self.get_queryset(), name)(*args, **kwargs)
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:   File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/models/query.py", line 523, in bulk_create
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:     obj_without_pk._state.db = self.db
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:   File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/transaction.py", line 246, in __exit__
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:     connection.commit()
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:   File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/utils/asyncio.py", line 26, in inner
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:     return func(*args, **kwargs)
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:   File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 266, in commit
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:     self._commit()
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:   File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 242, in _commit
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:     return self.connection.commit()
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:   File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:     raise dj_exc_value.with_traceback(traceback) from exc_value
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:   File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 242, in _commit
Aug 05 12:46:16 pulp3-source-fedora34.localhost.example.com pulpcore-worker[75545]:     return self.connection.commit()

We have the same problem with the RPM plugin.

pulpbot commented 2 years ago

From: pulpbot (pulpbot) Date: 2021-11-09T21:17:21Z


PR: https://github.com/pulp/pulpcore/pull/1717

pulpbot commented 2 years ago

From: @bmbouter (bmbouter) Date: 2021-11-16T16:36:31Z


I closed my PR because I don't see a change in pulpcore that can be made to fix this. I've summarized my findings here: https://github.com/pulp/pulpcore/pull/1717#issuecomment-965695356

Per the conversation on Matrix, I am moving this to pulp_rpm to get some input there. If there is something pulpcore can do to resolve it, please share the idea.

dralley commented 2 years ago

Can no longer reproduce - we've fixed a lot of concurrency bugs though, I bet this is one of them.

hao-yu commented 2 years ago

I am able to reproduce this issue and I have created a bugzilla for this issue. For more information please refer to the bugzilla.

https://bugzilla.redhat.com/show_bug.cgi?id=2077363

@dralley, @ggainey Can we reopen this issue? It seems I don't have permission to reopen it.

ggainey commented 2 years ago

@dralley, @ggainey Can we reopen this issue? It seems I don't have permission to reopen it.

Done! Great catch on the reproducer, Hao, thank you.

hao-yu commented 2 years ago

I'm not sure this is the best solution, but making the sub-repo name unique for each main repo appears to solve the issue, since it avoids updating the same sub-repo from several syncs at the same time.

--- a/pulp_rpm/app/tasks/synchronizing.py   2022-01-29 22:31:17.000000000 +1000
+++ b/pulp_rpm/app/tasks/synchronizing.py   2022-04-21 22:34:49.590000000 +1000
@@ -442,7 +442,7 @@
             if repodata == DIST_TREE_MAIN_REPO_PATH:
                 treeinfo["repositories"].update({repodata: None})
                 continue
-            name = f"{repodata}-{treeinfo['hash']}"
+            name = f"{repodata}-{treeinfo['hash']}-{repository_pk}"
             sub_repo, created = RpmRepository.objects.get_or_create(name=name, sub_repo=True)
             if created:
                 sub_repo.save()

gvde commented 2 years ago

I have tested this and collected a lot of information on the issue (I think) in the Foreman community: https://community.theforeman.org/t/sync-errors-on-all-syncs-including-the-initial-sync-between-new-katello-server-and-content-proxy/29577/13?u=gvde

For me, the issue is only with EL8 BaseOS repositories, with some mixup of AppStream at least in the naming. I can reliably reproduce the issue when syncing my environments the first time to my content proxy...

dralley commented 2 years ago

There are possibly some ties between this and https://github.com/pulp/pulp_rpm/issues/2775; in any event, it would be good to look at both at the same time.

dralley commented 2 years ago

Unassigning; I have a new top priority.

ggainey commented 2 years ago

This pulp-cli/jq script follows @hao-yu's observations from BZ#2077363 to reproduce the problem when run against a 'clean' system:

#!/bin/bash
# URLs to sync, and the base name used for each remote/repository
URLS=(
    https://cdn.redhat.com/content/dist/rhel/server/6/6.10/x86_64/kickstart/
)
NAMES=(
    r6-10-ks
)

# Make sure we're concurrent enough: bring the worker count up to 10
num_workers=$(sudo systemctl status 'pulpcore-worker*' | grep -c "service - Pulp Worker")
echo "Current num-workers ${num_workers}"
if [ "${num_workers}" -lt 10 ]
then
    for (( i=${num_workers}+1; i<=10; i++ ))
    do
        echo "Starting worker ${i}"
        sudo systemctl start "pulpcore-worker@${i}"
    done
fi

echo "CLEANUP"
for n in "${!NAMES[@]}"
do
    for i in {1..5}
    do
        pulp rpm remote destroy --name "${NAMES[$n]}-${i}"
        pulp rpm repository destroy --name "${NAMES[$n]}-${i}"
    done
done
pulp orphan cleanup --protection-time 0

echo "SETUP URLS AND REMOTES"
# Five remote/repository pairs per URL, all pointing at the same content
for n in "${!NAMES[@]}"
do
    for i in {1..5}
    do
        pulp rpm remote create --name "${NAMES[$n]}-${i}" \
          --url "${URLS[$n]}" --policy on_demand \
          --ca-cert @/home/vagrant/devel/pulp_startup/CDN_cert/redhat-uep.pem \
          --client-key @/home/vagrant/devel/pulp_startup/CDN_cert/cdn.key \
          --client-cert @/home/vagrant/devel/pulp_startup/CDN_cert/cdn.pem | jq .pulp_href
        pulp rpm repository create --name "${NAMES[$n]}-${i}" --remote "${NAMES[$n]}-${i}" | jq .pulp_href
    done
done
# Remember how many failed tasks existed before this run
starting_failed=$(pulp task list --limit 10000 --state failed | jq length)
echo "SYNCING..."
# Start every sync in the background (-b) so they run in parallel
for i in {1..5}
do
    for n in "${!NAMES[@]}"
    do
        pulp -b rpm repository sync --name "${NAMES[$n]}-${i}"
    done
done
sleep 5
echo "WAIT FOR COMPLETION...."
while true
do
    running=$(pulp task list --limit 10000 --state running | jq length)
    echo -n "."
    sleep 5
    if [ "${running}" -eq 0 ]
    then
        echo "DONE"
        break
    fi
done
failed=$(pulp task list --limit 10000 --state failed | jq length)
echo "FAILURES : ${failed}"
if [ "${failed}" -gt "${starting_failed}" ]
then
  echo "FAILED: ${failed} - ${starting_failed}"
  exit 1
fi

The suggestion at https://github.com/pulp/pulp_rpm/issues/2278#issuecomment-1105159150 definitely makes the problem go away, by creating a copy of a given sub-repo for each repo that syncs that content. This ties sub-repos to their parent repos, whereas the current behavior results in a sub-repo with a given name/treeinfo-hash being shared by all repos that specify that name/treeinfo tuple. That sharing doesn't buy much for the Pulp instance (since the content is de-duplicated), and it feels like a potential source of other subtly wrong behavior that we haven't noticed yet.

The remaining question is: what (if anything) do we need to do to fix existing systems that have already synced using the current behavior? That will need some investigation and thinking.
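
As a rough illustration of the sharing described above, the sketch below compares the two naming schemes from the diff earlier in this thread. The two f-string formats are copied from that diff; the helper names, the sample repository keys, and the sample treeinfo hash are hypothetical and only serve to show which sub-repo names end up with more than one writer. It does not model Pulp's database or task system.

```python
from collections import defaultdict


def shared_name(repodata, treeinfo_hash):
    """Current scheme from the diff: name depends only on repodata and treeinfo hash."""
    return f"{repodata}-{treeinfo_hash}"


def per_parent_name(repodata, treeinfo_hash, repository_pk):
    """Proposed scheme from the diff: the parent repository's pk is appended."""
    return f"{repodata}-{treeinfo_hash}-{repository_pk}"


def writers_per_sub_repo(syncs, per_parent):
    """Map each sub-repo name to the parent repositories whose sync would write to it.

    `syncs` is a list of (repository_pk, repodata, treeinfo_hash) tuples.
    """
    writers = defaultdict(list)
    for repository_pk, repodata, treeinfo_hash in syncs:
        if per_parent:
            name = per_parent_name(repodata, treeinfo_hash, repository_pk)
        else:
            name = shared_name(repodata, treeinfo_hash)
        writers[name].append(repository_pk)
    return dict(writers)


def contended(writers):
    """Sub-repo names that more than one parallel sync would update."""
    return sorted(name for name, pks in writers.items() if len(pks) > 1)


if __name__ == "__main__":
    # Two parent repos syncing identical content, as in the reproducer above.
    syncs = [
        ("repo-1", "AppStream", "abc123"),
        ("repo-2", "AppStream", "abc123"),
    ]
    print(contended(writers_per_sub_repo(syncs, per_parent=False)))
    print(contended(writers_per_sub_repo(syncs, per_parent=True)))
```

With the shared scheme both syncs target the same sub-repo name; with the per-parent scheme each sync gets its own. This only shows why the contention disappears, not whether the change is safe for existing data.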

ggainey commented 2 years ago

@goosemania has a great description of why this approach won't work, here: https://github.com/pulp/pulp_rpm/issues/2304#issuecomment-1019297646

pulpbot commented 1 year ago

https://bugzilla.redhat.com/show_bug.cgi?id=2103246