uyuni-project / uyuni

Source code for Uyuni
https://www.uyuni-project.org/
GNU General Public License v2.0
440 stars 185 forks source link

Abnormaly long repo-sync task running for Ubuntu channels #5579

Open phibid opened 2 years ago

phibid commented 2 years ago

Uyuni 2022.02

I have noticed that on my both prod and pre-prod Uyuni env, there is no way anymore to force an immediate repo sync from the GUI with the button 'Syn Now'. Was working on the previous versions, but on the perhaps 2 last ones, no way.

If I click, I can see the request reach Tomcat on the access_log:

10.10.10.10 - - [20/Jun/2022:18:10:29 +0200] "POST /rhn/channels/manage/Sync.do?cid=130 HTTP/1.1" 302 -
10.10.10.10 - - [20/Jun/2022:18:10:30 +0200] "GET /rhn/channels/manage/Sync.do?cid=130 HTTP/1.1" 200 47079
10.10.10.10 - - [20/Jun/2022:18:10:30 +0200] "POST /rhn/manager/frontend-log HTTP/1.1" 200 16
127.0.0.1 - - [20/Jun/2022:18:10:30 +0200] "GET /rhn/websocket/notifications HTTP/1.1" 101 -

But no reaction from taskomatic.

santeri3700 commented 2 years ago

Perhaps you already have another "repo-sync" task running? See [Admin -> Task Engine Status -> Runtime Status] if you have any running "repo-sync" tasks. \ My understanding is that the "Sync Now" task will only start (and appear in Runtime Status) when the previous "repo-sync" task has been completed. \ The "Sync Now" action might have a delay if you've scheduled other channels/repos to be automatically synchronized at the moment.

phibid commented 2 years ago

Ah, to be honest, I was totally unaware about this 'Task Engine Status' web page, haha. Thanks for the tip. I will check it to see if indeed some tasks are running which prevent my 'Sync Now' tasks to immediately run.

phibid commented 2 years ago

Ok, so indeed. I was always relying before on the logs in /var/log/reposync/my_repo to see the status of the reposync. However this is not reliable anymore as in the Task Engine Status -> Runtime Status GUI menu I can observe that the repo-sync task continues to run a long time after the "Sync completed." appears in the reposync logs.

I suspect the errata generation/refresh. Could be perhaps good to have the errata generation progress to appear in the reposync logs file ? The 'Sync completed' is misleading as the reposync is actually not totally done.

phibid commented 2 years ago

For example, I am syncing repo ubuntu-2004-amd64-universe-uyuni

2022/06/22 11:15:31 +02:00 Command: ['/usr/bin/spacewalk-repo-sync', '--channel', 'ubuntu-2004-amd64-universe-uyuni', '--type', 'deb', '--non-interactive']
2022/06/22 11:15:31 +02:00 Sync of channel started.
2022/06/22 11:16:00 +02:00 Repo URL: http://archive.ubuntu.com/ubuntu/dists/focal/universe/binary-amd64/
2022/06/22 11:16:00 +02:00     Packages in repo:             53206
2022/06/22 11:17:42 +02:00     No new packages to sync.
2022/06/22 11:17:42 +02:00
2022/06/22 11:17:42 +02:00   Patches in repo: 0.
2022/06/22 11:17:42 +02:00 Sync completed.

Seems that in began at 11h15 and ended at 11h17, however in the 'Runtime status', it is still running:

image

mbussolotto commented 2 years ago

Hi @phibid does it happens just for an ubuntu channel or also for others?

phibid commented 2 years ago

I can see the same behavior going-on right now for Ubuntu 18.04:

2022/06/27 10:12:02 +02:00 Command: ['/usr/bin/spacewalk-repo-sync', '--channel', 'ubuntu-1804-amd64-universe-uyuni', '--type', 'deb', '--non-interactive']
2022/06/27 10:12:02 +02:00 Sync of channel started.
2022/06/27 10:12:07 +02:00 Repo URL: http://archive.ubuntu.com/ubuntu/dists/bionic/universe/binary-amd64/
2022/06/27 10:12:07 +02:00     Packages in repo:             53596
2022/06/27 10:13:53 +02:00     No new packages to sync.
2022/06/27 10:13:53 +02:00
2022/06/27 10:13:53 +02:00   Patches in repo: 0.
2022/06/27 10:13:53 +02:00 Sync completed.

image

Once done, will check for the other ones, especially for Rocky and Alma.

phibid commented 2 years ago

Still running:

image

Meanwhile I can see that the taskomatic Java process is consuming a lot:

Tasks: 373 total,   1 running, 372 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.5 us,  0.7 sy,  0.0 ni, 86.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 15975.49+total,  697.965 free, 9806.965 used, 5470.566 buff/cache
MiB Swap: 3071.996 total, 1874.070 free, 1197.926 used. 1715.387 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  3836 root      20   0 11.594g 6.368g  10108 S 100.0 40.82   3468:07 java
 82386 postgres  20   0   56196   4696   3780 R 5.882 0.029   0:00.07 top
     1 root      20   0   91116   7872   5068 S 0.000 0.048   1:06.89 systemd

And I can see those selects in the psql:

 76988 | 00:00:00.529678        | admin    | select errata0_.id as id1_98_, errata0_.advisory as advisory2_98_, errata0_.advisory_type as advisory3_98_, errata0_.advisory_status as advisory4_98_, errata0_.product as product5_98_, errata0_.description as descript6_98_, errata0_.synopsis as synopsis7_98_, errata0_.topic as topic8_98_, errata0_.solution as solution9_98_, errata0_.errata_from as errata_10_98_, errata0_.issue_date as issue_d11_98_, errata0_.update_date as update_12_98_, errata0_.notes as notes13_98_, errata0_.rights as rights14_98_, errata0_.refers_to as refers_15_98_, errata0_.advisory_name as advisor16_98_, errata0_.advisory_rel as advisor17_98_, errata0_.locally_modified as locally18_98_, errata0_.last_modified as last_mo19_98_, errata0_.created as created20_98_, errata0_.modified as modifie21_98_, errata0_.org_id as org_id22_98_, errata0_.severity_id as severit23_98_, errata0_1_.original_id as original2_100_, case when errata0_1_.id is not null then 1 when errata0_.id is not null then 0 end as clazz_ from rhnErrata errata0_ left outer j

Not sure it has a relation to be honest, but seems to be.

phibid commented 2 years ago

Finished just now:

image

The select is not there anymore and CPU consumption just dropped.

mbussolotto commented 2 years ago

@phibid so I guess now the Sync Now button is working fine, right?

phibid commented 2 years ago

Indeed, the Sync Now button is working fine. I think that this issue's title should be modified as the issue is coming from unexpectedly long time execution of the repo-sync tasks

mbussolotto commented 2 years ago

CC @lucidd

lucidd commented 2 years ago

syncing ubuntu errata can take some time but it should not take this long. My guess would be that the db access may be slower then usual here. You can enable debug logging for com.redhat.rhn.manager.content.ubuntu.UbuntuErrataManager to get some information about what takes up the time.

mbussolotto commented 2 years ago

@ktsamis is it the same issue happens on your system? you can check this https://github.com/uyuni-project/uyuni/issues/5579#issuecomment-1167111008

phibid commented 2 years ago

@lucidd Where exactly do I need to add this debug ?

mbussolotto commented 2 years ago

add log4j.logger.com.redhat.rhn.manager.content.ubuntu.UbuntuErrataManager=DEBUG to the file /usr/share/rhn/classes/log4j.properties , then run rctaskomatic restart .

Long explanation here: https://github.com/uyuni-project/uyuni/wiki/Java-Logging

phibid commented 2 years ago

I have just updated our preprod env to 2022.11, and I can still see the same behaviour.

I have added the log4j debug, and here is what I can observe during the update of the Ubuntu repository:

2022-11-27 11:24:30,229 [DefaultQuartzScheduler_Worker-19] INFO  com.redhat.rhn.taskomatic.task.RepoSyncTask - Syncing repos for channel: Ubuntu 20.04 LTS AMD64 Main for Uyuni
2022-11-27 11:24:45,494 [Thread-87] INFO  com.redhat.rhn.taskomatic.task.RepoSyncTask - 11:24:30 ======================================
11:24:30 | Channel: ubuntu-2004-amd64-main-uyuni
11:24:30 ======================================
11:24:30 Sync of channel started.
11:24:31 Repo URL: http://archive.ubuntu.com/ubuntu/dists/focal/main/binary-amd64/
11:24:31     Packages in repo:              6090
11:24:45     No new packages to sync.
11:24:45
11:24:45   Patches in repo: 0.
11:24:45   Updating overview of 7 systems

2022-11-27 11:24:45,494 [Thread-88] ERROR com.redhat.rhn.taskomatic.task.RepoSyncTask - Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/spacewalk/server/rhnSQL/driver_postgresql.py", line 299, in _execute_wrapper
    retval = function(*p, **kw)
  File "/usr/lib/python3.6/site-packages/spacewalk/server/rhnSQL/sql_base.py", line 182, in _execute
    return self._execute_(args, kwargs)
  File "/usr/lib/python3.6/site-packages/spacewalk/server/rhnSQL/driver_postgresql.py", line 322, in _execute_
    self._real_cursor.execute(self.sql, params)
psycopg2.errors.UndefinedTable: missing FROM-clause entry for table "c"
LINE 4:         coalesce(c.org_id, 1),
                         ^

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/spacewalk-repo-sync", line 286, in <module>
    sys.exit(abs(main() or 0))
  File "/usr/bin/spacewalk-repo-sync", line 269, in main
    elapsed_time, channel_ret_code = sync.sync()
  File "/usr/lib/python3.6/site-packages/spacewalk/satellite_tools/reposync.py", line 705, in sync
    self.update_servers()
  File "/usr/lib/python3.6/site-packages/spacewalk/satellite_tools/reposync.py", line 739, in update_servers
    taskomatic.add_to_system_overview_update_queue(row["id"])
  File "/usr/lib/python3.6/site-packages/spacewalk/server/taskomatic.py", line 103, in add_to_system_overview_update_queue
    h.execute(sid=sid)
  File "/usr/lib/python3.6/site-packages/spacewalk/server/rhnSQL/sql_base.py", line 150, in execute
    return self._execute_wrapper(self._execute, *p, **kw)
  File "/usr/lib/python3.6/site-packages/spacewalk/server/rhnSQL/driver_postgresql.py", line 309, in _execute_wrapper
    raise sql_base.SQLStatementPrepareError(str(self.dbh), e.pgerror, self.sql)
spacewalk.server.rhnSQL.sql_base.SQLStatementPrepareError: ('ERROR:  missing FROM-clause entry for table "c"\nLINE 4:         coalesce(c.org_id, 1),\n                         ^\n', "<connection object at 0x7effc1d44528; dsn: 'user=admin password=xxx dbname=uyuni', closed: 0>", "\n    INSERT INTO rhnTaskQueue (org_id, task_name, task_data, priority, earliest)\n    VALUES (\n        coalesce(c.org_id, 1),\n        'update_system_overview',\n        %(sid)s,\n        0,\n        current_timestamp\n    )\n    ")

2022-11-27 11:24:45,526 [DefaultQuartzScheduler_Worker-19] ERROR com.redhat.rhn.taskomatic.task.RepoSyncTask - Command '[/usr/bin/spacewalk-repo-sync, --channel, ubuntu-2004-amd64-main-uyuni, --type, deb, --non-interactive]' exited with error code 1: Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/spacewalk/server/rhnSQL/driver_postgresql.py", line 299, in _execute_wrapper
    retval = function(*p, **kw)
  File "/usr/lib/python3.6/site-packages/spacewalk/server/rhnSQL/sql_base.py", line 182, in _execute
    return self._execute_(args, kwargs)
  File "/usr/lib/python3.6/site-packages/spacewalk/server/rhnSQL/driver_postgresql.py", line 322, in _execute_
    self._real_cursor.execute(self.sql, params)
psycopg2.errors.UndefinedTable: missing FROM-clause entry for table "c"
LINE 4:         coalesce(c.org_id, 1),
                         ^

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/spacewalk-repo-sync", line 286, in <module>
    sys.exit(abs(main() or 0))
  File "/usr/bin/spacewalk-repo-sync", line 269, in main
    elapsed_time, channel_ret_code = sync.sync()
  File "/usr/lib/python3.6/site-packages/spacewalk/satellite_tools/reposync.py", line 705, in sync
    self.update_servers()
  File "/usr/lib/python3.6/site-packages/spacewalk/satellite_tools/reposync.py", line 739, in update_servers
    taskomatic.add_to_system_overview_update_queue(row["id"])
  File "/usr/lib/python3.6/site-packages/spacewalk/server/taskomatic.py", line 103, in add_to_system_overview_update_queue
    h.execute(sid=sid)
  File "/usr/lib/python3.6/site-packages/spacewalk/server/rhnSQL/sql_base.py", line 150, in execute
    return self._execute_wrapper(self._execute, *p, **kw)
  File "/usr/lib/python3.6/site-packages/spacewalk/server/rhnSQL/driver_postgresql.py", line 309, in _execute_wrapper
    raise sql_base.SQLStatementPrepareError(str(self.dbh), e.pgerror, self.sql)
spacewalk.server.rhnSQL.sql_base.SQLStatementPrepareError: ('ERROR:  missing FROM-clause entry for table "c"\nLINE 4:         coalesce(c.org_id, 1),\n                         ^\n', "<connection object at 0x7effc1d44528; dsn: 'user=admin password=xxx dbname=uyuni', closed: 0>", "\n    INSERT INTO rhnTaskQueue (org_id, task_name, task_data, priority, earliest)\n    VALUES (\n        coalesce(c.org_id, 1),\n        'update_system_overview',\n        %(sid)s,\n        0,\n        current_timestamp\n    )\n    ")
phibid commented 2 years ago

Sorry, please disregard my last message, this was related to https://github.com/uyuni-project/uyuni/issues/6193, I have applied the patch for 2022.11 and no more message in the logs.

j-kihet commented 12 months ago

I'm having the same issue on 2023.09. Should I make a new issue?