lxqt / qps

Qt process viewer and manager
https://lxqt.github.io
GNU General Public License v2.0

Program doesn't show processes after closing it with the "exit on closing" option disabled #447

Closed glu8716 closed 7 months ago

glu8716 commented 10 months ago
Expected Behavior

I've disabled the "exit on closing" option in the Preferences. When I close the program it minimizes to the tray icons. When I open it again after some minutes or hours I'd expect to see the processes.

Current Behavior

I don't see the processes after the program has been closed for some minutes or hours (not sure about the timing).

Possible Solution
Steps to Reproduce (for bugs)
  1. Disable "exit on closing" in Preferences
  2. Close the program. It will minimize to the tray icons
  3. Open it again after some minutes/hours
  4. The program doesn't show processes
Context
System Information

stefonarch commented 10 months ago

Sounds similar to https://github.com/lxqt/qps/issues/251 Does it happen only when minimized to tray? No dev could reproduce this issue until now.

tsujan commented 10 months ago

Strange! Some step(s) for reproducing it may be missing from the report because I've already tested Qps by keeping it running for hours, and the list has been OK.

glu8716 commented 10 months ago

Yes, it happens only when minimized to tray when the option "exit on closing" is disabled. [screenshot: 2023-08-24_22-53] I have to close the program and re-open it in order to see processes again. And I don't know what could be missing from the steps as this seems to happen after the program has been minimized for some minutes or hours.

Bluey26 commented 10 months ago

Hello.

I used to have that problem too: the program is running in the background, and when I open its window, there are no processes. Some time ago I tested with the source code of qps; I compiled it and it did not show that problem. I did not install it, but I use a keyboard shortcut to open that compiled version instead of the package that comes with my distro.

Somehow, I can no longer reproduce the problem with the Qps version that I have installed from the AUR; it works properly now. I have no idea why, but maybe there is some 'misunderstanding' between system apps and qps that does not happen when using the app built from source.

I have no idea of how this program works, but i just wanted to add my experience with it. It works fine for me, i have not seen the problem anymore.

Just in case: OS: Arch Linux. Qps version: 2.6 (the one that produced the issue for me), from the AUR. It is now 2.7 in the AUR.

tsujan commented 10 months ago

I don't know what could be missing from the steps

@glu8716, your report leaves no doubt that there is an issue, but there's also an unknown/hidden factor because, otherwise, everyone would be able to reproduce it.

Until that factor is found, you could help us by answering the following questions. When the issue happens, will it make any difference if you

  1. Hide the window to the tray and reopen it (without quitting)?
  2. Resize the window?
  3. Pause and resume the app by using the Pause button?
  4. Change the sort column?
  5. Check "Tree" or "Thread" box?
  6. Change "All Processes" combo-box?

Since the process count is 0 in the status-bar of your screenshot, at least we know that this isn't about graphics.

glu8716 commented 10 months ago

I tried everything you said but nothing changes, the bug is still there

tsujan commented 10 months ago

I tried everything you said

Thanks.

the bug is still there

The questions weren't meant to suggest workarounds but to show where the cause might be. Your answer showed that it might not be about refreshing of the view (contrary to what I thought).

Sadly, trying to fix it without being able to reproduce it is like shooting in the dark.

Bluey26 commented 10 months ago

Maybe you can check whether the problem is present in another build. You can try it with the source code, without touching your installed qps, by doing the following: download the code (from the green button) on the qps git page and extract it, which will generate a folder called 'qps-master'. Open a terminal window, go to that directory, then run the following (inside qps-master):

mkdir build && cd build
cmake ..
make

When that finishes, there will be an executable 'qps' binary in a folder called 'src' inside build. You can run it using:

./src/qps

or executing the 'qps' binary.

You can try to use that one a few minutes/hours to check if the problem persists.

tsujan commented 10 months ago

Also, @Bluey26's comment suggests that the problem may be in the distro's binary. Testing by compiling from the source is a good way of knowing that.

@glu8716, could you do it? It doesn't need to be installed (no sudo make install); it can be just run from the compilation folder.

@Bluey26, we commented almost simultaneously ;)

glu8716 commented 10 months ago

I'll try and compile it from source, thanks. I'll report back in a few days as I'll not be home.

glu8716 commented 10 months ago

Can confirm that I'm having the same issue with the compiled from source version. I've also looked into the AUR PKGBUILD and there seems to be nothing strange there. It just compiles from the latest release source.

tsujan commented 10 months ago

@glu8716, Thanks!

Could you also attach your ~/.config/qps/qps.conf?

glu8716 commented 10 months ago

Sure! qps.zip

tsujan commented 10 months ago

It just compiles from the latest release source.

Yes, but something may change in an upgraded dependency so that an app may need recompilation because of it. That's why I asked you to compile it directly.

tsujan commented 10 months ago

Here no problem happened with the attached config file after hours of running Qps in the tray. The Qt style is also ruled out because I tested with Fusion and Kvantum.

The hidden factor doesn't seem to be related to configuration, graphics, Qt style, system idleness, resuming from suspension, or number of CPU cores. I found nothing suspicious in the code either. I've run out of ideas for now.

glu8716 commented 10 months ago

I'll try on another machine to see if the problem is also there

Bluey26 commented 10 months ago

problem

Looks like it happened to me again, i think its random. I have far less crashes than i used to do, but seems like the problem is still around. Is there some way to activate a 'debug' mode? to launch the program from the command line and see if there's some kind of error or debug info that may be helpful.

Version: 2.7.0(compiled from source code), 'using Qt library 5.15.10'

Bluey.

PS: I have 'exit on closing' activated, so I guess the problem is somewhere else. PS2: Could it be related to some kind of 'excessive number of processes' or 'heavy refresh rate'? I have not changed those values; it's just an idea I had.

tsujan commented 10 months ago

I have far less crashes than i used to do

Crashes are very important and shouldn't be ignored. If you encounter a crash, please attach its backtrace (see Arch wiki).

Is there some way to activate a 'debug' mode?

There's no "automatic debug mode". If a dev who is familiar with the code can reproduce it, he could track the issue by adding qDebug() to relevant places of the code (roughly speaking). The main problem is reproducing the issue. I wasn't able to do so, not even randomly.

Without reproducing the issue, the only hope is checking the code line by line, hoping that a suspicious line is found. That's not only extremely time-consuming but also prone to errors — believe it or not, sometimes your brain corrects a mistake in front of your eyes. I tried that a few times but found nothing; might try it again when I find the time...

Bluey26 commented 10 months ago

My bad, crash was the wrong word, i mean the bug happening, sorry @tsujan .

Yes, it will take a lot of time and resources to check every line.

Unfortunately i barely know about programming, so i cannot help in the debugging.

tsujan commented 10 months ago

@glu8716, @Bluey26 Could you please try https://github.com/lxqt/qps/pull/445 (without installation)? Its source is here: https://github.com/fastcat/qps/archive/refs/heads/remove-schedstat.zip

Bluey26 commented 10 months ago

I'll leave it running for a few hours to see if it happens again. I will report back in that case. Thanks.

Bluey26 commented 10 months ago

5 hours later, the bug has not happened, but I am not sure whether that's because of the new build.

tsujan commented 10 months ago

Thanks for the test!

I guess only @glu8716 could tell us — he can reproduce the issue easily.

glu8716 commented 10 months ago

@glu8716, @Bluey26 Could you please try #445 (without installation)? Its source is here: https://github.com/fastcat/qps/archive/refs/heads/remove-schedstat.zip

I tried it and had the bug happening in just 5-10 minutes. It seemed to me even faster than the usual

tsujan commented 10 months ago

Thanks. So, this is also ruled out as the cause. Kernel version doesn't seem to play a role either (→ https://github.com/lxqt/qps/issues/251#issue-689404581).

tsujan commented 10 months ago

@glu8716 Please compile the attached source, run it in a terminal, and when the problem happens, paste the terminal text in a new comment to this page (or attach it as a zip file if it's too long).

qps-debug.zip

glu8716 commented 10 months ago

The output is a bunch of DEBUG: No status, DEBUG: No schedstat and DEBUG: No live process messages that repeat throughout the debug output. I've included just three as an example because there are a lot of them.

./qps
DEBUG: version = 60412
Qps(0x564410c37240, name="qps_main_window")
DEBUG: No status
DEBUG: No schedstat
DEBUG: No live process

tsujan commented 10 months ago

With your test, the possible causes have been narrowed down. So far, so good.

schedstat (DEBUG: No schedstat) is already covered by @fastcat's patch, https://github.com/lxqt/qps/pull/445. Now we need to know how it is possible that, suddenly, /proc/PID/status can't be read in your case (DEBUG: No status).

tsujan commented 10 months ago

@glu8716 I found a needle in the haystack and fixed it in https://github.com/lxqt/qps/pull/449. It may or may not be related to this. Would you please test it? Its source is here: https://github.com/lxqt/qps/archive/refs/heads/unsigned_int_negative.zip

Bluey26 commented 10 months ago

With 'qps-debug' i get the following messages in the command line:

DEBUG: No cmdline
DEBUG: No status

However, the process list is working for me.

I will test the last file too.

tsujan commented 10 months ago

However, the process list is working for me.

Yes, they don't result in an empty list by themselves.

glu8716 commented 10 months ago

Good news so far. I've had the program running for more than 5 hours now, and I still haven't had the bug. However I'll test it more tomorrow before jumping to conclusions. Thanks @tsujan !

tsujan commented 10 months ago

I've had the program running for more than 5 hours now, and I still haven't had the bug.

That's good news, indeed.

However I'll test it more tomorrow

Yes, please do so. You're the only user that can reproduce the issue easily.

glu8716 commented 10 months ago

Yeah, unfortunately it happened again, but this time I got no debug output in the terminal. [screenshot: 2023-09-01_12-03]

tsujan commented 10 months ago

https://github.com/lxqt/qps/pull/449 was somehow related to this report (now merged), but I didn't find a causal relationship. So, not a surprise to me, although it may have had an effect.

Really sorry for bothering you with compilations! The attached source may shed more light on the situation — hopefully it'll be the last one. Please run it in a terminal and copy-paste the last lines (the 3 or 4 last lines will be enough) if you encounter an empty list.

qps-debug1.zip

glu8716 commented 10 months ago

No problem! I'm glad to help. This time I had the bug in just some minutes. This is the output:

./qps
DEBUG: version = 60412
Qps(0x564c1a9a1800, name="qps_main_window")
DEBUG: No live process. Refresh number is 1114
DEBUG: No live process. Refresh number is 1115
DEBUG: No live process. Refresh number is 1116
DEBUG: No live process. Refresh number is 1117
DEBUG: No live process. Refresh number is 1118
DEBUG: No live process. Refresh number is 1119
DEBUG: No live process. Refresh number is 1120
DEBUG: No live process. Refresh number is 1121
DEBUG: No live process. Refresh number is 1122
DEBUG: No live process. Refresh number is 1123
DEBUG: No live process. Refresh number is 1124
DEBUG: No live process. Refresh number is 1125
DEBUG: No live process. Refresh number is 1126
DEBUG: No live process. Refresh number is 1127
DEBUG: No live process. Refresh number is 1128
DEBUG: No live process. Refresh number is 1129
DEBUG: No live process. Refresh number is 1130
...

The refresh number keeps incrementing by 1 and goes up to 2831, which is the moment I found out the bug occurred.

tsujan commented 10 months ago

OK, the refresh number is fine.

  1. Do you do something special when it happens?
  2. Do you use Docker?
  3. Is your system different from others in any way? A special kernel or setting? (Skip this question if the answer isn't obvious to you.)

tsujan commented 10 months ago

@Bluey26 Do you also have a Zen kernel? (Please take a look at the above questions.)

Bluey26 commented 10 months ago

Hi again. I have been testing the 'unsigned' package, and the bug does not occur for me, so I think it's no longer happening to me.

About your questions, @tsujan :

tsujan commented 10 months ago

maybe i have some program that changes...

You could install any program, whether from the repositories or by self-compiling. I was searching for something that could change the behavior of the /proc directory and its subdirectories; hence asking about kernel.

Until now, @glu8716's debug output may show that, suddenly, /proc/<PID>/status can't be read, and so, the list is cleared. I didn't find any problem in the reading function; actually, I wasn't able to find a problem in any related function either.

glu8716 commented 10 months ago

I have docker, yes. But while doing these tests there weren't any containers running, only the docker service. And I can't think of any particular operation I was doing. I have Thunderbird, Nicotine+, qBittorrent, KeePassXC, Anki, Audacity, Telegram, Flameshot, Firefox, PCManFM-qt and Tauon Music Box running. I also use most of the default LXQt stuff. I use s6 as the system's init, but this shouldn't be a problem as @Bluey26 uses Systemd (I guess, since he's on Arch). And I'm running linux-zen. I have two scripts on the panel that calculate the RAM usage and the CPU temperature in real time, don't know if these could conflict or not, but I'm adding them to the list just in case. I can't really think of anything else...

tsujan commented 10 months ago

I have docker

As for Docker, my reason for asking was https://github.com/lxqt/qps/issues/444; otherwise, I know nothing about it.

Are you able to open the files /proc/<PID>/status with a text editor (like FeatherPad) when the problem starts? You could use pcmanfm-qt, go inside /proc, and click /proc/<PID>/status for a few values of <PID>, e.g., by starting from the highest one. If yes, do they contain texts similar to this?

Name:   QtWebEngineProc
Umask:  0022
State:  S (sleeping)
Tgid:   100056
Ngid:   0
Pid:    100056
PPid:   99457
TracerPid:  0
...
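
The manual check above (open a few status files and compare them with the sample) can also be scripted. This is a minimal sketch and not part of Qps: `parse_status` and `looks_valid` are hypothetical helper names, and the only assumption about the file format is the `Key: value` layout visible in the sample.

```python
def parse_status(text):
    """Split the 'Key: value' lines of a /proc/<PID>/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def looks_valid(text, pid):
    """True if the text has a State field and a Pid field equal to pid."""
    fields = parse_status(text)
    return "State" in fields and fields.get("Pid") == str(pid)
```

For a real process the text would come from reading `/proc/<PID>/status`; a file that opens but fails `looks_valid` would point to a different problem than a file that cannot be opened at all.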

glu8716 commented 10 months ago

Yes, they do.

/proc/30959/status

Name:   nvidia
Umask:  0022
State:  S (sleeping)
Tgid:   30959
Ngid:   0
Pid:    30959
PPid:   2
TracerPid:  0
Uid:    0   0   0   0
Gid:    0   0   0   0
FDSize: 64
Groups:  
NStgid: 30959
NSpid:  30959
NSpgid: 0
NSsid:  0
Kthread:    1
Threads:    1
SigQ:   2/63722
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: ffffffffffffffff
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp:    0
Seccomp_filters:    0
Speculation_Store_Bypass:   vulnerable
SpeculationIndirectBranch:  always enabled
Cpus_allowed:   f
Cpus_allowed_list:  0-3
Mems_allowed:   00000001
Mems_allowed_list:  0
voluntary_ctxt_switches:    2
nonvoluntary_ctxt_switches: 0

/proc/30958/status

Name:   irq/33-mei_me
Umask:  0022
State:  S (sleeping)
Tgid:   30958
Ngid:   0
Pid:    30958
PPid:   2
TracerPid:  0
Uid:    0   0   0   0
Gid:    0   0   0   0
FDSize: 64
Groups:  
NStgid: 30958
NSpid:  30958
NSpgid: 0
NSsid:  0
Kthread:    1
Threads:    1
SigQ:   2/63722
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: ffffffffffffffff
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp:    0
Seccomp_filters:    0
Speculation_Store_Bypass:   vulnerable
SpeculationIndirectBranch:  always enabled
Cpus_allowed:   2
Cpus_allowed_list:  1
Mems_allowed:   00000001
Mems_allowed_list:  0
voluntary_ctxt_switches:    27
nonvoluntary_ctxt_switches: 1

/proc/53/status

Umask:  0022
State:  I (idle)
Tgid:   53
Ngid:   0
Pid:    53
PPid:   2
TracerPid:  0
Uid:    0   0   0   0
Gid:    0   0   0   0
FDSize: 64
Groups:  
NStgid: 53
NSpid:  53
NSpgid: 0
NSsid:  0
Kthread:    1
Threads:    1
SigQ:   2/63722
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: ffffffffffffffff
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp:    0
Seccomp_filters:    0
Speculation_Store_Bypass:   vulnerable
SpeculationIndirectBranch:  always enabled
Cpus_allowed:   f
Cpus_allowed_list:  0-3
Mems_allowed:   00000001
Mems_allowed_list:  0
voluntary_ctxt_switches:    2
nonvoluntary_ctxt_switches: 0

tsujan commented 10 months ago

Very good, and at the same time, quite baffling, because of DEBUG: No status!

tsujan commented 10 months ago

For now, I quit because, after spending a lot of time on this, I could neither reproduce it nor find anything wrong in the code — yes, the coding style of the original author may be strange or even untidy, but I found no mistake in the functions related to the current issue.

Here is some info for devs who may want to investigate it. It's partly deduced from the kind cooperation of the reporters.

Qps considers a process to be nonexistent and ignores it (near the end of Proc::refresh()) when either its PID dir doesn't exist in /proc anymore, or one of the following files cannot be read inside /proc/<PID>, where <PID> is the PID of the process in question: status, cmdline, stat, statm. The provided info implies that, suddenly and randomly, this happens for all processes. In particular, there may be a problem in reading /proc/<PID>/status, although the last comment by @glu8716 seems to contradict this hypothesis.
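
For anyone who wants to watch for this condition outside Qps, the rule described above can be approximated in a few lines of Python. This is only a sketch of the rule as stated in this comment, not a port of Proc::refresh(); the function names are hypothetical, and `proc_root` is a parameter only so that the check can be pointed at a test directory.

```python
import os

# The four per-process files named in the comment above.
REQUIRED_FILES = ("status", "cmdline", "stat", "statm")


def unreadable_files(pid_dir):
    """Return the required files in pid_dir that cannot be opened and read."""
    failed = []
    for name in REQUIRED_FILES:
        try:
            with open(os.path.join(pid_dir, name), "rb") as handle:
                handle.read()
        except OSError:
            # Covers a vanished PID directory, a missing file, or denied access.
            failed.append(name)
    return failed


def scan(proc_root="/proc"):
    """Map every numeric (PID) directory to its list of unreadable files."""
    return {
        int(entry): unreadable_files(os.path.join(proc_root, entry))
        for entry in os.listdir(proc_root)
        if entry.isdigit()
    }


def live_pids(report):
    """PIDs that this simplified rule would keep in the list."""
    return sorted(pid for pid, failed in report.items() if not failed)
```

Running `scan()` periodically from a separate terminal while the list is empty would show whether the same files are unreadable from another process, or only from inside Qps.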

tsujan commented 8 months ago

@glu8716, @Bluey26 What are the outputs of ulimit -Sn and ulimit -Hn for you?
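
The same two values can be read from inside a process. A minimal sketch using Python's standard `resource` module (Unix only); the helper names are hypothetical. A process started from a shell normally inherits that shell's limits, so the numbers should match `ulimit -Sn` and `ulimit -Hn` run in the same terminal.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Format a limit the way a person would read it."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)
```

Calling `nofile_limits()` and passing each value through `describe` gives output comparable to the two `ulimit` commands.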

Bluey26 commented 8 months ago

Hello @tsujan

ulimit -Sn outputs: 1024

ulimit -Hn outputs: 524288

tsujan commented 8 months ago

Hello @Bluey26

They're normal.

You don't get the error message "Too many open files", do you? If you do, is there any process on your system that may open files constantly without closing them? See this article, which explains how to find it: https://www.howtogeek.com/805629/too-many-open-files-linux/

The idea came from https://github.com/lxqt/qps/issues/251#issuecomment-1789258863
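
As an alternative to the command in the article, the number of open descriptors per process can be counted directly from /proc/<PID>/fd. This is a sketch with hypothetical function names, not the article's method. Note that tools such as lsof also list entries that are not descriptors (for example memory-mapped files), so their totals can be higher than the soft limit without any error occurring.

```python
import os


def open_fd_counts(proc_root="/proc"):
    """Map each readable PID to the number of entries in its fd directory."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            counts[int(entry)] = len(os.listdir(os.path.join(proc_root, entry, "fd")))
        except OSError:
            # The process exited, or its fd directory belongs to another user.
            continue
    return counts


def top_fd_users(counts, n=5):
    """Return the n (pid, count) pairs with the most open descriptors."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]
```

Without root privileges only the current user's processes are counted, which is usually enough to spot a process that keeps opening files without closing them.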

Bluey26 commented 8 months ago

I do not recall receiving that error.

But running the 'inspect' command from the URL you provided shows me that, for example, firefox and mpv use more than the soft limit (circa 20 000 for mpv and circa 40 000 for firefox). Should i increase those limits? (they are below my hard limit, and i have not seen the error so far).

In the same line, can a system upgrade surpass or 'require' to shut down the 'file using' of other processes, like qps, and then this cause for example the blank list? I imagine that upgrading the system uses a lot of files in the process.

tsujan commented 8 months ago

Should i increase those limits?

I wouldn't if I were you. Those numbers seem quite normal to me. And you said you didn't get "Too many open files".

In the same line, can a system upgrade surpass or 'require' to shut down the 'file using' of other processes, like qps, and then this cause for example the blank list?

I don't think so, but I'm not sure.

It seems that https://github.com/lxqt/qps/issues/251#issuecomment-1789258863 is about something different.