kiranvizru / psutil

Automatically exported from code.google.com/p/psutil

get_children() sometimes contains non-children of a process #314

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
(The problem does not reproduce reliably on every run.)
1. Create lots of processes (hundreds, maybe a thousand)
2. Run psutil.Process(pid).get_children() for all of them.

What is the expected output?
The actual children should be returned.

What do you see instead?
Some of the processes end up having "children" that are not really their 
children.

What version of psutil are you using? What Python version?
psutil-0.5, psutil-0.4.0, python-2.6.1

On what operating system? Is it 32bit or 64bit version?
64-bit Windows 2008

Please provide any additional information below.

This happens because, when scanning the process table, get_children() only checks
for a match on p.ppid == self.pid.
PIDs can be reused, so a process may still carry the PPID of a parent that has
already exited.

Instead, I think you should also check that the parent's creation time is less 
than or equal to the child's creation time.
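A minimal sketch of the proposed filter follows. It assumes a snapshot of the process table is already available as plain records. `ProcInfo` and `children_of` are hypothetical names, not psutil's API, and the PIDs and times are invented for illustration:

```python
from typing import Iterable, List, NamedTuple


class ProcInfo(NamedTuple):
    # Hypothetical row of a process-table snapshot; psutil 0.x exposed
    # similarly named pid, ppid and create_time attributes on Process.
    pid: int
    ppid: int
    create_time: float  # seconds since the epoch


def children_of(parent: ProcInfo, table: Iterable[ProcInfo]) -> List[ProcInfo]:
    """Return the direct children of ``parent`` listed in ``table``."""
    return [
        p
        for p in table
        # The PPID match alone is what the report says get_children() checks.
        if p.ppid == parent.pid
        # Proposed extra condition: the parent must not be younger than the child.
        and parent.create_time <= p.create_time
    ]


# Hypothetical snapshot: PID 200 currently belongs to a process started at t=500.
parent = ProcInfo(pid=200, ppid=1, create_time=500.0)
table = [
    ProcInfo(pid=310, ppid=200, create_time=50.0),   # orphan of an earlier PID 200
    ProcInfo(pid=320, ppid=200, create_time=510.0),  # genuine child
    ProcInfo(pid=330, ppid=999, create_time=520.0),  # unrelated process
]
print([p.pid for p in children_of(parent, table)])  # [320]
```

The comparison uses `<=` rather than `<` so that a child whose recorded creation time equals its parent's (e.g. due to clock resolution) is still accepted.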

Original issue reported on code.google.com by stanchev.emil@gmail.com on 6 Aug 2012 at 9:49

GoogleCodeExporter commented 9 years ago
Adding patch that also covers the recursive version.
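The same creation-time test can be applied at every level of a recursive walk. The sketch below is not the attached patch, whose contents are not reproduced here. `ProcInfo` and `descendants` are hypothetical names, and the sketch assumes the whole process table has already been collected into a list. It indexes the table by PPID once and then walks the tree breadth-first:

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, NamedTuple


class ProcInfo(NamedTuple):
    # Hypothetical process-table row (PID, parent PID, creation time).
    pid: int
    ppid: int
    create_time: float


def descendants(root: ProcInfo, table: Iterable[ProcInfo]) -> List[ProcInfo]:
    """Return all descendants of ``root``, skipping links caused by PID reuse."""
    # Group the table by PPID once, so each level is a dictionary lookup.
    by_ppid: Dict[int, List[ProcInfo]] = defaultdict(list)
    for p in table:
        by_ppid[p.ppid].append(p)

    found: List[ProcInfo] = []
    seen = {root.pid}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in by_ppid.get(parent.pid, []):
            # A "child" older than its parent means its PPID refers to an
            # earlier process that held the same PID: skip it.
            if child.create_time < parent.create_time or child.pid in seen:
                continue
            seen.add(child.pid)
            found.append(child)
            queue.append(child)
    return found


root = ProcInfo(pid=10, ppid=1, create_time=100.0)
table = [
    ProcInfo(pid=11, ppid=10, create_time=110.0),  # child
    ProcInfo(pid=12, ppid=11, create_time=120.0),  # grandchild
    ProcInfo(pid=13, ppid=10, create_time=50.0),   # stale: predates PID 10's owner
    ProcInfo(pid=14, ppid=13, create_time=60.0),   # only reachable through 13
]
print([p.pid for p in descendants(root, table)])  # [11, 12]
```

Rejecting a stale child also prunes everything below it, because those processes are linked to the tree only through the rejected PPID. The `seen` set guards against a process being listed as its own ancestor.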

Original comment by stanchev.emil@gmail.com on 6 Aug 2012 at 10:33

GoogleCodeExporter commented 9 years ago
Hmm, I'm not sure I follow you.
Are you reporting this because it actually happened or is it just theoretical?
Please note that when we iterate through all processes we already make sure 
every process PID has not been reused:
https://code.google.com/p/psutil/source/browse/tags/release-0.5.1/psutil/__init__.py#763
...and this check is automatically inherited by get_children().
Perhaps you could provide some test code?

Original comment by g.rodola on 6 Aug 2012 at 11:57

GoogleCodeExporter commented 9 years ago
It actually happened.

I think the code you pointed out handles the case where the PID is reused 
between calls to process_iter()?

I am talking about something different. Example from my system:

1) explorer.exe has a PPID of 3948. There is no process with PID 3948 running.
2) I start a process X. It happens to reuse PID 3948.
3) X.get_children() returns explorer.exe as a child, which is obviously wrong.
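The distinction between the two checks can be sketched with plain Python records. This sketch assumes that the reuse check performed while iterating processes compares a PID's creation time across scans. That is a simplification, not psutil's actual code. `Snapshot`, `same_process` and `plausible_child` are hypothetical names, and every value except PID 3948 is invented:

```python
from typing import NamedTuple


class Snapshot(NamedTuple):
    # Hypothetical process record. PID 3948 comes from the report;
    # every other PID and all creation times are made up.
    name: str
    pid: int
    ppid: int
    create_time: float


def same_process(old: Snapshot, new: Snapshot) -> bool:
    # Assumed reuse check between two scans: one PID plus one creation
    # time identifies a single process instance.
    return old.pid == new.pid and old.create_time == new.create_time


def plausible_child(parent: Snapshot, child: Snapshot) -> bool:
    # Check proposed in this issue: PPID match plus "parent not younger".
    return child.ppid == parent.pid and parent.create_time <= child.create_time


# 1) explorer.exe's recorded parent, PID 3948, has already exited.
explorer = Snapshot("explorer.exe", 1500, 3948, 100.0)
# 2) Process X starts later and is handed the recycled PID 3948.
x_first_scan = Snapshot("X", 3948, 700, 900.0)
x_second_scan = Snapshot("X", 3948, 700, 900.0)

# X itself was not replaced between scans, so the reuse check passes...
print(same_process(x_first_scan, x_second_scan))  # True
# ...while a PPID-only match still claims explorer.exe as X's child (step 3).
print(explorer.ppid == x_second_scan.pid)  # True
# Comparing creation times rejects the bogus pair.
print(plausible_child(x_second_scan, explorer))  # False
```

A reuse check answers "is this still the same process I saw before?", while the reported bug needs an answer to "could this process have created that one?". Only the second question relates the parent to the child.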

Let me know if you need more information.

Original comment by stanchev.emil@gmail.com on 6 Aug 2012 at 12:10

GoogleCodeExporter commented 9 years ago
Attaching a reproduction script for Windows.
I think this bug does not apply to Linux, since an orphaned process gets adopted 
by init, so its PPID cannot point to a dead process?

Please beware: it creates a lot of 'cmd' processes on the machine. It should 
clean up on Ctrl-C.
Here's the tail of my example run on the Windows 2008 machine using psutil-0.4:

329 (new PID=4200)
psutil.Process(pid=4200, name='cmd.exe') is not really a parent of 
psutil.Process(pid=4252, name='NetTime.exe')
psp.create_time <= c.create_time == False

Obviously NetTime.exe was not started by the script, and the processes 
started by the script do not have any children at all.

As the "False" value shows, adding the create_time check would prevent 
this bug.

Original comment by stanchev.emil@gmail.com on 6 Aug 2012 at 12:44

GoogleCodeExporter commented 9 years ago
Ok, I get it now, and you're right: we should skip all children that appear 
to be older than their parents, since that means the parent's PID has been reused.
This should now be fixed in r1503.
At the moment I don't have a Windows box to test against, though.
Can you run reproduce_children_bug.py before and after r1503 to make sure the 
problem is fixed?

Original comment by g.rodola on 6 Aug 2012 at 12:55

GoogleCodeExporter commented 9 years ago
Verified with 5 runs of the test script: all 5 failed at r1502, and all 5 
finished successfully (creating 1000 processes) at r1503.

Thanks for the quick fix!

Original comment by stanchev.emil@gmail.com on 6 Aug 2012 at 1:55

GoogleCodeExporter commented 9 years ago
Great! Thanks for verifying.

Original comment by g.rodola on 6 Aug 2012 at 7:34

GoogleCodeExporter commented 9 years ago
Fixed in version 0.6.0, released just now.

Original comment by g.rodola on 13 Aug 2012 at 4:25

GoogleCodeExporter commented 9 years ago
Updated csets after the SVN -> Mercurial migration:
r1502 == revision ???
r1503 == revision ???

Original comment by g.rodola on 2 Mar 2013 at 12:12