Closed c66cedae-26a6-44fd-a569-d31a302229ef closed 12 months ago
This is related to https://bugs.python.org/issue17797, which is closed.
Using Python 3.7.4, Windows 10.0.18362 and Visual Studio 2017, and running as a C application, Py_Initialize() eventually calls is_valid_fd with STDIN. Both dup() and fstat() then appear to hang indefinitely when using the release MSVCRT DLLs (it works correctly with the debug MSVCRT DLLs). The call stack shows Windows waiting on an event. The patch recommended in bpo-17797 will not work.
is_valid_fd appears to want to read the 'input' using a file descriptor. Since both dup() and fstat() hang, I realized that isatty() would indicate whether the file descriptor is valid, and it works for any of the predefined file descriptors (STDIN-0, STDOUT-1, STDERR-2).
#if defined(MS_WINDOWS)
struct stat buf;
if (fd >= fileno(stdin) && fd <= fileno(stderr)) {
return (_isatty(fd) == 0 && errno == EBADF) ? 0 : 1;
}
else if (fstat(fd, &buf) < 0 && (errno == EBADF || errno == ENOENT))
return 0;
return 1;
#else
Are you able to capture a process dump at the hang? I haven't seen this anywhere else, and don't even know how to go about trying to reproduce it with this information - Py_Initialize is called by every single Python process, so there's something special about your situation that isn't obvious yet :)
Personally, I have the same problem of Py_Initialize() hanging indefinitely.
Here is the context in which it happens: I am developing an application in Java, in which I use the library jep (https://github.com/ninia/jep), which lets me get a Python interpreter from Java, and I am developing and testing it on Windows. My Python version is 3.8.2 and I am on Windows 10, version 1903. When I test this library outside my app in a simple Java project, everything works fine and the interpreter works. But when I try to use it in the app, it hangs indefinitely when I create the interpreter. When I dug into the code of the library, I found that it occurs in the native code of jep, during the call to Py_Initialize(). I posted an issue on the jep GitHub, and they sent me here. Given what dhamilton posted, I bet this is related to stdin and stdout. My Java stdout is normal and writes to the console. I tried to reset or redirect Java's stdin and stdout, but it doesn't change anything.
And when I try to do this on Linux (my application is also on Linux), on Ubuntu 16, everything works fine and it doesn't hang indefinitely. So this only happens on Windows.
About capturing a process dump, all I can get is a message displayed on the Java console when I close the app (because it hangs indefinitely):
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00000000302b9d8f, pid=11960, tid=0x0000000000003f98
#
# JRE version: Java(TM) SE Runtime Environment (8.0_241-b07) (build 1.8.0_241-b07)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.241-b07 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# C 0x00000000302b9d8f
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
# The crash happened outside the Java Virtual Machine in native code.
All of this is just my personal case, and it's probably not the same for dhamilton. I hope it helped you.
I can reproduce this on Windows 10 with Python 3.9. See attached source. At least for us, it is hanging when one thread is doing a read on the file descriptor while a second calls Py_Initialize (or just dup directly).
The Windows kernel call stack shows that the dup call is waiting on a critical section, while the thread reading from stdin is waiting in ReadFile. I can get a full stack trace from WinDbg if it is helpful, but hopefully the attached code is enough for anyone interested to reproduce the problem at will.
If stdin is receiving input, or is closed, then the read call will complete and unblock dup in due course. However if not then it will hang indefinitely.
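For anyone without the attachment, the interaction described above (one thread blocked in a read while another thread duplicates the same descriptor) can be probed with a small Python sketch. This is not the attached source: dup_completes_while_reading() is a hypothetical helper, and it uses a pipe instead of stdin so that it needs no console input. I would expect it to report True on platforms where dup() does not wait for the read; on the affected Windows configurations it may report False instead.

```python
import os
import threading
import time


def dup_completes_while_reading(timeout=1.0):
    """Return True if os.dup() returns while another thread is blocked
    in os.read() on the same descriptor, False if it is still stuck
    after `timeout` seconds."""
    read_fd, write_fd = os.pipe()

    # Thread 1: block in a read, because nothing has been written yet.
    reader = threading.Thread(target=os.read, args=(read_fd, 1), daemon=True)
    reader.start()
    time.sleep(0.1)  # give the reader time to enter the blocking read

    # Thread 2: duplicate the descriptor that is currently being read.
    done = threading.Event()

    def duplicate():
        os.close(os.dup(read_fd))
        done.set()

    worker = threading.Thread(target=duplicate, daemon=True)
    worker.start()
    completed = done.wait(timeout)

    # Unblock the reader so both threads can finish, then clean up.
    os.write(write_fd, b"x")
    reader.join()
    worker.join()
    os.close(read_fd)
    os.close(write_fd)
    return completed


if __name__ == "__main__":
    print("dup() completed during the blocked read:",
          dup_completes_while_reading())
```

The write at the end plays the role of "stdin receiving input": it completes the read, which in turn lets a stuck dup() proceed.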
If we can fix this to work reliably in Python, that would be great. Otherwise, or in the meantime, we could just add a note to the documentation. We are going to try to work around it by using a different file descriptor instead of stdin. Other applications might be able to avoid I/O on stdin until after Python is initialised.
This problem still exists on Python 3.10.9 when I use Py_NewInterpreter to create new sub-interpreters for my embedded Python. It is the same problem as with Py_Initialize, because it finally calls Py_NewInterpreter.
See more at https://stackoverflow.com/questions/71892914/python-c-api-py-newinterpreter-freezes-thread-when-creating-new-sys-stdin, which describes the same problem.
It only occurs on the Windows platform.
- Start a blocking IO of stdin in a thread (for example, scanf("%d", &num))
- Call Py_NewInterpreter in another thread
- The thread calling Py_NewInterpreter will freeze until I input a new line into the console and press enter.
- Then the program continues to work properly (the sub-interpreter is created successfully).
It is obvious that stdin is waiting for some input after calling Py_NewInterpreter. I have tested in VS and the call blocks at dup() when creating the new sys.stdin for the new sub-interpreter (Pylifecycle.c, line 2124).
I got a temporary solution: start a new thread to execute the Win32 API CancelIoEx on stdin before calling Py_NewInterpreter, which looks like this:
std::thread([](){
Sleep(50);
CancelIoEx(GetStdHandle(STD_INPUT_HANDLE), NULL);
}).detach();
But this solution does not seem good, as we don't know exactly when dup() will block after Py_NewInterpreter starts executing. It also needs to call Win32 APIs, which is not suitable for projects like CPython.
So, is there a better way to do something like temporarily banning stdin before dup() and then restoring it when the call has finished?
I think @eryksun just pointed this same problem out in #102765, so we may need to factor it into this code path as well.
Though I wonder if in this case we'd be better off with a way to preemptively set up stdio for Python, rather than trying to infer it? That could bypass this code entirely, and generally help embedders provide their own read/write functions.
As a rule, we shouldn't require use of global resources in CPython when embedding. Right now, we're pretty bad about that, but it's the direction we'd want to move in. Allowing host apps to completely handle std streams would fit.
Thanks for the reminder. Hopefully Python can make more improvements for embedding in the future. I know that a number of useful improvements will be made to sub-interpreters in Python 3.12, including splitting the GIL and so on. I'm sure these efforts will work.
I'm trying to see if I can avoid blocking by modifying stdio beforehand. If there is any success, I'll post it here.
The implementation of is_valid_fd() was made faster and generally safer in Python 3.11:
GetFileType() calls NtQueryVolumeInformationFile() to get the file's FileFsDeviceInformation. If the file object is a direct, local open, then the I/O manager implements this query without having to call the filesystem driver and synchronize on the file object. On the other hand, it does have to call the driver and synchronize on the file if it's opened on a redirected filesystem (i.e. FILE_DEVICE_NETWORK_FILE_SYSTEM). That's only a concern in general for a remote pipe or mailslot. The chances of using a remote pipe for standard I/O are slim to none.
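To make the idea concrete, here is a minimal Python sketch of a handle-based validity check of the kind described above. It is not CPython's C implementation: the function name mirrors is_valid_fd() only for readability, the Windows branch reaches GetFileType() through ctypes and msvcrt.get_osfhandle(), and the os.fstat() fallback for other platforms is my own assumption.

```python
import os
import sys

FILE_TYPE_UNKNOWN = 0x0000  # value documented for GetFileType()


def is_valid_fd(fd):
    """Return True if fd refers to an open file, without calling dup()."""
    if sys.platform == "win32":
        import ctypes
        import msvcrt

        try:
            # Map the C runtime descriptor to its Win32 handle.
            handle = msvcrt.get_osfhandle(fd)
        except OSError:
            return False
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetFileType.argtypes = [ctypes.c_void_p]  # HANDLE
        kernel32.GetFileType.restype = ctypes.c_uint32     # DWORD
        return kernel32.GetFileType(handle) != FILE_TYPE_UNKNOWN

    # Assumed fallback for non-Windows platforms.
    try:
        os.fstat(fd)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    for fd in (0, 1, 2):
        print(fd, is_valid_fd(fd))
```

The point of the design is that nothing here takes the per-descriptor lock that dup() waits on, so a blocked read in another thread should not stall the check.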
Note that there are still problems with lseek() calls. For example, the following hangs Python at startup until the console read in the parent process is completed (i.e. until enter is pressed).
>>> import os, subprocess, threading
>>> env = os.environ.copy()
>>> env['PYTHONLEGACYWINDOWSSTDIO'] = '1'
>>> th = threading.Thread(target=input)
>>> th.start(); p = subprocess.Popen('python', env=env); th.join(); p.wait()
It's easy to see why using an attached debugger:
0:000> kc 8
Call Site
ntdll!NtQueryInformationFile
KERNELBASE!SetFilePointerEx
ucrtbase!common_lseek_nolock<__int64>
ucrtbase!common_lseek<__int64>
ucrtbase!lseeki64
python311!portable_lseek
python311!_io_FileIO_tell_impl
python311!_io_FileIO_tell
The child inherits its console input file from the parent. Since the parent is doing a synchronized read on this file, the NtQueryInformationFile() call in the child hangs until it can acquire the file lock.
The I/O manager allows setting the file pointer of a synchronous pipe or console file, but it's meaningless. We could protect lseek() calls to either do nothing or fail if the file type isn't FILE_TYPE_DISK.
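The kind of guard suggested above could look roughly like this in Python. It is only a portable sketch: safe_tell() is a hypothetical helper, and stat.S_ISREG() on the os.fstat() result stands in for the FILE_TYPE_DISK check. It also assumes that os.fstat() itself does not block on the descriptor.

```python
import os
import stat


def safe_tell(fd):
    """Return the current offset of fd, or None if fd is not a regular file."""
    mode = os.fstat(fd).st_mode
    if not stat.S_ISREG(mode):
        # Pipes, consoles and other devices have no meaningful file
        # pointer, so skip the lseek() call entirely.
        return None
    return os.lseek(fd, 0, os.SEEK_CUR)


if __name__ == "__main__":
    print("stdin offset:", safe_tell(0))
```

Returning None corresponds to the "do nothing" option; raising an error such as io.UnsupportedOperation would correspond to the "fail" option.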
So it sounds like this hang is no longer an issue on 3.11? Can anyone suffering from the problem confirm?
No complaints in seven months: let's assume it's now fine, and we can re-open again if needed.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['3.7', 'type-bug', 'library', 'OS-windows']
title = 'Py_Initialize Hangs on Windows 10'
updated_at =
user = 'https://bugs.python.org/dhamilton'
```
bugs.python.org fields:
```python
activity =
actor = 'duaneg'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)', 'Windows']
creation =
creator = 'dhamilton'
dependencies = []
files = ['50099']
hgrepos = []
issue_num = 39345
keywords = []
message_count = 4.0
messages = ['360070', '375628', '375639', '395377']
nosy_count = 7.0
nosy_names = ['paul.moore', 'tim.golden', 'duaneg', 'zach.ware', 'steve.dower', 'dhamilton', 'ph.fieschi']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue39345'
versions = ['Python 3.7']
```