jfowkes / pycutest

Python interface to CUTEst
https://jfowkes.github.io/pycutest/
GNU General Public License v3.0

Compilation in parallel on macOS does not work #74

Open ragonneau opened 9 months ago

ragonneau commented 9 months ago

Describe the bug: Although everything works smoothly on Ubuntu, compiling problems in parallel fails on macOS.

To Reproduce: I ran the following code:

import logging

import pycutest
from joblib import Parallel, delayed

def get_logger(name):
    logger = logging.getLogger(name)
    if len(logger.handlers) == 0:
        logger.setLevel(logging.INFO)

        # Attach a console handler (thread-safe).
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter('[%(levelname)-8s] %(message)s'))
        logger.addHandler(handler)
    return logger

@delayed
def load_problem(problem_name):
    logger = get_logger(__name__)
    logger.info(f'Loading {problem_name}')
    try:
        pycutest.import_problem(problem_name)
        logger.info(f'{problem_name} successfully loaded')
    except Exception as exc:
        logger.warning(f'{problem_name} failed to load: {exc}')

if __name__ == '__main__':
    problem_names = pycutest.find_problems(constraints='unconstrained', n=[1, 10])
    Parallel(n_jobs=-1)(load_problem(problem_name) for problem_name in problem_names[:5])

This code attempts to load five problems. I assume here that PYCUTEST_CACHE is set to a location where the problems have not already been compiled. I get the following output:

[INFO    ] Loading SISSER
[INFO    ] Loading PALMER5C
[INFO    ] Loading CHWIRUT2LS
[INFO    ] Loading BOXPOWER
[INFO    ] Loading DENSCHNB
Fatal Python error: init_sys_streams: can't initialize sys standard streams
Python runtime state: core initialized
OSError: [Errno 9] Bad file descriptor

Current thread 0x00007ff8541ebb80 (most recent call first):
  <no Python frame>
[WARNING ] SISSER failed to load: Failed to build the Python interface module
Fatal Python error: init_sys_streams: can't initialize sys standard streams
Python runtime state: core initialized
OSError: [Errno 9] Bad file descriptor

Current thread 0x00007ff8541ebb80 (most recent call first):
  <no Python frame>
Fatal Python error: init_sys_streams: can't initialize sys standard streams
Python runtime state: core initialized
OSError: [Errno 9] Bad file descriptor

Current thread 0x00007ff8541ebb80 (most recent call first):
  <no Python frame>
[WARNING ] DENSCHNB failed to load: Failed to build the Python interface module
[WARNING ] CHWIRUT2LS failed to load: Failed to build the Python interface module
Fatal Python error: init_sys_streams: can't initialize sys standard streams
Python runtime state: core initialized
OSError: [Errno 9] Bad file descriptor

Current thread 0x00007ff8541ebb80 (most recent call first):
  <no Python frame>
[WARNING ] PALMER5C failed to load: Failed to build the Python interface module
Fatal Python error: init_sys_streams: can't initialize sys standard streams
Python runtime state: core initialized
OSError: [Errno 9] Bad file descriptor

Current thread 0x00007ff8541ebb80 (most recent call first):
  <no Python frame>
[WARNING ] BOXPOWER failed to load: Failed to build the Python interface module

What is strange to me is that this code works correctly on Ubuntu.
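
To reproduce this from a clean state, the cache can be pointed at an empty directory before pycutest is imported. The snippet below is only a sketch: the directory prefix is a made-up name, and it assumes that PYCUTEST_CACHE should be set before the import so that the new location is the one used.

```python
import os
import tempfile

# Create an empty directory so that no problem has been compiled yet.
# The prefix is a hypothetical name chosen for this example.
cache_dir = tempfile.mkdtemp(prefix="pycutest_cache_")

# Assumption: set the variable before importing pycutest so that the
# new location is the one that gets picked up.
os.environ["PYCUTEST_CACHE"] = cache_dir

# import pycutest  # import only after the variable is set
```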


jfowkes commented 9 months ago

Hmm, this looks suspiciously like an upstream issue to me:

Fatal Python error: init_sys_streams: can't initialize sys standard streams
Python runtime state: core initialized
OSError: [Errno 9] Bad file descriptor

Could you also try parallel compilation on macOS with another Python extension?

EDIT: it is also probably worth trying a newer Python version (3.11/3.12).

ragonneau commented 9 months ago

Hi Jari @jfowkes,

I ran some tests, and the same problem occurs with Python 3.11 and Python 3.12. However, the following code, which uses multiprocessing instead of joblib, seems to work:

import logging
import multiprocessing

import pycutest

def get_logger(name):
    logger = logging.getLogger(name)
    if len(logger.handlers) == 0:
        logger.setLevel(logging.INFO)

        # Attach a console handler (thread-safe).
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter('[%(levelname)-8s] %(message)s'))
        logger.addHandler(handler)
    return logger

def load_problem(problem_name):
    logger = get_logger(__name__)
    logger.info(f'Loading {problem_name}')
    try:
        pycutest.import_problem(problem_name)
        logger.info(f'{problem_name} successfully loaded')
    except Exception as exc:
        logger.warning(f'{problem_name} failed to load: {exc}')

if __name__ == '__main__':
    problem_names = pycutest.find_problems(constraints='unconstrained', n=[1, 10])
    with multiprocessing.Pool() as p:
        p.map(load_problem, problem_names[:5])

I tried to change the name of all the Python files in PyCUTEst (I was particularly worried that system_paths might conflict with some other module), but this did not do the trick. I am not sure what is happening.

Cheers, Tom.

jfowkes commented 9 months ago

Thanks Tom,

Glad you managed to get it working with multiprocessing (my Python parallel framework of choice). The joblib error you're seeing is similar to a recent upstream bug report: https://github.com/joblib/joblib/issues/1529#issuecomment-1857530770 The joblib developers claim it has nothing to do with them, but I find that unlikely. It looks to me like a relatively rare but genuine upstream bug: you and someone else on a different platform both hitting a very similar issue is too much of a coincidence.

LacombeLouis commented 6 months ago

Hey @jfowkes @ragonneau, any news or updates on this topic? We are having a similar issue in our library! Thank you!

jfowkes commented 6 months ago

Hi @LacombeLouis, this is an upstream issue as far as we are concerned. As a workaround I would recommend switching over to multiprocessing.
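
If joblib has to stay for the main workload, one option that might be worth trying is to compile the problems up front, so that the joblib workers only ever load from the cache. This is an untested sketch: warm_cache is a hypothetical helper and not part of PyCUTEst, and it assumes the failure only occurs when a problem actually has to be built.

```python
def warm_cache(problem_names, importer):
    """Call importer(name) once per problem, serially, in this process.

    Hypothetical helper: `importer` is whatever triggers the build,
    e.g. pycutest.import_problem. Returns a dict that maps each problem
    that failed to its error message.
    """
    failed = {}
    for name in problem_names:
        try:
            importer(name)
        except Exception as exc:
            # Record the failure and carry on with the remaining problems.
            failed[name] = str(exc)
    return failed
```

With the script from this issue, it would be called as warm_cache(problem_names[:5], pycutest.import_problem) before the Parallel(...) line.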