jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License

When running statcast(), getting: concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending. #211

Closed aa452110 closed 3 years ago

aa452110 commented 3 years ago

First, thanks for all you do. Second, I'm a noob so I apologize. I've tried my darndest to solve the issue here.

In short, every function in pybaseball works great for me except statcast(). (statcast_batter() and statcast_pitcher(), for example, both work great.)

I use PyCharm on PC. I teach high school students Python and would really like to add the pybaseball package as part of my curriculum.

But any statcast() call gives me the same error, no matter what I try. See below.

If you have any help, links, or documentation I can read through to troubleshoot this, it would be much appreciated.

Thank you. Brian

in: statcast("2021-04-28")

out:

This is a large query, it may take a moment to complete
  0%|          | 0/1 [00:00<?, ?it/s]
This is a large query, it may take a moment to complete
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\brian\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\brian\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\brian\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\brian\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Users\brian\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\brian\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\brian\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\brian\PycharmProjects\pythonProject\pyBaseball\main.py", line 9, in <module>
    statcast("2021-04-28")
  File "C:\Users\brian\PycharmProjects\pythonProject\pyBaseball\venv\lib\site-packages\pybaseball\statcast.py", line 110, in statcast
    return _handle_request(start_dt_date, end_dt_date, 1, verbose=verbose,
  File "C:\Users\brian\PycharmProjects\pythonProject\pyBaseball\venv\lib\site-packages\pybaseball\statcast.py", line 70, in _handle_request
    futures = {executor.submit(_small_request, subq_start, subq_end, team=team)
  File "C:\Users\brian\PycharmProjects\pythonProject\pyBaseball\venv\lib\site-packages\pybaseball\statcast.py", line 70, in <setcomp>
    futures = {executor.submit(_small_request, subq_start, subq_end, team=team)
  File "C:\Users\brian\AppData\Local\Programs\Python\Python39\lib\concurrent\futures\process.py", line 697, in submit
    self._adjust_process_count()
  File "C:\Users\brian\AppData\Local\Programs\Python\Python39\lib\concurrent\futures\process.py", line 675, in _adjust_process_count
    p.start()
  File "C:\Users\brian\AppData\Local\Programs\Python\Python39\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\brian\AppData\Local\Programs\Python\Python39\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\brian\AppData\Local\Programs\Python\Python39\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\brian\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\brian\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
  0%|          | 0/1 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\brian\PycharmProjects\pythonProject\pyBaseball\main.py", line 9, in <module>
    statcast("2021-04-28")
  File "C:\Users\brian\PycharmProjects\pythonProject\pyBaseball\venv\lib\site-packages\pybaseball\statcast.py", line 110, in statcast
    return _handle_request(start_dt_date, end_dt_date, 1, verbose=verbose,
  File "C:\Users\brian\PycharmProjects\pythonProject\pyBaseball\venv\lib\site-packages\pybaseball\statcast.py", line 73, in _handle_request
    dataframe_list.append(future.result())
  File "C:\Users\brian\AppData\Local\Programs\Python\Python39\lib\concurrent\futures\_base.py", line 438, in result
    return self.__get_result()
  File "C:\Users\brian\AppData\Local\Programs\Python\Python39\lib\concurrent\futures\_base.py", line 390, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

Process finished with exit code 1

schorrm commented 3 years ago

I hate to say this but it works just fine on my machine (I checked Python 3.8.5 on Linux and 3.9.1 on Windows). Can you give some more information on your Python environment? Can you check if the bug persists in ipython or another non-PyCharm environment?

aa452110 commented 3 years ago

Thanks for the reply. I've now tried a Mac, a PC, and Linux (Ubuntu). I'm running Python 3.9 on each and using PyCharm on each as well.

I get the same error on the Mac and PC (I've tried multiple PCs).

It does work, however, on my Ubuntu setup.

I'm not sure what the issue is, obviously. It looks like some sort of time-out issue when it goes to grab the statcast data that, for whatever reason, Ubuntu/Linux does not have a problem with.

aa452110 commented 3 years ago

Well, I solved it on my Windows setups in PyCharm. I have not tried the same thing on my Mac setup, but I'm hopeful that would solve it too. Linux/Ubuntu just didn't need it.

I was just searching through StackOverflow here: https://stackoverflow.com/questions/15900366/all-example-concurrent-futures-code-is-failing-with-brokenprocesspool

I saw a comment there mentioning Windows and using:

if __name__ == '__main__':

So I tried this:

from pybaseball import statcast

# Guard the call so spawned worker processes do not re-run it on import.
if __name__ == '__main__':
    stats = statcast()
    stats.to_csv("yesterdays_statcast_data.csv")

I've got no idea why that worked. But it did. Now when I run any statcast() function it pulls the data down just fine.

So, great day!
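
For anyone landing here later, the likely reason the guard helps: the traceback shows statcast() submitting work to a process pool. On Windows (and on macOS since Python 3.8), multiprocessing starts workers with the spawn method, which re-imports the main script in every child. Without the guard, that re-import reaches the statcast() call again and tries to start more processes, which is the RuntimeError in the traceback. Linux defaulted to fork on the Python versions in this thread, so the script is never re-imported there. Here is a minimal sketch using only the standard library; square and run_pool are made-up names for illustration, not pybaseball functions:

```python
from concurrent.futures import ProcessPoolExecutor


def square(x):
    # Work done in a child process; defined at module level so it can be pickled.
    return x * x


def run_pool(values):
    # Creating the pool and submitting work is what starts the worker processes.
    with ProcessPoolExecutor(max_workers=2) as executor:
        return list(executor.map(square, values))


if __name__ == "__main__":
    # Only the original process runs this block. Spawned children import this
    # file under a different module name, skip the block, and so do not try
    # to start pools of their own.
    print(run_pool([1, 2, 3]))
```

Moving the last line out of the guard should reproduce the same kind of failure on a spawn-based platform.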

tjburch commented 3 years ago

That's... wildly annoying, especially that it's only on Windows. Not sure if there's a sensible way to protect against this at the package level, but maybe adding a warning is something we should consider?
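
One possible shape for such a warning, as a rough sketch only: the function names below are hypothetical and nothing like this is confirmed to exist in pybaseball. It checks the multiprocessing start method and warns when it is one that re-imports the main script (spawn or forkserver), since those are the cases where an unguarded call fails.

```python
import multiprocessing
import warnings


def needs_main_guard(start_method):
    # spawn and forkserver re-import the main module in each child process,
    # so process-starting calls there must sit behind a __main__ guard.
    return start_method in ("spawn", "forkserver")


def warn_if_main_guard_needed():
    # Hypothetical helper: a library could call this before creating a pool.
    method = multiprocessing.get_start_method()
    if needs_main_guard(method):
        warnings.warn(
            f"multiprocessing start method is '{method}'; call this function "
            "from inside an `if __name__ == '__main__':` block.",
            RuntimeWarning,
            stacklevel=2,
        )
    return method
```

A check like this cannot tell whether the caller already used the guard, so it would also warn people who did the right thing.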

schorrm commented 3 years ago

I'm closing this for now, as we can't chase down every issue with Windows demons.