google / yapf

A formatter for Python files
Apache License 2.0

[Bug] [Crash][Reproducible] `EOFError: Ran out of input` when import yapf with multiprocess #1204

Open whlook opened 7 months ago

whlook commented 7 months ago

Previous similar issue: #1164

## Problem

*(screenshot: traceback ending in `EOFError: Ran out of input`)*

## Reproduction

1. Install `yapf==0.40.2`
2. `rm -r ~/.cache/YAPF`
3. Run this Python code:

    
```python
import multiprocessing

def proc():
    import yapf

if __name__ == "__main__":
    pp = []
    for i in range(100):
        p = multiprocessing.Process(target=proc)
        pp.append(p)

    for p in pp:
        p.start()

    for p in pp:
        p.join()
```


## Reason
The problematic code is here: https://github.com/google/yapf/blob/c0908d043b45f082e6339cd767c28e6d697cb22e/third_party/yapf_third_party/_ylib2to3/pgen2/driver.py#L247

When running in a multi-process environment, one of the processes will create the grammar cache (when `~/.cache/YAPF` does not exist yet). Before that process finishes writing the cache file, other processes can see that the file exists and try to load it. Because the file is still empty at that point, loading it raises `EOFError: Ran out of input` and the program crashes.
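As a minimal illustration of the failure mode (the file below is just a stand-in, not yapf's actual cache path): unpickling a file that exists on disk but contains no bytes yet raises exactly this error.

```python
import pickle
import tempfile

# Simulate the race: the cache file already exists on disk,
# but the writing process has not put any bytes into it yet.
with tempfile.NamedTemporaryFile(suffix=".pickle", delete=False) as f:
    empty_cache = f.name  # zero-byte file, like a half-created grammar cache

with open(empty_cache, "rb") as f:
    pickle.load(f)  # raises EOFError: Ran out of input
```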

## How to fix it?

- I think we need to add a `try` here (https://github.com/google/yapf/blob/c0908d043b45f082e6339cd767c28e6d697cb22e/third_party/yapf_third_party/_ylib2to3/pgen2/driver.py#L247), just like the `try` we already added when creating the cache (see the sketch after this list).
- Or we can add a `multiprocessing.Lock` here.
- Or, as a workaround, users can run `python -c 'import yapf'` once before starting their multi-process program.
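For illustration, a rough sketch of what the first option could look like (the helper name `_load_cached_grammar` and its exact placement in `driver.py` are hypothetical, not the actual patch): if the pickled cache cannot be read, behave as if it did not exist and let the caller regenerate it.

```python
import logging
import pickle

def _load_cached_grammar(grammar, cache_path):
    """Hypothetical helper: return True if the pickled cache was usable.

    `grammar` is assumed to expose the pgen2 Grammar ``load()`` interface;
    regenerating and re-dumping the grammar stays with the caller.
    """
    try:
        grammar.load(cache_path)
        return True
    except (EOFError, OSError, pickle.UnpicklingError, ValueError):
        # The cache exists but is empty or only partially written by a
        # concurrent process: treat it as missing instead of crashing.
        logging.getLogger(__name__).info(
            'Ignoring unreadable grammar cache %s', cache_path)
        return False
```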
hartwork commented 2 days ago

@whlook that's a great report, in particular the reproducer is very handy — thank you!

I have a pull request coming up for this (it should be auto-linked below in a minute or two), and I would like to share my extended version of your reproducer:

```python
import multiprocessing

def cleanup():
    import os
    import yapf  # importing yapf first ensures the grammar caches exist
    from yapf_third_party._ylib2to3.pgen2.driver import _generate_pickle_name

    # Remove the pickled grammar caches so every round starts cold.
    for name in ("Grammar.txt", "PatternGrammar.txt"):
        filename = _generate_pickle_name(name)
        print(f"  Removing {filename}...")
        os.remove(filename)

def proc():
    import yapf

if __name__ == "__main__":
    max_parallelity = 30
    for parallelity in range(2, max_parallelity + 1):
        print("Cleaning up...")
        cleanup_p = multiprocessing.Process(target=cleanup)
        cleanup_p.start()
        cleanup_p.join()

        print(f"Testing for the race condition with {parallelity} processes...")
        pp = []
        for i in range(parallelity):
            p = multiprocessing.Process(target=proc)
            pp.append(p)

        for p in pp:
            p.start()

        for p in pp:
            p.join()

        if any(p.exitcode != 0 for p in pp):
            print(f"Done, race condition proven above, took {parallelity} processes.")
            break
    else:
        print("Done, no luck crashing this time.")
```