voxel51 / fiftyone

The open-source tool for building high-quality datasets and computer vision models
https://fiftyone.ai
Apache License 2.0

Open-images-v6 multiprocessing issue on Windows 10 if __name__ == '__main__': Python 3.8.10 & 3.9.5 & 3.7.8 #1119

Open Hunterk720 opened 3 years ago

Hunterk720 commented 3 years ago

Attempting to download anything from the Open Images V6 dataset repeatedly raises an exception from this check in multiprocessing's spawn.py:

def _check_not_importing_main():
    if getattr(process.current_process(), '_inheriting', False):
        raise RuntimeError('''
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.''')

The browser is Microsoft Edge, but execution never gets that far. Other datasets load fine; only Open Images V6 fails.

Here is my full code:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "open-images-v6",
    "validation",
    label_types=["detections", "classifications"],
    classes = ["Dog", "Cat"],
    max_samples=20,
    seed=51,
    shuffle=True,
    dataset_name="open-images-dog-cat",
)

session = fo.launch_app(dataset)

Windows 10, fiftyone v0.11.1 installed from pip

Commands to reproduce

Install a fresh Python, create a new venv, and install a fresh fiftyone from pip.

Other info / logs

RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Downloading 20 images
   0% ||------------------------------------------------------------| 0/20 [1.1ms elapsed, ? remaining, ? samples/s]
Uncaught exception
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\src\computeEngine\training\python\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\src\computeEngine\training\python\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\src\computeEngine\training\python\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\src\computeEngine\training\python\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\src\computeEngine\training\python\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\src\computeEngine\training\python\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\src\computeEngine\training\python\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\src\computeEngine\training\fo_test.py", line 12, in <module>
    dataset_name="open-images-dog-cat",
  File "C:\src\computeEngine\training\env\lib\site-packages\fiftyone\zoo\datasets\__init__.py", line 201, in load_zoo_dataset
    **download_kwargs,
  File "C:\src\computeEngine\training\env\lib\site-packages\fiftyone\zoo\datasets\__init__.py", line 121, in download_zoo_dataset
    cleanup=cleanup,
  File "C:\src\computeEngine\training\env\lib\site-packages\fiftyone\zoo\datasets\__init__.py", line 1002, in download_and_prepare
    ) = self._download_and_prepare(split_dir, scratch_dir, split)
  File "C:\src\computeEngine\training\env\lib\site-packages\fiftyone\zoo\datasets\base.py", line 1313, in _download_and_prepare
    max_samples=self.max_samples,
  File "C:\src\computeEngine\training\env\lib\site-packages\fiftyone\utils\openimages.py", line 598, in download_open_images_split
    download=True,
  File "C:\src\computeEngine\training\env\lib\site-packages\fiftyone\utils\openimages.py", line 669, in _download_open_images_split
    download=download,
  File "C:\src\computeEngine\training\env\lib\site-packages\fiftyone\utils\openimages.py", line 1367, in _download
    download=download,
  File "C:\src\computeEngine\training\env\lib\site-packages\fiftyone\utils\openimages.py", line 1667, in _download_images_if_necessary
    with multiprocessing.Pool(num_workers, _initialize_worker) as pool:
  File "C:\src\computeEngine\training\python\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "C:\src\computeEngine\training\python\lib\multiprocessing\pool.py", line 176, in __init__
    self._repopulate_pool()
  File "C:\src\computeEngine\training\python\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
    w.start()
  File "C:\src\computeEngine\training\python\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\src\computeEngine\training\python\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\src\computeEngine\training\python\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\src\computeEngine\training\python\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\src\computeEngine\training\python\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
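
The traceback shows a spawned worker re-running fo_test.py through runpy.run_path with run_name set to "__mp_main__", which is how the unguarded load_zoo_dataset call ends up creating a second Pool inside the child. The sketch below only simulates that re-import step with runpy. It starts no processes, and the script contents and the run_script_as helper are made up for illustration:

```python
import os
import runpy
import tempfile
import textwrap

# A tiny stand-in for fo_test.py: one unguarded statement, one guarded.
SCRIPT = textwrap.dedent(
    """
    events = ["unguarded code ran"]

    if __name__ == "__main__":
        events.append("guarded code ran")
    """
)


def run_script_as(run_name):
    """Execute SCRIPT under the given module name and return its events."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "demo_script.py")
        with open(path, "w") as f:
            f.write(SCRIPT)
        # run_path returns the globals of the executed script
        return runpy.run_path(path, run_name=run_name)["events"]


as_parent = run_script_as("__main__")    # script launched by the user
as_child = run_script_as("__mp_main__")  # script re-run inside a spawned worker

print(as_parent)  # ['unguarded code ran', 'guarded code ran']
print(as_child)   # ['unguarded code ran']
```

Unguarded module-level code runs in both cases, so a module-level call that creates a Pool is repeated in every worker. Code under the guard runs only in the process the user started.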

What areas of FiftyOne does this bug affect?

brimoor commented 3 years ago

Hi @Hunterk720, thanks for reporting this.

We'll take a look at resolving it, but, in the meantime, you should be able to bypass this issue by passing num_workers=1 so that multiprocessing is not used to download images:

import fiftyone as fo
import fiftyone.zoo as foz

# downloads images without using multiprocessing; no issues
dataset = foz.load_zoo_dataset(
    "open-images-v6",
    split="validation",
    label_types=["detections", "classifications"],
    classes = ["Dog", "Cat"],
    max_samples=20,
    seed=51,
    shuffle=True,
    num_workers=1,
    dataset_name="open-images-dog-cat",
)

session = fo.launch_app(dataset)
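
For a script that has to run unchanged on several platforms, the same workaround can be applied only where it is needed. This is a rough sketch, not something FiftyOne provides: pick_num_workers is a hypothetical helper, and it assumes that passing num_workers=None lets FiftyOne fall back to its default worker count.

```python
import multiprocessing


def pick_num_workers(start_method=None):
    """Return 1 unless worker processes are created with "fork".

    Hypothetical helper, not part of FiftyOne. None means "use the
    library default" (an assumption about how num_workers is handled).
    """
    if start_method is None:
        start_method = multiprocessing.get_start_method()
    # "spawn" (the only option on Windows) and "forkserver" re-import the
    # main script in each worker, which triggers the error in an unguarded
    # script; "fork" does not.
    return None if start_method == "fork" else 1


# Intended use (requires fiftyone, so left as a comment here):
# dataset = foz.load_zoo_dataset(..., num_workers=pick_num_workers())
print(pick_num_workers("spawn"))  # 1
print(pick_num_workers("fork"))   # None
```

The trade-off is that downloads are sequential on Windows.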

brimoor commented 3 years ago

As the error message you showed suggests, you can probably also get the first syntax to work by doing the following:

import fiftyone as fo
import fiftyone.zoo as foz

# necessary on Windows, since multiprocessing is used by default
# See https://stackoverflow.com/a/18205006 for details
if __name__ == "__main__":

    dataset = foz.load_zoo_dataset(
        "open-images-v6",
        split="validation",
        label_types=["detections", "classifications"],
        classes = ["Dog", "Cat"],
        max_samples=20,
        seed=51,
        shuffle=True,
        dataset_name="open-images-dog-cat",
    )

    session = fo.launch_app(dataset)
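
The same structure can be tried without FiftyOne or a network connection. In this minimal sketch, fetch and download_all are hypothetical stand-ins for the image download. The "spawn" context is requested explicitly so that other platforms behave the way Windows does:

```python
import multiprocessing


def fetch(item_id):
    """Stand-in for downloading one image (hypothetical; no network I/O)."""
    return f"image-{item_id}.jpg"


def download_all(item_ids, num_workers=2):
    # Workers are started with "spawn", so each one re-imports this file
    # before it can run fetch().
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(num_workers) as pool:
        return pool.map(fetch, item_ids)


def main():
    files = download_all([1, 2, 3])
    print(files)


# Only the process started by the user enters this block; the re-import in
# each worker skips it, so no worker tries to create a pool of its own.
if __name__ == "__main__":
    main()
```

Everything that depends on the result, such as launching the app, belongs inside the guard too. Otherwise each worker would run it during the re-import, before `dataset` exists.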