uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Apache License 2.0

RuntimeWarning when using pure Python reader with process workers #585

Closed: filipski closed this issue 2 years ago

filipski commented 4 years ago

I noticed the following warnings while running benchmarks with the pure Python reader using process workers (code snippet available in https://github.com/uber/petastorm/issues/584):

/home/dfilipsk/miniconda3/envs/petastorm/lib/python3.6/runpy.py:125: RuntimeWarning: 'petastorm.workers_pool.exec_in_new_process' found in sys.modules after import of package 'petastorm.workers_pool', but prior to execution of 'petastorm.workers_pool.exec_in_new_process'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))

This warning is repeated once per worker in the pool.
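
For context, the warning text comes from the standard library's `runpy`, which emits it when a module run via `python -m` is already present in `sys.modules` after its parent package has been imported. The sketch below reproduces that situation with a throwaway package and shows one way to silence it through the `PYTHONWARNINGS` environment variable, which child processes inherit. The package `demo_pkg`, the module `worker`, and the helper `run_module_with_dash_m` are hypothetical names used only for illustration; this is not Petastorm's code, and it assumes the worker processes are launched in a way that inherits the parent's environment. It only hides the message and does not address any underlying import problem.

```python
import os
import subprocess
import sys
import tempfile


def run_module_with_dash_m(extra_env=None):
    """Run a submodule with `python -m` and return the child's stderr.

    The throwaway package imports its own submodule from __init__.py,
    which is the condition under which runpy issues the RuntimeWarning.
    """
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = os.path.join(tmp, "demo_pkg")  # hypothetical package name
        os.mkdir(pkg_dir)
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            f.write("from demo_pkg import worker\n")
        with open(os.path.join(pkg_dir, "worker.py"), "w") as f:
            f.write("pass\n")

        env = dict(os.environ)
        env.pop("PYTHONWARNINGS", None)  # start from a clean filter state
        env["PYTHONPATH"] = tmp          # make demo_pkg importable
        env.update(extra_env or {})

        result = subprocess.run(
            [sys.executable, "-m", "demo_pkg.worker"],
            env=env,
            cwd=tmp,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        return result.stderr


if __name__ == "__main__":
    # Without a filter, runpy reports the module found in sys.modules.
    print(run_module_with_dash_m())
    # With the filter, RuntimeWarnings raised from the runpy module are ignored.
    print(run_module_with_dash_m({"PYTHONWARNINGS": "ignore::RuntimeWarning:runpy"}))
```

Setting the same variable in the parent process (for example `os.environ["PYTHONWARNINGS"] = "ignore::RuntimeWarning:runpy"` before the worker pool is created) is narrower than redirecting all of stderr, since other warnings and errors stay visible.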

dmcguire81 commented 3 years ago

@selitvin we see this problem, too - any idea what causes this?

drelyea-blackberry commented 3 years ago

This is my single greatest annoyance with Petastorm. It forces us to run_training 2> errors.txt just to hide this.

amir-abdi commented 2 years ago

It is actually causing our application to crash. Because some modules are not imported when the worker is initialized, modules that we use within the worker's workflow are not available to the subprocess, and the application throws an exception.

How shall we fix that?

selitvin commented 2 years ago

I can take a look at this later this week. It has to do with the way we spawn a process-pool worker and the way unpickling works. Feel free to poke at that piece of the code if you need it sooner.

carlosfrutos commented 2 years ago

Could you have a look at it? It also makes our application crash.

Thank you

selitvin commented 2 years ago

petastorm==0.11.4 with the fix is now available on PyPI.