tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0
15.5k stars 3.49k forks

*bug* t2t-datagen freezes on multiprocessing generation #769

Open lumimies opened 6 years ago

lumimies commented 6 years ago

Description

I created a problem that supports multiprocess_generate, using ChoppedTextProblem as a template. When I ran the generation, it eventually froze. I used Pyrasite to get stack dumps from the running process. The stacks that weren't waiting in multiprocessing ended like this:

  File "/usr/lib64/python3.6/multiprocessing/popen_fork.py", line 66, in _launch
    self.pid = os.fork()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/c_api_util.py", line 50, in __del__
    c_api.TF_DeleteGraph(self.graph)
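For anyone trying to reproduce this without Pyrasite, similar stack dumps can be obtained with Python's standard `faulthandler` module, provided the handler is installed before the process hangs. The following is only a minimal sketch: the helper names `dump_all_stacks` and `install_dump_signal` are made up for illustration, and it assumes you are able to add a few lines to the script being debugged.

```python
import faulthandler
import signal
import tempfile


def dump_all_stacks():
    """Return the current stack of every thread in this process as text."""
    # faulthandler writes straight to a file descriptor, so a real file
    # (here a temporary one) is needed rather than an in-memory buffer.
    with tempfile.TemporaryFile(mode="w+") as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read()


def install_dump_signal():
    """Dump all stacks to stderr whenever the process receives SIGUSR1.

    Returns False on platforms without SIGUSR1 (for example Windows).
    """
    if hasattr(signal, "SIGUSR1") and hasattr(faulthandler, "register"):
        faulthandler.register(signal.SIGUSR1, all_threads=True)
        return True
    return False
```

With `install_dump_signal()` called at startup, running `kill -USR1 <pid>` against a frozen process should print its stacks to stderr. Unlike Pyrasite, this cannot be attached after the fact.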

I found a TensorFlow issue, tensorflow/tensorflow#8220, which isn't exactly relevant but does say not to use TensorFlow with multiprocessing. The freeze happened close to the end of generation, so I believe it occurs when the first process, or the first few, try to shut down and all try to delete the same (default) TF graph, which t2t-datagen doesn't even use.
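If the hypothesis above is right, one way to test it would be to run each shard in a freshly started interpreter, so that workers do not inherit any TensorFlow objects from the parent process. The sketch below illustrates that idea with the standard `subprocess` module. It is not how t2t-datagen works: `WORKER_CODE` and `run_shards` are hypothetical stand-ins, and the worker body only prints a line instead of generating data.

```python
import subprocess
import sys

# Hypothetical stand-in for one shard of data generation. It deliberately
# does not import TensorFlow.
WORKER_CODE = """
import sys
task_id = int(sys.argv[1])
num_tasks = int(sys.argv[2])
print("shard %d of %d done" % (task_id, num_tasks))
"""


def run_shards(num_tasks):
    """Start one fresh interpreter per shard and collect each one's output."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_CODE, str(task_id), str(num_tasks)],
            stdout=subprocess.PIPE,
            universal_newlines=True,
        )
        for task_id in range(num_tasks)
    ]
    outputs = []
    for proc in procs:
        out, _ = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError("worker exited with code %d" % proc.returncode)
        outputs.append(out.strip())
    return outputs
```

If generation completes reliably this way but still freezes with forked workers, that would point toward state inherited across the fork as the cause.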

TensorFlow and tensor2tensor versions

tensorflow==1.7.0
tensor2tensor==1.5.7

rsepassi commented 6 years ago

The only subclass of ChoppedTextProblem is languagemodel_wiki_xml_v8k_l1k, so I'm trying data generation for that problem to see if it also hangs or errors.