lzamparo / embedding

Learning semantic embeddings for TF binding preferences directly from sequence

Strange BrokenPipe / EOFError when using generate_dataset_parallel #1

Closed: lzamparo closed this issue 7 years ago

lzamparo commented 7 years ago

cf. the traceback here

Not sure, but I suspect a race condition occurs in generate_dataset_parallel while it generates the macrobatches during parallel training. Training with generate_dataset_serial works fine (the loss eventually becomes NaN, but that's an unrelated problem).

The source of the problem might be described here: https://github.com/lanjelot/patator/issues/18#issuecomment-135204757

A fix might be here: https://gist.github.com/mangecoeur/9540178

lzamparo commented 7 years ago

Another example of using threads with concurrent.futures: https://gist.github.com/angad/9800379

And here is the module as covered in PyMOTW: https://pymotw.com/3/concurrent.futures/
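For reference, a minimal sketch of what a thread-based generator could look like with concurrent.futures. This is not the code from this repo: `make_macrobatch` and `generate_dataset_threaded` are hypothetical names, and the batch contents are placeholder integers.

```python
from concurrent.futures import ThreadPoolExecutor


def make_macrobatch(index, size=4):
    # Hypothetical stand-in for the real macrobatch generation step:
    # returns `size` consecutive integers starting at index * size.
    start = index * size
    return list(range(start, start + size))


def generate_dataset_threaded(n_batches, max_workers=4):
    # Hypothetical threaded counterpart to a serial generator.
    # Executor.map yields results in the order the inputs were given,
    # regardless of which worker thread finishes first.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(make_macrobatch, range(n_batches)))


batches = generate_dataset_threaded(3)
```

Threads share one process, so nothing has to be pickled or sent through a pipe between workers. Whether that actually avoids the BrokenPipe / EOFError seen here is an assumption, not something verified against this code.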

lzamparo commented 7 years ago

In my own simple test, the threaded version gives a ~1x speedup over the serial version.
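A rough sketch of how such a comparison can be timed. The workload `cpu_bound_task` is a hypothetical pure-Python loop, not the real macrobatch code, so the numbers it prints say nothing about this repo. In standard CPython the GIL keeps threads from running Python bytecode in parallel, which is one possible (unconfirmed here) reason a CPU-bound threaded version would show little gain.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_bound_task(n):
    # Hypothetical pure-Python workload (sum of squares below n).
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_serial(jobs):
    # Process every job one after another in the calling thread.
    return [cpu_bound_task(n) for n in jobs]


def run_threaded(jobs, max_workers=4):
    # Same jobs, dispatched to a thread pool; results keep input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(cpu_bound_task, jobs))


def timed(fn, *args):
    # Return the function's result together with its wall-clock time.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


jobs = [200_000] * 8
serial_result, serial_time = timed(run_serial, jobs)
threaded_result, threaded_time = timed(run_threaded, jobs)
speedup = serial_time / threaded_time
print(f"speedup: {speedup:.2f}x")
```

If the real work is mostly I/O or happens inside libraries that release the GIL, threads may do better than this sketch suggests.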

lzamparo commented 7 years ago

Related to #4 and #5 (queue imbalance). Closing.