thombashi / pytablewriter

pytablewriter is a Python library to write a table in various formats: AsciiDoc / CSV / Elasticsearch / HTML / JavaScript / JSON / LaTeX / LDJSON / LTSV / Markdown / MediaWiki / NumPy / Excel / Pandas / Python / reStructuredText / SQLite / TOML / TSV.
https://pytablewriter.rtfd.io/
MIT License
611 stars 43 forks

writer.dumps() fails due to multiprocessing if __main__ is not protected #27

Closed skjerns closed 4 years ago

skjerns commented 4 years ago

I'm using a script to call a table-writer method in another module.

However, because the library uses multiprocessing, this fails: each new process re-imports the unprotected script.

It can be reproduced like this:

from pytablewriter import HtmlTableWriter
# no `if __name__ == "__main__":` guard around the code below
writer = HtmlTableWriter()
writer.value_matrix = [[1, 2, 3, 4]]
string = writer.dumps()

I'm a bit surprised that this package needs multiprocessing. Are there such heavy computations to be made?

Might it be an idea to switch to joblib.Parallel, as in my experience it doesn't suffer from this problem? Or to disable multiprocessing/switch to multithreading (no reimport of main module necessary)?
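To illustrate the multithreading option, here is a minimal standard-library sketch. It is not pytablewriter code: `convert_cell` and `convert_row` are hypothetical stand-ins for whatever per-value conversion the library performs. Threads live inside the calling process, so nothing re-imports the main module.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_cell(value):
    # Hypothetical stand-in for the per-value conversion work.
    return str(value)


def convert_row(row, max_workers=4):
    # Threads share the parent process, so no module is re-imported
    # and no __main__ guard is required by the caller.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input.
        return list(pool.map(convert_cell, row))


converted = convert_row([1, 2, 3, 4])
```

The trade-off is that CPU-bound pure-Python work in threads is limited by the GIL, so this mainly removes the import problem and does not necessarily speed anything up.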

(It's a Windows problem: there is no fork on Windows, so new processes have to re-import the main module.)
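For reference, the guard that the reproduction omits looks roughly like this. It is a minimal standard-library sketch rather than pytablewriter code, and `square` and `main` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def square(value):
    # Hypothetical worker. It is defined at module level so that child
    # processes can find it after re-importing this module.
    return value * value


def main():
    # Creating the pool inside main() keeps it out of import-time code.
    with ProcessPoolExecutor(max_workers=2) as pool:
        return list(pool.map(square, [1, 2, 3, 4]))


if __name__ == "__main__":
    # Only the original process enters this branch. Spawned workers
    # import the module under a different __name__ and skip it.
    print(main())
```

Without the guard, every spawned worker would run the pool-creating code again at import time, which Python reports as a RuntimeError on platforms that spawn instead of fork.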

thombashi commented 4 years ago

@skjerns Thank you for your feedback.

You can switch to a single process by setting the max_workers attribute of the writer to one:

writer.max_workers = 1
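As a rough picture of why a single worker sidesteps the problem, the sketch below shows one common way such a switch can be written. This is an assumption for illustration only, not pytablewriter's actual implementation; `to_text` and `convert_all` are hypothetical names.

```python
from concurrent.futures import ProcessPoolExecutor


def to_text(value):
    # Hypothetical stand-in for the per-value conversion work.
    return str(value)


def convert_all(values, max_workers=1):
    if max_workers == 1:
        # Single-process path: no child processes are started, so the
        # caller's script is never re-imported.
        return [to_text(v) for v in values]
    # Multi-process path: callers on Windows need a __main__ guard.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(to_text, values))


texts = convert_all([1, 2, 3, 4], max_workers=1)
```

With this shape, defaulting `max_workers` to one makes small tables safe out of the box, and larger workloads can opt in to more processes.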

are there such heavy computations to be made?

Yes, the larger the data, the more processing time the conversion requires. However, multiprocessing is not always necessary, especially for small data like your example. It might be more desirable to set max_workers to one by default. I will consider this for a future release.