This pull request fixes a bug in rephrase_forum_data.py that occurs when multiple Python processes or threads try to write a gzip file with the same name (tmp.json.gz).
Current solution:
Use a combination of tempfile.mkstemp and os.getpid() to safely generate a unique temporary file name.
Also, since tempfile.mkstemp returns both an open file descriptor and a file name, the descriptor is passed to os.fdopen, and the resulting file object is used to create the gzip.GzipFile object that writes the rephrased forum data to disk. Using os.fdopen is both cleaner and more efficient, since the file is already open and does not need to be reopened by name.
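For reviewers, here is a minimal sketch of the pattern described above. It is not the exact code in rephrase_forum_data.py: the function name write_gzipped_json, its parameters, and the "tmp_<pid>_" prefix are assumptions made for illustration only.

```python
import gzip
import json
import os
import tempfile


def write_gzipped_json(records, directory=None):
    """Write records to a uniquely named .json.gz file and return its path.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # mkstemp creates the file and opens it in one step, so two processes
    # or threads never end up with the same path. The PID in the prefix
    # only makes it easier to see which process owns a leftover file.
    fd, path = tempfile.mkstemp(
        prefix="tmp_%d_" % os.getpid(), suffix=".json.gz", dir=directory
    )
    try:
        # Wrap the already-open descriptor instead of reopening by name.
        with os.fdopen(fd, "wb") as raw:
            with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
                gz.write(json.dumps(records).encode("utf-8"))
    except Exception:
        # Do not leave a partial file behind if writing fails.
        os.remove(path)
        raise
    return path
```

The caller is responsible for moving or deleting the returned file, since mkstemp never removes it automatically.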
Please let me know if you have any questions.