mitodl / edx2bigquery

Tool to convert & load data from edX platform into BigQuery
GNU General Public License v2.0

Use tempfile along with os.getpid() to generate temp file names #78

Closed: AbdouSeck closed this pull request 4 years ago

AbdouSeck commented 4 years ago

This pull request fixes a bug in rephrase_forum_data.py that occurs when multiple Python processes or threads try to write a gzip file with the same name (tmp.json.gz) at the same time.

Current solution: use tempfile.mkstemp together with os.getpid() to safely generate a unique temporary file name.
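A minimal sketch of the naming approach, assuming the PID is embedded as the file-name prefix (the PR does not say exactly how the two are combined); the helper name `make_temp_gz` is hypothetical. Uniqueness itself comes from `mkstemp`, which creates the file atomically with `O_CREAT | O_EXCL`; the PID prefix mainly makes it easy to tell which process owns a leftover file.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def make_temp_gz(prefix_base="tmp"):
    """Hypothetical helper: atomically create a uniquely named .json.gz temp file.

    Returns (fd, path). Using the PID as a prefix is an assumption about how
    os.getpid() is combined with mkstemp; mkstemp alone guarantees the name
    is unique because it opens the file with O_CREAT | O_EXCL.
    """
    prefix = "%s_%d_" % (prefix_base, os.getpid())
    return tempfile.mkstemp(prefix=prefix, suffix=".json.gz")


# Simulate many threads of one process requesting a temp file at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda _: make_temp_gz(), range(32)))

paths = [path for _, path in results]
print(len(set(paths)) == len(paths))  # True: no two callers share a name

# Clean up: close every descriptor and delete every file.
for fd, path in results:
    os.close(fd)
    os.remove(path)
```

Unlike a fixed name such as tmp.json.gz, no two concurrent callers can end up writing to the same file.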

Also, because tempfile.mkstemp returns both an open file descriptor and the file's path, the descriptor is passed to os.fdopen when creating the gzip.GzipFile object that writes the rephrased forum data to disk. Using os.fdopen is cleaner and more efficient: the file is already open, so wrapping its descriptor avoids reopening the file by name and leaves no unused descriptor behind.
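A rough sketch of the descriptor-reuse pattern, not the PR's actual code: the helper `write_rephrased_records` is hypothetical, and writing the records as gzipped JSON lines is an assumption about the output format. The key point is that `os.fdopen` wraps the descriptor `mkstemp` already opened, and `gzip.GzipFile(fileobj=...)` writes through that file object.

```python
import gzip
import json
import os
import tempfile


def write_rephrased_records(records):
    """Hypothetical helper: write records as gzipped JSON lines to a fresh temp file.

    The descriptor returned by mkstemp is wrapped with os.fdopen, so the file
    mkstemp already opened is reused instead of being reopened by name.
    The JSON-lines format is an assumption. Returns the path; the caller
    is responsible for deleting the file.
    """
    fd, path = tempfile.mkstemp(prefix="tmp_%d_" % os.getpid(), suffix=".json.gz")
    with os.fdopen(fd, "wb") as raw:
        # GzipFile compresses into the already-open file object; closing the
        # GzipFile flushes the gzip trailer, and the outer `with` closes the fd.
        with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
            for rec in records:
                gz.write((json.dumps(rec) + "\n").encode("utf-8"))
    return path


# Usage: write two records, read them back, then remove the temp file.
path = write_rephrased_records([{"id": 1, "body": "hello"}, {"id": 2, "body": "world"}])
with gzip.open(path, "rt", encoding="utf-8") as f:
    print([json.loads(line) for line in f])  # [{'id': 1, 'body': 'hello'}, {'id': 2, 'body': 'world'}]
os.remove(path)
```

Note that closing the GzipFile does not close a caller-supplied fileobj, which is why the outer `with os.fdopen(...)` block is still needed to release the descriptor.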

Please let me know if you have any questions.