greenelab / cimr

tool for continuously integrated meta-resource
https://cimr.readthedocs.io/
Other
1 stars 2 forks source link

macOS in parallel mode: log messages in child processes swallowed #67

Open dongbohu opened 4 years ago

dongbohu commented 4 years ago

On a macOS test box (Mojave, 10.14.6, running Python 3.8.1), when parallel data processing is enabled by -parallel n option (in which n > 0), most of the log messages generated by child processes disappeared from the main console.

For example, when running the test case in examples/gwas/gwas.yml, here is the output generated by cimr processor -process -yaml-file gwas.yml -chunksize 500 -parallel 4:

[2020-02-24 17:00:03 root:INFO    ]  processing .../cimr/examples/gwas/gwas.yml
[2020-02-24 17:00:03 root:INFO    ]  file exists in submitted_data/gwas/
[2020-02-24 17:00:03 root:INFO    ]  md5sum verified.
[2020-02-24 17:00:03 root:INFO    ]  following columns will be renamed: 
[2020-02-24 17:00:03 root:INFO    ]  dict_keys(['panel_variant_id', 'chromosome', 'position', 'variant_id', 'effect_allele', 'non_effect_allele', 'effect_size', 'standard_error', 'zscore', 'pvalue', 'imputation_status'])
[2020-02-24 17:00:03 root:INFO    ]  making dir: processed_data/gwas/
[2020-02-24 17:00:03 root:INFO    ]  found submitted_data/gwas/gwas.txt.gz.
[2020-02-24 17:00:03 root:INFO    ]  loading submitted_data/gwas/gwas.txt.gz.
[2020-02-24 17:00:03 root:INFO    ]  chunksize: 0.0005 million
[2020-02-24 17:00:03 root:INFO    ]  number of parallel processes: 4
WARNING:root: chunk #2: chromosome id needs to be checked.
WARNING:root: chunk #1: chromosome id needs to be checked.
WARNING:root: chunk #3: chromosome id needs to be checked.
WARNING:root: chunk #4: chromosome id needs to be checked.
WARNING:root: chunk #5: chromosome id needs to be checked.
[2020-02-24 17:00:04 root:INFO    ]  combine chunk output files ...
[2020-02-24 17:00:04 root:INFO    ]  finished processing submitted_data/gwas/gwas.txt.gz.
[2020-02-24 17:00:04 root:INFO    ]  output has been saved as processed_data/gwas/gwas.txt.gz.
[2020-02-24 17:00:04 root:INFO    ]  ------------------------------------------------------------

[2020-02-24 17:00:04 root:INFO    ]  reading a local catalog file catalog.txt.
[2020-02-24 17:00:04 root:INFO    ]  context_id is not provided.
[2020-02-24 17:00:04 root:INFO    ]  updating catalog.txt for submitted_data/gwas/gwas.txt.gz.
[2020-02-24 17:00:04 root:INFO    ]  ------------------------------------------------------------

Only five lines of log messages from child processes were shown above:

WARNING:root: chunk #2: chromosome id needs to be checked.
WARNING:root: chunk #1: chromosome id needs to be checked.
WARNING:root: chunk #3: chromosome id needs to be checked.
WARNING:root: chunk #4: chromosome id needs to be checked.
WARNING:root: chunk #5: chromosome id needs to be checked.

But if we change the command to non-parallel mode: cimr processor -process -yaml-file gwas.yml -chunksize 500 -parallel 0 then the output log messages became:

[2020-02-24 17:02:40 root:INFO    ]  processing .../cimr/examples/gwas/gwas.yml
[2020-02-24 17:02:40 root:INFO    ]  file exists in submitted_data/gwas/
[2020-02-24 17:02:40 root:INFO    ]  md5sum verified.
[2020-02-24 17:02:40 root:INFO    ]  following columns will be renamed: 
[2020-02-24 17:02:40 root:INFO    ]  dict_keys(['panel_variant_id', 'chromosome', 'position', 'variant_id', 'effect_allele', 'non_effect_allele', 'effect_size', 'standard_error', 'zscore', 'pvalue', 'imputation_status'])
[2020-02-24 17:02:40 root:INFO    ]  making dir: processed_data/gwas/
[2020-02-24 17:02:40 root:INFO    ]  found submitted_data/gwas/gwas.txt.gz.
[2020-02-24 17:02:40 root:INFO    ]  loading submitted_data/gwas/gwas.txt.gz.
[2020-02-24 17:02:40 root:INFO    ]  chunksize: 0.0005 million
[2020-02-24 17:02:40 root:INFO    ]  number of parallel processes: 0
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: start processing ...
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: renaming columns based on column dictionary.
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: chromosome(s) included: ['chr1' 'chr10' 'chr11' 'chr12' 'chr13']
[2020-02-24 17:02:40 root:WARNING ]  chunk #1: chromosome id needs to be checked.
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: finished checking chromosome ids.
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: variant_id has been standardized.
[2020-02-24 17:02:40 root:INFO    ]  variant counts before filtering palindromic variants: 500
[2020-02-24 17:02:40 root:INFO    ]  variant counts after filtering palindromic variants: 426
[2020-02-24 17:02:40 root:INFO    ]  data contains z-score column.
[2020-02-24 17:02:40 root:INFO    ]  pvalue is numeric.
[2020-02-24 17:02:40 root:INFO    ]  zscore is numeric.
[2020-02-24 17:02:40 root:INFO    ]  effect_size is numeric.
[2020-02-24 17:02:40 root:INFO    ]  standard_error is numeric.
[2020-02-24 17:02:40 root:INFO    ]  pvalue only contains values between 0 and 1.
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: dropping duplicate columns.
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: reordering processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: writing processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: ---------------------- DONE ----------------------
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: start processing ...
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: renaming columns based on column dictionary.
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: chromosome(s) included: ['chr14' 'chr15' 'chr16' 'chr17' 'chr18']
[2020-02-24 17:02:40 root:WARNING ]  chunk #2: chromosome id needs to be checked.
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: finished checking chromosome ids.
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: variant_id has been standardized.
[2020-02-24 17:02:40 root:INFO    ]  variant counts before filtering palindromic variants: 500
[2020-02-24 17:02:40 root:INFO    ]  variant counts after filtering palindromic variants: 434
[2020-02-24 17:02:40 root:INFO    ]  data contains z-score column.
[2020-02-24 17:02:40 root:INFO    ]  pvalue is numeric.
[2020-02-24 17:02:40 root:INFO    ]  zscore is numeric.
[2020-02-24 17:02:40 root:INFO    ]  effect_size is numeric.
[2020-02-24 17:02:40 root:INFO    ]  standard_error is numeric.
[2020-02-24 17:02:40 root:INFO    ]  pvalue only contains values between 0 and 1.
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: dropping duplicate columns.
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: reordering processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: writing processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: ---------------------- DONE ----------------------
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: start processing ...
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: renaming columns based on column dictionary.
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: chromosome(s) included: ['chr19' 'chr2' 'chr20' 'chr21' 'chr22']
[2020-02-24 17:02:40 root:WARNING ]  chunk #3: chromosome id needs to be checked.
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: finished checking chromosome ids.
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: variant_id has been standardized.
[2020-02-24 17:02:40 root:INFO    ]  variant counts before filtering palindromic variants: 500
[2020-02-24 17:02:40 root:INFO    ]  variant counts after filtering palindromic variants: 441
[2020-02-24 17:02:40 root:INFO    ]  data contains z-score column.
[2020-02-24 17:02:40 root:INFO    ]  pvalue is numeric.
[2020-02-24 17:02:40 root:INFO    ]  zscore is numeric.
[2020-02-24 17:02:40 root:INFO    ]  effect_size is numeric.
[2020-02-24 17:02:40 root:INFO    ]  standard_error is numeric.
[2020-02-24 17:02:40 root:INFO    ]  pvalue only contains values between 0 and 1.
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: dropping duplicate columns.
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: reordering processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: writing processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: ---------------------- DONE ----------------------
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: start processing ...
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: renaming columns based on column dictionary.
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: chromosome(s) included: ['chr3' 'chr4' 'chr5' 'chr6' 'chr7']
[2020-02-24 17:02:40 root:WARNING ]  chunk #4: chromosome id needs to be checked.
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: finished checking chromosome ids.
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: variant_id has been standardized.
[2020-02-24 17:02:40 root:INFO    ]  variant counts before filtering palindromic variants: 500
[2020-02-24 17:02:40 root:INFO    ]  variant counts after filtering palindromic variants: 428
[2020-02-24 17:02:40 root:INFO    ]  data contains z-score column.
[2020-02-24 17:02:40 root:INFO    ]  pvalue is numeric.
[2020-02-24 17:02:40 root:INFO    ]  zscore is numeric.
[2020-02-24 17:02:40 root:INFO    ]  effect_size is numeric.
[2020-02-24 17:02:40 root:INFO    ]  standard_error is numeric.
[2020-02-24 17:02:40 root:INFO    ]  pvalue only contains values between 0 and 1.
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: dropping duplicate columns.
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: reordering processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: writing processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: ---------------------- DONE ----------------------
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: start processing ...
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: renaming columns based on column dictionary.
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: chromosome(s) included: ['chr8' 'chr9']
[2020-02-24 17:02:40 root:WARNING ]  chunk #5: chromosome id needs to be checked.
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: finished checking chromosome ids.
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: variant_id has been standardized.
[2020-02-24 17:02:40 root:INFO    ]  variant counts before filtering palindromic variants: 200
[2020-02-24 17:02:40 root:INFO    ]  variant counts after filtering palindromic variants: 167
[2020-02-24 17:02:40 root:INFO    ]  data contains z-score column.
[2020-02-24 17:02:40 root:INFO    ]  pvalue is numeric.
[2020-02-24 17:02:40 root:INFO    ]  zscore is numeric.
[2020-02-24 17:02:40 root:INFO    ]  effect_size is numeric.
[2020-02-24 17:02:40 root:INFO    ]  standard_error is numeric.
[2020-02-24 17:02:40 root:INFO    ]  pvalue only contains values between 0 and 1.
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: dropping duplicate columns.
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: reordering processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: writing processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: ---------------------- DONE ----------------------
[2020-02-24 17:02:40 root:INFO    ]  finished processing submitted_data/gwas/gwas.txt.gz.
[2020-02-24 17:02:40 root:INFO    ]  output has been saved as processed_data/gwas/gwas.txt.gz.
[2020-02-24 17:02:40 root:INFO    ]  ------------------------------------------------------------

[2020-02-24 17:02:40 root:INFO    ]  reading a local catalog file catalog.txt.
[2020-02-24 17:02:40 root:INFO    ]  context_id is not provided.
[2020-02-24 17:02:40 root:INFO    ]  updating catalog.txt for submitted_data/gwas/gwas.txt.gz.
[2020-02-24 17:02:40 root:INFO    ]  ------------------------------------------------------------

All lines from:

[2020-02-24 17:02:40 root:INFO    ]  chunk #1: start processing ...

until:

[2020-02-24 17:02:40 root:INFO    ]  finished processing submitted_data/gwas/gwas.txt.gz

were generated by child processes.

This issue doesn't happen on Linux boxes (Ubuntu and Debian, Python 3.6.x, 3,7,x and 3.8.1).

dongbohu commented 4 years ago

This article discusses logging in multiprocessing module: https://docs.python.org/3/howto/logging-cookbook.html#logging-to-a-single-file-from-multiple-processes It seems to make sense that logging in child processes is independent from logging of parent process. But why does this only happen on macOS?

dongbohu commented 4 years ago

This issue is also confirmed on macOS 10.13.6 (High Sierra) running Python 3.8.1.