nteract / papermill

📚 Parameterize, execute, and analyze notebooks
http://papermill.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Tuple in the config dictionary changed to string when run a python notebook with papermill #784

Open gmhhope opened 8 months ago

gmhhope commented 8 months ago

🐛 Tuple in the config dictionary changed to string when run a python notebook with papermill

I ran papermill with the config dictionary and script shown below.

The config dictionary:

intersect_prep_config = {'path2central_output': './data-adj-donor-ignore-injOrder/',
 'exp_tag': 'compare',
 'pattern': '^lm_[HILIC|RPneg].+\\.csv',
 'modes': ['HILICpos_pel',
  'HILICneg_pel',
  'RPneg_pel',
  'RPpos_pel',
  'HILICpos_sup',
  'HILICneg_sup',
  'RPneg_sup',
  'RPpos_sup'],
 'tag2criteria': {'Treat-dGMIL4': {'cond': {'pval_treatment': (0.05, '<')},
   'expFdr': 'lm-dG4'},
  'GW*T': {'cond': {'pval_GW_orNot.T_orNot': (0.05, '<')},
   'expFdr': 'lm-dUnt-interaction'},
  'Treat-dUnt': {'cond': {'pval_treatment': (0.05, '<')},
   'expFdr': 'lm-deltaUnt'},
  'Treat-dGM': {'cond': {'pval_treatment': (0.05, '<')},
   'expFdr': 'lm-deltaGM'},
  'Treat-3grps': {'cond': {'pval_treatment': (0.05, '<')},
   'expFdr': 'lm-treatment-3grps'}},
 'current_working_directory': '/Users/gongm/Documents/projects/Fernando-human-moDC/GB-Fernando-human-moDC/script/down_anal/8-reanal-wo-inj-ord-correction',
 'input_notebook_path': './5.0-pval-intersect-prep.ipynb',
 'output_notebook_path': './archive-adj-donor-ignore-injOrder/5-pval-intersect-prep-2024-03-06_18_47.ipynb'}

The script that was used to run papermill:

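import os
import papermill as pm

# rerun_intersect, py_kernel, and exp are assumed to be defined earlier in the calling script.
if rerun_intersect:
    try:
        pm.execute_notebook(
            input_path=intersect_prep_config['input_notebook_path'],
            output_path=intersect_prep_config['output_notebook_path'],
            parameters=intersect_prep_config,
            kernel_name=py_kernel,
            cwd=os.getcwd(),
        )
    except Exception:
        print(f"{exp} was not done!")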


But when I looked at the output notebook, the tuple values had become strings, as seen below:

```python
# Parameters
path2central_output = "./data-adj-donor-ignore-injOrder/"
exp_tag = "compare"
pattern = "^lm_[HILIC|RPneg].+\\.csv"
modes = [
    "HILICpos_pel",
    "HILICneg_pel",
    "RPneg_pel",
    "RPpos_pel",
    "HILICpos_sup",
    "HILICneg_sup",
    "RPneg_sup",
    "RPpos_sup",
]
tag2criteria = {
    "Treat-dGMIL4": {"cond": {"pval_treatment": "(0.05, '<')"}, "expFdr": "lm-dG4"},
    "GW*T": {
        "cond": {"pval_GW_orNot.T_orNot": "(0.05, '<')"},
        "expFdr": "lm-dUnt-interaction",
    },
    "Treat-dUnt": {"cond": {"pval_treatment": "(0.05, '<')"}, "expFdr": "lm-deltaUnt"},
    "Treat-dGM": {"cond": {"pval_treatment": "(0.05, '<')"}, "expFdr": "lm-deltaGM"},
    "Treat-3grps": {
        "cond": {"pval_treatment": "(0.05, '<')"},
        "expFdr": "lm-treatment-3grps",
    },
}
```

I had already noticed that when I run a notebook with the R kernel, a list is not converted to a vector automatically. That makes sense, since it stays faithful to the data structure.

But I don't understand why a tuple becomes a string when the value is passed to a Python notebook.

Thanks in advance for helping to address this. For now I will just use lists instead of tuples in my application to work around the issue.
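
A minimal sketch of that workaround, assuming the parameters are plain nested dicts, lists, and tuples. The helper name `tuples_to_lists` is hypothetical and not part of papermill; it converts every tuple to a list before the dictionary is passed as `parameters`.

```python
def tuples_to_lists(value):
    """Recursively replace tuples with lists (hypothetical helper, not a papermill API)."""
    if isinstance(value, dict):
        return {key: tuples_to_lists(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [tuples_to_lists(item) for item in value]
    # Strings, numbers, booleans, and None pass through unchanged.
    return value


# A trimmed-down piece of the config from this report.
criteria = {"Treat-dGM": {"cond": {"pval_treatment": (0.05, "<")}, "expFdr": "lm-deltaGM"}}
safe_criteria = tuples_to_lists(criteria)
print(safe_criteria)
```

Here the tuple `(0.05, '<')` comes out as the list `[0.05, '<']`, and the original dictionary is left untouched. Inside the notebook, `tuple(value)` can restore a tuple wherever one is actually required.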

Best, Minghao Gong

MSeal commented 7 months ago

So tuples are partially specific to Python. If you change the input to a list instead of a tuple, you should get a direct translation that's accurate. That conversion could be automated by adding a check for tuples here https://github.com/nteract/papermill/blob/main/papermill/translators.py#L99, but it needs to match a JSON type, so on the notebook side you'd get a list instead of a tuple (which is functionally identical from a reader's perspective).
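
The behaviour discussed here can be illustrated with a toy model. This is a simplified sketch, not papermill's actual code: the function `translate` and its `tuples_as_lists` flag are hypothetical, and the sketch only assumes that a translator handles JSON-like types explicitly and falls back to a quoted string for anything else. With the flag off, a tuple takes the fallback path and ends up as a string, as in the report; with the flag on, it is emitted as a list.

```python
def translate(value, tuples_as_lists=False):
    """Render a value as Python source text (toy model, not papermill's translator)."""
    if value is None:
        return "None"
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    if isinstance(value, (bool, int, float)):
        return repr(value)
    if isinstance(value, dict):
        items = ", ".join(
            f"{translate(k, tuples_as_lists)}: {translate(v, tuples_as_lists)}"
            for k, v in value.items()
        )
        return "{" + items + "}"
    if isinstance(value, list) or (tuples_as_lists and isinstance(value, tuple)):
        return "[" + ", ".join(translate(v, tuples_as_lists) for v in value) + "]"
    # Last resort: any other type is stringified and then quoted.
    return translate(str(value), tuples_as_lists)


cond = {"pval_treatment": (0.05, "<")}
print(translate(cond))                        # {"pval_treatment": "(0.05, '<')"}
print(translate(cond, tuples_as_lists=True))  # {"pval_treatment": [0.05, "<"]}
```

Emitting a list rather than a tuple literal keeps the output aligned with a JSON type, which is the constraint mentioned above.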