nteract / papermill

📚 Parameterize, execute, and analyze notebooks
http://papermill.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Passing list of dictionaries as injected parameter from yaml #493

Open PyMap opened 4 years ago

PyMap commented 4 years ago

Hi all!

I want to share with the team something I ran into while executing notebooks from the command line with a YAML parameters file.

First, let me describe my case. I've been parametrizing several notebooks to isolate data wrangling processes. To do that, I needed lists of dictionaries whose keys describe my data, such as the area or the paths where some files are stored. Since all of this information can change depending on the user's specs, I put the following structure in a YAML file:

- area: area name
  indicator:
    - indicator name
  reference: brief description
- area: area name
  indicator:
    - indicator name 1
    - indicator name 2
  reference: brief description
(...)

To run it, I passed the file with the -f yaml_file.yml option from the terminal, but the run failed with:

ValueError: dictionary update sequence element #0 has length 3; 2 is required
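
The wording of this error matches what Python's built-in dict.update raises when it is given a list whose elements are not key/value pairs. The sketch below reproduces it in plain Python. It assumes (this is not confirmed in the thread) that the parsed YAML ends up being merged into a dictionary of parameters; the variable names are illustrative only.

```python
# A top-level YAML list of mappings parses to a Python list of dicts.
top_level_list = [
    {
        "area": "area name",
        "indicator": ["indicator name"],
        "reference": "brief description",
    },
]

# Assumption: the loaded file is merged into a parameters dict.
parameters = {}

try:
    # dict.update treats a non-mapping argument as an iterable of
    # (key, value) pairs. Iterating a dict yields its keys, so the
    # first element looks like a 3-item sequence instead of a pair.
    parameters.update(top_level_list)
    message = ""
except ValueError as exc:
    message = str(exc)

print(message)
```

Each list element has three keys (area, indicator, reference), which is where the "length 3" in the message comes from.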

So I put the contents of that YAML file under a kind of master key, and it worked. Then I realized that this key is the name of the variable that holds the YAML data inside the notebook.

I'm sharing this because it took me a while to figure out (I had validated my YAML structure with online validators and it was fine). Am I using this feature correctly? Is it OK to use this kind of master key in these cases?

Thanks for your work with this library!

MSeal commented 4 years ago

Yes. Because the inputs have to become variables in your notebook, everything needs a top-level key, which is the variable name to use. Values can have any valid shape, with the one restriction that keys of nested dicts have to be strings, for parsing and cross-language reasons. In your example above, you can just wrap the list in a variable name and it should work cleanly.

areas:
  - area: area name
    indicator:
      - indicator name
    reference: brief description
  - area: area name
    indicator:
      - indicator name 1
      - indicator name 2
    reference: brief description
(...)
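
As a rough illustration of the two rules described above (a top-level mapping of variable names, and string keys in nested dicts), here is a minimal Python sketch. The helper check_parameters is hypothetical and is not part of papermill; it only mirrors the constraints as stated in this thread.

```python
import json


def _check_keys(value):
    """Recursively require string keys in every nested dict."""
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError(f"non-string key: {key!r}")
            _check_keys(item)
    elif isinstance(value, list):
        for item in value:
            _check_keys(item)


def check_parameters(params):
    """Hypothetical helper: the top level must map variable names to values."""
    if not isinstance(params, dict):
        raise TypeError("top level must be a mapping of variable name -> value")
    _check_keys(params)


# The list from the YAML file, as Python data.
areas = [
    {"area": "area name", "indicator": ["indicator name"],
     "reference": "brief description"},
    {"area": "area name",
     "indicator": ["indicator name 1", "indicator name 2"],
     "reference": "brief description"},
]

# Wrapping the list under "areas" gives it a variable name.
parameters = {"areas": areas}
check_parameters(parameters)

# String keys also mean the structure round-trips through JSON.
serialized = json.dumps(parameters)
```

With this shape, the notebook would see a variable named areas holding the list of dictionaries.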

Glad you find the tool useful :)

PyMap commented 4 years ago

Right! That's what I did. But since there is no documentation for this kind of case (or at least I couldn't find any), I think it would be helpful to add it somewhere. Thanks again for the feedback and for your work on this library.

MSeal commented 4 years ago

Open to PRs for improving the descriptions in the /docs directory as well :)