agriyakhetarpal opened 1 year ago
I am not sure how this will go yet, but I have learned that `cruft` retains full compatibility with templates based on `cookiecutter`, whereas `copier` has some differences (it uses a YAML file instead of JSON for the project specification). However, scientific-python/cookie supports all three of them, so I think our use case as a stripped-down, barebones version of it can also support all three – unless we don't need all three, and just `cookiecutter` and `cruft` will be enough.
Thanks for summarising this! Don't worry about supporting a lot of things at the start. We can start with a simple structure, support for one backend (`hatch`), and just `cookiecutter` support.
Rather than having a `data` folder, we should encourage separation of code and data, with the data path specified via the `.env` file. People are welcome to keep their code and data in the same place, but the data should ideally not be pushed to GitHub along with the code, except for some examples.
If the data path is set as `DATA_PATH="path/to/data"` in `.env`, then the following code will load it:

```python
import os

from dotenv import load_dotenv

# Read variables from the .env file into the process environment
load_dotenv()
path_to_data = os.environ["DATA_PATH"]
```
We could add that as one of the default utility functions in the `src` folder, e.g. in `util.py`:

```python
import os

from dotenv import load_dotenv

# Load the .env file once at import time
load_dotenv()


def environ():
    # Expose the environment (including variables from .env) to callers
    return os.environ
```
What sorts of data would `DATA_PATH` contain, ideally? We can further streamline the process of using it with some extra utility functions based on that, too.
Also, I renamed the project from `pybamm-cookie-cutter` to `pybamm-cookiecutter` because I saw that most templates with the cookiecutter topic on GitHub were named as such, i.e., without the hyphen between "cookie" and "cutter".
> What sorts of data would `DATA_PATH` contain, ideally? We can further streamline the process of using it with some extra utility functions based on that, too.
Probably CSV or Parquet files.
I think CSV and Parquet files would be nice; we would have to use `pandas` as a dependency in that case (or just get it from the optional dependencies after https://github.com/pybamm-team/PyBaMM/pull/3144 is merged).
A utility function for them could be something like:

```python
from pybamm_cookiecutter.util import DataLoader
import pybamm

battery_data = DataLoader.load_data("file1.csv")
```

In other words, a wrapper over a combination of `load_dotenv` and `pandas.read_csv()`, with some customisation here and there.
See also: https://learn.scientific-python.org/development/patterns/data-files/. We could adopt `pooch` within PyBaMM too, especially for the SuiteSparse and SUNDIALS downloadables in `scripts/install_KLU_Sundials.py`.
Adding something else to this roadmap: it would be nice if we could add new models via entry points as well. This requires a few changes in PyBaMM, though.
Adding a model via an entry point sounds like a nice idea, but it could also be excessive if it isn't done correctly. Do you have a proof of concept? I'm not entirely sure how it would go.
The author would create a model class similar to PyBaMM's `BasicDFN`, where it's entirely self-contained; then anyone else would be able to call the model with something like `pybamm.lithium_ion.Model("author/model-name")`.

This would solve several existing pain points with adding new models: with entry points, adding a new model is separate from PyBaMM, and authors get to retain ownership while we don't have to endorse the models.
Ah, sounds great – a bootstrapped model should be possible to implement, and IIUC it should work similarly to how we do parameter sets. I would note, though, that parameter sets are returned as Python dictionaries, so they are easier to handle; here we might have to establish a class that can either parse the AST for a model (or rather just import a JSON-serialised model) to pass it to `pybamm.lithium_ion.Model("author/model-name")`. This might be better to do in the PyBaMM source code, as you mentioned.
This issue has been referenced in the GSoC 2024 ideas page for potential readers and contributors, so if and when we flesh out these ideas a bit more, I suggest we should edit and add everything to the top of the thread as well.
> Rather than having a `data` folder, we should encourage separation of code and data, with the data path specified via the `.env` file. People are welcome to keep their code and data in the same place, but the data should ideally not be pushed to GitHub along with the code, except for some examples.
I guess now, with the pooch PR merged, we could add the default pooch data-file path for storing data here as well, which is under `.cache` for POSIX and under `%appdata%` for Windows machines. That way, we could use `pybamm.DataLoader` to load data files inside PyBaMM-based projects.
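A small sketch of how such a per-platform default could be resolved (pooch derives its actual directories via `platformdirs`; the layout below is a simplified assumption for illustration):

```python
import os
import sys
from pathlib import Path


def default_cache_dir(appname="pybamm"):
    """Return a conventional per-user cache directory for appname."""
    if sys.platform.startswith("win"):
        # Windows: put caches under %APPDATA% (roaming profile)
        base = Path(os.environ.get("APPDATA", Path.home() / "AppData" / "Roaming"))
    else:
        # POSIX: honour XDG_CACHE_HOME, defaulting to ~/.cache
        base = Path(os.environ.get("XDG_CACHE_HOME", Path.home() / ".cache"))
    return base / appname
```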
Adding support for a `data/` folder in the generated project, better guidelines on setting up documentation via MyST-NB (i.e., better ways of presenting results/code from research papers), etc., all sound like good feature sets for a v1 release someday, given that we are releasing v0.1 this week.
Starting this as a placeholder issue for tracking tasks to be completed and those that are complete. I will be dividing these into separate issues and PRs.
- Cookiecutters, as suggested by @Saransh-cpp
- Examples
- Suggestions from @brosaplanella (`.gitkeep`)

**Possible layout**

The folder structure can look like this. The required documentation should … `pyproject.toml` and `docs/conf.py` … (`src/`), unit tests with `pyproject.toml`, with … which can then be accessed as `pybamm.ParameterValues("MyParameters")` in the source code. Tracked in https://github.com/pybamm-team/pybamm-cookiecutter/pull/6.
**Configuration options**

- Build backends: `hatch`, `flit`, `setuptools` (later)
- Documentation
- Project structure (`pyproject.toml`)
- Available licenses (#2)
Addendum 27/02/2024: another thing we would want is entry points for models in the PyBaMM model structure, rather than just parameter sets; please see https://github.com/pybamm-team/PyBaMM/issues/3839#issuecomment-1966614301