agriyakhetarpal opened 1 year ago
I am not sure how this will go yet, but I have learned that `cruft` retains full compatibility with templates based on `cookiecutter`, whereas `copier` has some differences (it uses a YAML file instead of JSON for the project specification). However, scientific-python/cookie supports all three of them, so I think our use case as a stripped-down, barebones version of it can also support all three – unless we don't need all three, and just `cookiecutter` and `cruft` will be enough.
Thanks for summarising this! Don't worry about supporting a lot of things at the start. We can start with a simple structure, support for one backend (`hatch`), and just `cookiecutter` support.
Rather than having a `data` folder, we should encourage separation of code and data, with the data path specified via the `.env` file. People are welcome to keep their code and data in the same place, but the data should ideally not be pushed to GitHub along with the code, except for some examples.
If the data path is set as `DATA_PATH="path/to/data"` in `.env`, then the following code will load it:

```python
import os

from dotenv import load_dotenv

# Read variables from the .env file into the process environment
load_dotenv()
path_to_data = os.environ["DATA_PATH"]
```
We could add that as one of the default utility functions in the `src` folder, e.g. in `util.py`:

```python
import os

from dotenv import load_dotenv

# Load the .env file once at import time
load_dotenv()


def environ():
    # Expose the environment (including variables from .env) to callers
    return os.environ
```
What sorts of data would `DATA_PATH` contain, ideally? We can further streamline the process of using it with some extra utility functions based on that, too.
Also, I renamed the project from `pybamm-cookie-cutter` to `pybamm-cookiecutter` because I saw that most templates with the cookiecutter topic on GitHub were named as such, i.e., without the hyphen between "cookie" and "cutter".
> What sorts of data would `DATA_PATH` contain, ideally? We can further streamline the process of using it with some extra utility functions based on that, too.
Probably CSV or Parquet files.
I think CSV and Parquet files would be nice; we would have to use `pandas` as a dependency in that case (or just get it from the optional dependencies after https://github.com/pybamm-team/PyBaMM/pull/3144 is merged).
A utility function for them could be something like:

```python
from pybamm_cookiecutter.util import DataLoader
import pybamm

battery_data = DataLoader.load_data("file1.csv")
```

In other words, a wrapper over a combination of `load_dotenv` and `pandas.read_csv()`, with some customisation here and there.
See also: https://learn.scientific-python.org/development/patterns/data-files/. We could adopt `pooch` within PyBaMM too, especially for the SuiteSparse and SUNDIALS downloadables in `scripts/install_KLU_Sundials.py`.
Adding something else to this roadmap: it would be nice if we could add new models via entry points as well. This requires a few changes in PyBaMM, though.
Adding a model via an entry point sounds like a nice idea, but it could also be excessive if it isn't done correctly. Do you have a proof of concept? I'm not entirely sure how it would go.
The author would create a model class similar to PyBaMM's `BasicDFN`, where it's entirely self-contained; then anyone else would be able to call the model with something like `pybamm.lithium_ion.Model("author/model-name")`.

This would solve several existing pain points with adding new models: with entry points, adding a new model is separate from PyBaMM, and authors get to retain ownership while we don't have to endorse the models.
Ah, sounds great – a bootstrapped model should be possible to implement, and IIUC it should work similarly to how we do parameter sets. I would note, though, that parameter sets are returned as Python dictionaries, so they are easier to handle; here we might have to establish a class that can either parse the AST for a model (or rather just import a JSON-serialised model) to pass it to `pybamm.lithium_ion.Model("author/model-name")`. This might be better to do in the PyBaMM source code, as you mentioned.
This issue has been referenced in the GSoC 2024 ideas page for potential readers and contributors, so if and when we flesh out these ideas a bit more, I suggest we should edit and add everything to the top of the thread as well.
> Rather than having a `data` folder, we should encourage separation of code and data, with the data path specified via the `.env` file. People are welcome to keep their code and data in the same place, but the data should ideally not be pushed to GitHub along with the code, except for some examples.
I guess now, with the pooch PR merged, we could add the default pooch data-file path for storing data here as well, which is under `.cache` for POSIX and under `%appdata%` for Windows machines. That way, we could use `pybamm.DataLoader` to load data files inside PyBaMM-based projects.
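A small sketch of how such a per-platform default could be resolved (pooch derives its actual directories via `platformdirs`; the layout below is a simplified assumption for illustration):

```python
import os
import sys
from pathlib import Path


def default_cache_dir(appname="pybamm"):
    """Return a conventional per-user cache directory for appname."""
    if sys.platform.startswith("win"):
        # Windows: put caches under %APPDATA% (roaming profile)
        base = Path(os.environ.get("APPDATA", Path.home() / "AppData" / "Roaming"))
    else:
        # POSIX: honour XDG_CACHE_HOME, defaulting to ~/.cache
        base = Path(os.environ.get("XDG_CACHE_HOME", Path.home() / ".cache"))
    return base / appname
```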
Adding support for a `data/` folder in the generated project, better guidelines on setting up documentation via MyST-NB (i.e., better ways of presenting results/code from research papers), etc., all sound like good feature sets for a v1 release someday, given that we are releasing v0.1 this week.
Starting this as a placeholder issue for tracking tasks to be completed and those that are complete. I will be dividing these into separate issues and PRs.
- Cookiecutters, as suggested by @Saransh-cpp
- Examples
- Suggestions from @brosaplanella (`.gitkeep`)

**Possible layout**

The folder structure can look like this. The required documentation should … `pyproject.toml` and `docs/conf.py` … (`src/`), unit tests with `pyproject.toml`, with … which can then be accessed as `pybamm.ParameterValues("MyParameters")` in the source code. Tracked in https://github.com/pybamm-team/pybamm-cookiecutter/pull/6.
**Configuration options**

- Build backends: `hatch`, `flit`, `setuptools` (later)
- Documentation
- Project structure (`pyproject.toml`)
- Available licenses (#2)
Addendum 27/02/2024: another thing we would want is entry points for models in the PyBaMM model structure, rather than just parameter sets; please see https://github.com/pybamm-team/PyBaMM/issues/3839#issuecomment-1966614301