CompRhys closed this 1 year ago
Can you please further check that the existing functionality still works? Also, how about putting the `baseline.ipynb` in a new folder, e.g. `examples`?
I had moved the notebook into an examples folder, but then it couldn't find the preprocessing scripts due to location-based referencing. Are you opposed to the preprocessing being included in the package?
The datasets are very important to the project. However, we do not host the datasets ourselves, so it is important that we refer users to the original data sources to acquire the data. That is one reason we do not organize the repo as a Python package. Personally, though, I am not against the idea. Considering that we are actively working on new features for BatteryML, including, for example, support for more learning tasks, more input formats, more learning models, visualization, etc., it could become a strong Python package for battery learning tasks. However, we are not sure whether we should decouple the tools from the data sources and organize them into two different projects.
This is just my suggestion for a generalized framework. Defining a standardized data entity, including parameters like cell ID, time, v, I, t, and internal resistance (these may be the basic parameters; application data such as EV data has more), could be a critical task for supporting diverse datasets in a generalized framework. I expect that researchers will adopt this standard if the framework's upper layers are beneficial. However, initially, dedicated effort in data transformation is required to show outputs and comparisons across various datasets.
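The standardized data entity suggested above could be sketched as a pair of dataclasses. This is only an illustration of the idea, not an existing BatteryML type; all class and field names here (`CycleRecord`, `CellRecord`, `extras`, etc.) are hypothetical, and the interpretation of v/I/t as voltage, current, and temperature is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class CycleRecord:
    """One cycle of cycling data for a single cell (hypothetical schema)."""
    cycle_number: int
    time_s: List[float]                 # elapsed time per sample
    voltage_v: List[float]              # terminal voltage per sample
    current_a: List[float]              # applied current per sample
    temperature_c: Optional[List[float]] = None
    internal_resistance_ohm: Optional[float] = None  # not always measured

@dataclass
class CellRecord:
    """Standardized per-cell entity; dataset-specific fields go in `extras`."""
    cell_id: str
    cycles: List[CycleRecord] = field(default_factory=list)
    extras: Dict[str, object] = field(default_factory=dict)  # e.g. EV-specific params
```

Dataset-specific converters would then map each raw source into this shared shape, so the framework's upper layers (models, metrics, visualization) only ever see one schema.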
Hi @jhrrlee, thanks for your suggestion! If I understood correctly, you were looking for `src.data.BatteryData`. Currently we do not include the IR information in the cycling data, as the resistance is not always available. However, customized features are always possible by setting them as attributes of `BatteryData`, and you can save them with `BatteryData.save`. These features will then be available when you load the `BatteryData` from disk.
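The attach-and-persist pattern described above can be sketched as follows. This is a minimal stand-in, not the real `BatteryData` class; the actual save/load signatures in BatteryML may differ, and `BatteryRecord` and its methods are hypothetical names.

```python
import pickle
from pathlib import Path

class BatteryRecord:
    """Hypothetical stand-in for a cell data class that persists all attributes."""

    def __init__(self, cell_id: str):
        self.cell_id = cell_id

    def save(self, path: str) -> None:
        # Persist every attribute, including customized ones added after construction.
        Path(path).write_bytes(pickle.dumps(self.__dict__))

    @classmethod
    def load(cls, path: str) -> "BatteryRecord":
        # Rebuild the object without calling __init__, then restore its attributes.
        obj = cls.__new__(cls)
        obj.__dict__.update(pickle.loads(Path(path).read_bytes()))
        return obj

rec = BatteryRecord("cell_001")
rec.internal_resistance = [0.015, 0.016]  # customized feature as a plain attribute
rec.save("cell_001.pkl")
restored = BatteryRecord.load("cell_001.pkl")
```

Any attribute set before `save` round-trips through disk, which is the behavior described for customized features on `BatteryData`.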
The discussion of the data class doesn't seem relevant to whether or not this should be a package. Perhaps that discussion should continue elsewhere, so the discussion here stays about the PR?
In terms of packaging, I think it's common for benchmark codebases to package parsers that get datasets into the desired formats. An obvious example would be the dataset loaders wrapped by sklearn.
Hi @CompRhys, good point. We can download and preprocess the datasets released under a CC BY 4.0 license, and then keep the processed data in a local cache or a user-specified location.
This all works; I have now confirmed the baseline results in the notebook on MATR. I would say merge this and then look at dataset packaging in a separate issue/PR combo.
Thanks for the contribution! I will check on this later next week.
Added a simple pyproject.toml in order to be able to install the package and experiment with the models. Renamed the src folder to the package name to follow Python best practice.
Addressed #9
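For reference, a minimal pyproject.toml for this kind of change might look like the fragment below. This is illustrative only: the package name `batteryml` is assumed from the repo name, and the version and dependency list are placeholders, not the contents of the PR.

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "batteryml"          # assumed package name, matching the renamed src folder
version = "0.1.0"           # placeholder version
requires-python = ">=3.8"
dependencies = [            # placeholder dependencies
    "numpy",
    "scikit-learn",
]
```

With this in place, `pip install -e .` makes the package importable from anywhere, which also fixes the location-based referencing problem the notebook hit in the `examples` folder.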