microsoft / BatteryML

MIT License
497 stars 107 forks source link

Outline pyproject.toml to install #11

Closed CompRhys closed 1 year ago

CompRhys commented 1 year ago

Added a simple pyproject.toml in order to be able to install and experiment with models. Renamed the src folder to package name to fit python best practice.

Addressed #9

fingertap commented 1 year ago

Can you please further check the existing functionalities still work? Also, how about putting the baseline.ipynb in a new folder, e.g.examples.

CompRhys commented 1 year ago

I had moved the notebook and an examples folder this but then it couldn't find the preprocessing scripts due to location based referencing. Are you opposed to the preprocessing being included in the package?

fingertap commented 1 year ago

The datasets are very important to the project. However, we do not host the datasets, so it is important that we refer the users to the original data source to acquire the data. That is a reason we do not organize the repo as a Python package. However, I am personally not against this idea. Considering that we are actively working on new features for BatteryML, including for example, support for more learning tasks, more input formats, more learning models, visualization, etc., it will become a strong python package for battery learning tasks. However, we are not sure if we should decouple the tools and the data sources and organize into two different projects.

jhrrlee commented 1 year ago

This is just my suggestion for a generalized framework. Defining a standardized data entity, including parameters like cell id, time, v, I, t, and internal resistance (these may be basic parameters. Application data like EV has more parameters) could be a critical task for supporting diverse datasets in a generalized framework. I expect that researchers will adopt this standard if the framework's upper layers are beneficial. However, initially, dedicated efforts in data transformation are required to show outputs and comparisons across various datasets.

fingertap commented 1 year ago

Hi @jhrrlee , thanks for your suggestion! If I understood correctly, you were looking for src.data.BatteryData. Currently we do not include the IR information in the cycling data, as the resistance is not always available. However, customized features are always possible by setting them as the attributes of BatteryData and you can save them by BatteryData.save. These features will then be available when you load the BatteryData from disk.

CompRhys commented 1 year ago

Discussion of data class doesn't seem relevant to whether or not this should be a package? Perhaps this discussion should continue elsewhere such that discussion here is about the PR?

In terms of packaging I think it's common to package parsers for datasets to get them in desired formats in benchmark codebases. An obvious example would be the datasets wrapped by sklearn.

fingertap commented 1 year ago

Hi @CompRhys, good point. We can download and preprocess those datasets with CC By 4 license, and then organize the processed data in local cache or a specified location.

CompRhys commented 1 year ago

This all works I have now confirmed the baseline results in the notebook on MATR. I would say merge this and then look t dataset packaging in another issue/PR combo

fingertap commented 1 year ago

Thanks for contribution! I will check on this later on next week.