mikapfl / read_di_unfccc

Dataset containing all data available from the UNFCCC API at https://di.unfccc.int, and scripts to generate it
Apache License 2.0

Release version / stable API for integration as pyam dependency #3

Closed by danielhuppmann 3 years ago

danielhuppmann commented 3 years ago

Thanks for this nice package! I would be interested to include this as a dependency for pyam to make it easier for users to compare scenarios (like those used in the IPCC reports) with detailed historical emissions data. There already is a similar feature to read data from the World Bank, see here (implemented as a lightweight API to pandas_datareader).

The question is whether it makes sense to do this soon, or to wait for any pending API changes to the UNFCCCApiReader class (seeing that there is no stable release version yet)? Also, are you planning to turn this into an installable package and/or release it via PyPI?

mikapfl commented 3 years ago

Hi Daniel!

So far, I have thought of this project more in terms of accessible dataset releases that we would regularly update, so our first stable release is https://zenodo.org/record/4199622. The idea is that users just download the dataset and don't really care about the download scripts. In the future, I would hope to get really nice datalad integration, so that users can then just "datalad pull" to always get the latest version of the data, but that needs some work to integrate datalad and zenodo.

As I understand you, you would instead like to incorporate the download scripts into pyam, so that users download the datasets on demand, as needed, straight into pyam data structures, right? For that use case, it would of course be pretty stupid if pip installing the package pulled in 150 MB of data which is then never used, so the current structure geared towards dataset releases makes no sense there.

Maybe these competing use cases can be fulfilled best with two separate repositories? So, I could put the download scripts into a python package with versioning etc., where you can use it as a dependency, and the dataset is another repository which then uses the python package as a dependency. That way the lightweight scripts are separated from the data. Would that make sense for you?

Just note that even with the download scripts separated into their own library package with a stable API, we can't reasonably promise any kind of stability for the functionality, because the unofficial UNFCCC API could change or vanish at any time. So I would still advise users to cache downloaded results.
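A minimal disk-caching sketch along those lines could look like the following. Note that `fetch_query` is a hypothetical stand-in for the actual API call, and the cache directory name is arbitrary; this only illustrates the cache-on-first-download pattern, not the real package's interface:

```python
import json
from pathlib import Path

CACHE_DIR = Path("unfccc_cache")


def fetch_query(party_code: str) -> dict:
    """Hypothetical stand-in for a query against the unofficial UNFCCC API.

    In real use this would perform the network request; here it just
    returns a placeholder so the caching logic can be demonstrated.
    """
    return {"party": party_code, "rows": []}


def cached_query(party_code: str) -> dict:
    """Return cached results if present, otherwise fetch and cache them.

    Since the unofficial API could change or vanish at any time, every
    successful download is written to disk and reused on later calls.
    """
    CACHE_DIR.mkdir(exist_ok=True)
    cache_file = CACHE_DIR / f"{party_code}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    result = fetch_query(party_code)
    cache_file.write_text(json.dumps(result))
    return result


data = cached_query("DEU")        # first call fetches and writes the cache
data_again = cached_query("DEU")  # second call is served from disk
```

This way a vanished or changed upstream API only breaks new downloads, never analyses that rely on already-cached data.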

Cheers,

Mika

danielhuppmann commented 3 years ago

Right, a pip-install of the package shouldn't (automatically) download the data. The two-repository strategy seems like a good idea, in the spirit of "separation of concerns". An alternative would be to have an importable package that knows where the "dataladded" files are located, so that a pyam utility function could import (filtered versions of) these files. But I guess that would be a bit of work, whereas the API implementation is basically ready.

Let me know if I can assist!

mikapfl commented 3 years ago

Hi Daniel,

I have released the API as a stand-alone python package, see https://github.com/pik-primap/unfccc_di_api . It is pip-installable right away; I hope it suits your needs.

I will properly convert the data release to use the python library package at the next release (whenever enough new data is available to justify another release).

Cheers,

Mika