Energy Production Modeling and Provenance

bt- commented 8 years ago

I think this belongs as its own issue for discussion, but it is closely related to the conversation around #88 and #17.

I am new to PVLIB, to contributing to projects on GitHub, and to the scientific Python stack, but I do have experience working with other PV energy production modeling software (Helioscope, PVsyst, and SAM).

One issue that I've encountered when working with these programs is keeping track of revisions to the model, especially when working with a team. It seems like this issue could be largely resolved by using PVLIB and keeping the file defining the system under version control with git or maybe a more comprehensive provenance system like Sumatra.

It seems like a good time to express the desire to use PVLIB in this manner because the approach to defining an entire system from module through inverter and step-up XFMR to interconnection like in #88 is still in development.

I agree with @jforbess's point in #88 that CSV input will be useful, but I would like to see a json or yaml convention develop that could be used for version-controlled system configuration input files.

wholmgren commented 8 years ago

I'm only 1/4 of the way through my coffee this morning, but it's not really clear to me what you're asking for here. For pvlib specifically, are you asking for something similar to #60?

I agree that using pvlib (or specific versions/commits of pvlib) and git can help make your modeling more reproducible. They don't solve everything, though. IPython notebooks are horrible for version control (see #94 for a recent example where I couldn't easily figure out what had changed via git diff). Most of this is well outside the scope of pvlib, though we can offer some advice based on our individual experiences.

dacoex commented 8 years ago

@bt- I agree that this is a good idea. Maybe even add Docker?

This is rather a documentation item.

bt- commented 8 years ago

I've been meaning to respond to clarify and am just getting to it. After reading my comment again, I agree it is not entirely clear.

> One issue that I've encountered when working with these programs is keeping track of revisions to the model, especially when working with a team.

What I meant by 'model' here is a simulation of a PV system: environmental data, PV system definition, and energy production results.

> IPython notebooks are horrible for version control ...

I'm thinking that I would eventually just have straight code using PVlib to calculate energy production results and this would be controlled with git.

The heart of what I'm thinking about though is version control for the PV system configuration. I think that a yaml file that could be used to generate an object of the proposed PVsystem class would be very useful, as you suggested in PR #84. If I'm understanding you correctly, you referred to this system configuration information as 'meta' data in #17 today.

I am going to take a stab at starting this by looking at all the input options in the GUI of other pv modelling tools (SAM & PVsyst). I think @bmu described what needs to be done well in #17:

> I don't know a standard for this, maybe because there is no standard. We could look at other software (e.g. PVSyst, SAM ...). There was the idea of PVML, a XML dialect dedicated to PV systems within the European project "Performance", but as far as I remember without published results.

Hopefully, I can find some time in the last week of November to work on this.

wholmgren commented 8 years ago

My understanding is that you want two things:

1. A csv/json/yaml standard for describing a PV system.
2. A script or function that takes one input csv/json/yaml file and turns it into a standard output file.

It would be easy for you to version control the input file, script, and pvlib. I think everyone would agree that this is desirable, but it will be quite complicated to make this flexible, powerful, and robust. I don't think that this requires a PVSystem object. You might start with cleaning up #88 and turning it into a script.

bt- commented 3 years ago

@wholmgren and @cwhanse, is this a topic that would be worth revisiting as part of the Solar Performance Insight (SPI) work?

I would really like to be able to just diff yaml files or look at the git commits on a yaml file rather than manually compare two pdfs documenting the assumptions/system parameters of PVsyst models.

It seems like this is pretty closely related to the Functional capability 5 of SPI to allow users to "upload pvsystem metadata". It seems like it might be a workable path to store the user uploaded data in a json or yaml format and then users like me could potentially bypass the web interface and interact with the json/yaml files directly if desired.

I think my preference would be to use a yaml format due to how easy they are to read and edit directly.
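
As a rough illustration of the comparison described above, here is a minimal sketch that diffs two revisions of a configuration file as plain text using Python's standard-library `difflib`. The file names, keys, and values are made up for illustration and are not part of any agreed-upon schema; in practice `git diff` on the committed file would give the same kind of output.

```python
import difflib

# Two hypothetical revisions of a system configuration file. The keys and
# values are illustrative only; they do not follow an agreed-upon schema.
old_config = """\
module tilt: 20
gcr: 0.40
PV modules per string: 18
"""
new_config = """\
module tilt: 25
gcr: 0.40
PV modules per string: 18
"""


def config_diff(old, new):
    """Return a unified diff of two configuration texts as a list of lines."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="system_v1.yaml",  # hypothetical file names
            tofile="system_v2.yaml",
            lineterm="",
        )
    )


diff_lines = config_diff(old_config, new_config)
print("\n".join(diff_lines))
```

Because the files are treated as text, no YAML parser is needed for this step; only the changed assumption (here the module tilt) shows up as a removed and an added line.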

wholmgren commented 3 years ago

@bt- can you post a draft of the yaml you're thinking of? That would help ground the discussion and would help me understand the scope.

I could see methods like Location.from_json, Location.from_yaml, and similar for PVSystem and ModelChain. It's not clear to me how we could do much more than that within pvlib because these questions quickly become very application specific.

cwhanse commented 3 years ago

@bt- thanks for bringing this up again. Yes, for the SPI project we'll need to define a format to store and communicate parameters for models, and selection of models, to simulate a PV system. I see your point about the value of being able to exercise configuration control on the resulting files.

bt- commented 3 years ago

@wholmgren, here is a very rough example of the type of file I am envisioning.

I put this together just now in a few minutes without too much thought behind the organization or nesting of data, but I think it conveys the general idea. I have not used yaml that much, so please forgive any obvious issues with the file itself.

Essentially, I'm thinking of a convention for defining a model in a yaml file that has equivalent inputs to PVsyst. It would be nice to have the flexibility to be more specific about internal models used, if desired.

mikofski commented 3 years ago

Hi Ben, that looks pretty similar to what we're considering for solarfarmer, which is loosely based on the schema we used for pvsim at SunPower. Should we collaborate?

```python
{  # start configuration
  "transformers": [
    {
      "name": "",
      "transformer spec ID": "",  # all of the specs are enumerated below
      "inverters": [
        {
          "name": "",
          "inverter count": "",
          "inverter spec ID": "",
          "layouts": [  # number of layouts per inverter must be <= number of MPPTs
            {
              "name": "",
              "thermal parameters": ", Uc: }>",
              "row in front of PV system": "",  # is this in the middle of the system or not
              "row in back of PV system": "",  # useful for bifacial
              "number strings per row": "",  # total size of layout must be a multiple of this
              "system azimuth": "",
              "module tilt": "",
              "gcr": "",
              "PV module spec ID": "",  # only one module allowed per layout
              "PV modules per string": "",
              "mounting type ID": "",
              "number of strings": "",
              "DC collection loss": "",
              "etc": "<...>"
            },
            {  # next inverter 1 layout
              "name": "",
              "etc": "<...>"
            }
          ],  # end inverter 1 layouts
          "AC collection loss": "",
          "etc": "<..>",
        },
        {  # next transformer 1 inverter
          "name": "",
          "etc": "<...>"
        }
      ],  # end transformer 1 inverters
      "etc": "<...>"
    },  # end transformer 1
    {  # next transformer
      "name": "",
      "etc": "<...>"
    }
  ],  # end transformers
  "PV module specs": {  # enumerate the PV module specs by ID
    "": "",
    "etc": "<...>"
  },
  "inverter specs": {  # enumerate the inverter specs by ID
    "": "",
    "etc": "<...>"
  },
  "transformer specs": {  # enumerate the xfmr specs by ID
    "": "",
    "etc": "<...>"
  },
  "mounting type specs": {  # enumerate the mounting type specs by ID
    "": "|",
    "etc": "<...>"
  },
  "tmy weather dat file path": "",
  "soiling per month": "",
  "albedo per month": "",
  "latitude": "",
  "longitude": "",
  "elevation": "",
  "etc": "<...>"
}  # end of configuration
```

bt- commented 3 years ago

Hi Mark, yes! I think it makes sense to try to develop something that works across multiple tools. Is there a reason you are working with json rather than yaml? Stack Overflow is telling me that yaml is almost a superset of json. I prefer the readability of yaml for this application unless there is a good reason for json.

mikofski commented 3 years ago

I'm fine with yaml, updated gist:

```yaml
PV module specs:
  :
  etc: <...>
inverter specs:
  :
  etc: <...>
mounting type specs:
  : |
  etc: <...>
transformer specs:
  :
  etc: <...>
albedo per month:
elevation:
latitude:
longitude:
soiling per month:
tmy weather dat file path:
etc: <...>
transformers:
- xfmr name:
  transformer spec ID:  # xfmr specs are enumerated by ID at top level
  etc: <...>
  inverters:
  - inverter name:
    AC collection loss:
    inverter count:
    inverter spec ID:  # inverter specs are enumerated by ID at top level
    etc: <..>
    layouts:
    - layout name:
      DC collection loss:
      PV module spec ID:  # PV module specs are enumerated by ID at top level
      PV modules per string:
      gcr:
      module tilt:
      mounting type ID:  # mounting system specs are enumerated by ID at top level
      number of strings:
      number strings per row:
      row in back of PV system:
      row in front of PV system:
      system azimuth:
      thermal parameters: ', Uv: }>'
      etc: <...>
    - layout name:  # next layout
      etc: <...>
  - inverter name:  # next inverter
    etc: <...>
- xfmr name:  # next xfmr
  etc: <...>
```

cedricleroy commented 3 years ago

Some food for thought: one big pro for JSON is that it is the standard for web APIs, and it would probably be easier (and faster) to parse / serialize across various libraries / languages. I find yaml easier to read, but harder to deal with when creating / editing a big configuration file.

kandersolar commented 3 years ago

Just chiming in: JSON doesn't have comments, which IMHO is a big drawback in the context of config files.

wholmgren commented 3 years ago

I agree with all of the pros/cons of json/yaml above. I suggest that we move forward with the idea that we'll eventually support both and that they should have identical structure.

The examples help a lot. How do you envision pvlib using them? Location/PVSystem/ModelChain factories? What happens after you've loaded the data? ModelChain.run_model?

As an example, in the Solar Forecast Arbiter project we implemented a JSON specification for a model that's basically PVWatts with minor changes. We also implemented dataclasses that include to and from dict methods. Once we have the metadata/data we pass them through pvmodel.irradiance_to_power.
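
The dataclass-with-dict-methods pattern mentioned above can be sketched roughly as follows. This is not the Solar Forecast Arbiter code; the class name `SiteMetadata` and its fields are hypothetical, chosen only to show how metadata could round-trip through a dict and a JSON string using the standard library.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SiteMetadata:
    """Hypothetical site description; the field names are illustrative only."""

    name: str
    latitude: float
    longitude: float
    elevation: float

    def to_dict(self):
        """Convert the metadata to a plain dict suitable for json.dumps."""
        return asdict(self)

    @classmethod
    def from_dict(cls, data):
        """Build the metadata from a dict, coercing numeric fields to float."""
        return cls(
            name=str(data["name"]),
            latitude=float(data["latitude"]),
            longitude=float(data["longitude"]),
            elevation=float(data["elevation"]),
        )


# Round trip: object -> dict -> JSON text -> dict -> object.
site = SiteMetadata("example site", 32.2, -110.9, 700.0)
text = json.dumps(site.to_dict(), sort_keys=True)
restored = SiteMetadata.from_dict(json.loads(text))
```

Keeping the conversion in `to_dict` and `from_dict` means the same objects could be fed from JSON or YAML, since both formats load into ordinary dicts.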

I expect the Solar Performance Insight json specification to look somewhat similar but support more options. Not sure if we'll also support yaml. From the json we'll create Location, PVSystem, and ModelChain objects, then pass the data through ModelChain.run_model and new methods in #943. Location.from_json and similar would be easy to add to pvlib. My main concern with putting everything in pvlib is building a broader consensus around the bigger specification.

bt- commented 3 years ago

@wholmgren

> I expect the Solar Performance Insight json specification to look somewhat similar but support more options. Not sure if we'll also support yaml. From the json we'll create Location, PVSystem, and ModelChain objects, then pass the data through ModelChain.run_model and new methods in #943.

Yes, you are describing exactly what I'm thinking about. My goal would be that your effort to define the SPI json spec could result in a format that is compatible with yaml and that a user like me could then specify model inputs in a yaml file, put it in a git repository and track changes to the model input assumptions.

I could see that ultimately it might make sense for a specification for a yaml file defining pv energy model inputs to be a separate entity from pvlib. I imagine that might make sense from your perspective as well @mikofski?

mikofski commented 3 years ago

I don't know, maybe? I see at least 2 possible paths:

1. just make the model spec in any format (yaml, json, etc) in pvlib or SPI and use it to serialize a model
2. make a new project, call it OpenPV spec, with the spec that pvlib and SPI are using, and possibly invite others to collaborate

There are pros & cons to both approaches:

1. This just gets it done, and then others can try to use it if they can. The downside is that it might be incompatible with other models like PVsyst, SAM, PlantPredict, SolarFarmer, etc.
2. This could drag on forever as every possibility is discussed and considered, but on the upside, if an OpenPV spec is adopted, you now have a bulletproof way to migrate a model from one software to another. The OpenPV spec could even provide applications (a python library or API) that could convert one format to another, e.g. given a PVsyst project archive, make an OpenPV spec, or given a spec, make a PVsyst project archive.

I think something like this should probably fall under Orange Button, right?

cwhanse commented 3 years ago

@mikofski my thought is that we should put a spec in place that meets the needs of pvlib and SPI users, and not worry at this point about compatibility with other platforms. Cross-software compatibility is a great goal, but I haven't seen any real interest from the various packages in reading or writing files for other packages, and it will take cooperation among the package developers to arrive at a common format.

Orange Button might provide common ground for terminology - however, the current dictionary does not include what would be needed to describe a model chain. Orange Button has moved to an OpenAPI compliant interface which should remove one major hurdle for adoption (using the xBRL standard). There's an OB editor to help compose an OpenAPI schema using Orange Button terms.

mikofski commented 3 years ago

SolarFarmer exports to PVsyst-6.7.5, and we would adopt the pvlib/SPI spec.

cwhanse commented 3 years ago

> SolarFarmer exports to PVsyst-6.7.5, and we would adopt the pvlib/SPI spec.

SolarFarmer is being a good citizen in this modeling community 👍

mikofski commented 3 years ago

Hi @bt- I really like your idea of YAML, it goes back and forth to JSON and Python easily, and to be able to use comments is great.

I've gone ahead and created openPVspec based on the gist I linked above (they're sync'd). I'd really like to get your feedback. There are a bunch of areas I haven't filled in quite yet, like component specs for PV modules (like PAN files), inverters (like OND files), transformers (a load/no-load or const. eff. model), and mounting system (racks, trackers, dual-tilt, etc.). Also some AC side interconnect, availability, curtailment, reactive power, a lot more detail that could go in there. Wherever you see etc, that means I left out the detail.

There's the concept of "counts" which basically means that instead of rewriting an existing substructure, you can just scale it linearly. Also at present it assumes layouts are rectangular, so if you had an odd layout on a single inverter input, you would have to split it into many rectangular layouts, which could be relaxed by adding more to the schema. There's also the concept of inverter input, which specifies which input in a multi-mppt inverter the layout would use.
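
To make the "counts" idea concrete, here is a minimal sketch that totals the modules in a nested configuration, scaling each inverter's layouts linearly by its count. The key names are taken from the gist above, but the function `total_modules`, the sample values, and the exact way counts multiply are assumptions for illustration, not the openPVspec definition.

```python
def total_modules(config):
    """Count PV modules, scaling each inverter's layouts by 'inverter count'."""
    total = 0
    for transformer in config["transformers"]:
        for inverter in transformer["inverters"]:
            # Modules attached to one physical inverter of this type.
            per_inverter = sum(
                layout["number of strings"] * layout["PV modules per string"]
                for layout in inverter["layouts"]
            )
            # Assumption: the count is a plain linear multiplier.
            total += inverter["inverter count"] * per_inverter
    return total


# Hypothetical system: one transformer, four identical inverters,
# each with two rectangular layouts.
config = {
    "transformers": [
        {
            "name": "xfmr 1",
            "inverters": [
                {
                    "name": "inv A",
                    "inverter count": 4,
                    "layouts": [
                        {"number of strings": 10, "PV modules per string": 18},
                        {"number of strings": 6, "PV modules per string": 18},
                    ],
                }
            ],
        }
    ]
}

modules = total_modules(config)  # 4 * (10 * 18 + 6 * 18) = 1152
```

Describing the four inverters once with a count of 4, rather than repeating the same substructure four times, keeps the file short and makes a change to the shared layout a one-line diff.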

Anyway, I'd love to get your feedback. Thanks.

bt- commented 3 years ago

@mikofski, this looks like a really great starting point. Thank you for putting it together! I've only taken a quick look so far, but will come back to it in more detail once I've cleared a few other things off my plate. I started a new issue in the openPVspec repository you started with a few initial thoughts.

pvlib / pvlib-python

Energy Production Modeling and Provenance #96