kaylai / VESIcal

A generalized Python library for calculating and plotting various things related to mixed volatile (H2O-CO2) solubility in silicate melts.
MIT License

Complex handling of calibration regions #80

Open: kaylai opened this issue 4 years ago

kaylai commented 4 years ago

This is certainly for a future version of the code, but I have some thoughts on handling calibration ranges in a more sophisticated way. I'm writing them down now in case we decide to address them later.

Scenario

Perhaps we have a model whose paper says that it is calibrated "up to 75 wt% SiO2 and up to 30 wt% CaO". Obviously, both maxima cannot apply to the same sample at once, since 75 + 30 = 105 wt%. So which combinations does the calibration actually cover? Can I run a sample with 75 wt% SiO2 and 25 wt% CaO?
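A minimal sketch of the naive approach shows why reported maxima alone cannot answer this. It assumes the limits are stored as independent per-oxide (min, max) pairs; the `CALIBRATION_LIMITS` dict and `outside_box` function are hypothetical illustrations, not VESIcal's API. Because each oxide is checked on its own, the test accepts the 75/25 sample and even an impossible 105 wt% total.

```python
# Hypothetical per-oxide calibration limits (wt%), read independently,
# as a paper might report them: "up to 75 wt% SiO2 and up to 30 wt% CaO".
CALIBRATION_LIMITS = {"SiO2": (0.0, 75.0), "CaO": (0.0, 30.0)}


def outside_box(sample, limits=CALIBRATION_LIMITS):
    """Return the oxides whose values fall outside their own [min, max] range.

    Each oxide is checked in isolation, so the result says nothing about
    whether the *combination* of values was ever calibrated.
    """
    flagged = []
    for oxide, (lo, hi) in limits.items():
        value = sample.get(oxide, 0.0)
        if not lo <= value <= hi:
            flagged.append(oxide)
    return flagged


# 75 wt% SiO2 + 25 wt% CaO passes both independent checks...
print(outside_box({"SiO2": 75.0, "CaO": 25.0}))  # []
# ...and so does a physically impossible 75 + 30 = 105 wt% composition.
print(outside_box({"SiO2": 75.0, "CaO": 30.0}))  # []
```

This "box" check is cheap, but it treats the calibrated region as a rectangle in composition space even when the calibration samples only populate part of that rectangle.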

To make this more complicated, VESIcal developers cannot always know exactly what the authors meant by the reported calibration ranges. We could simply use the calibration dataset and ignore what is written in the paper, but the two may not be equivalent. Perhaps the calibration dataset only goes up to 20 wt% CaO, but the authors had some reason to feel that it was fine to extrapolate to 30 wt% CaO. It's never easy to know.

All of that said, how might we address this? One option would be a function, only loosely tied to how warnings are handled, that calculates how far a user's sample lies from the calibration dataset in P-T-X space. This would at least give users a way to assess how "far" outside the calibration range their data are, and it could be demonstrated in an example Jupyter notebook where users can run these checks.
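As a rough sketch of such a distance measure, assume the calibration data are available as a numeric array with one column per parameter (P, T, and each oxide). One simple option is the distance to the nearest calibration point after scaling each column by its calibration standard deviation, so that bars, degrees C, and wt% contribute comparably. The function name `distance_to_calibration` and all example values are hypothetical, and nearest-neighbour distance is only one of several possible metrics.

```python
import numpy as np


def distance_to_calibration(calib, samples):
    """Distance from each sample to its nearest calibration point.

    calib   : (n_calib, n_dims) array of calibration P, T and oxide values
    samples : (n_samples, n_dims) array of user samples, same column order

    Columns are divided by the calibration standard deviation, so the
    result is in units of "calibration standard deviations".
    """
    calib = np.asarray(calib, dtype=float)
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    std = calib.std(axis=0)
    c = calib / std
    s = samples / std
    # Pairwise Euclidean distances, shape (n_samples, n_calib)
    d = np.linalg.norm(s[:, None, :] - c[None, :, :], axis=-1)
    return d.min(axis=1)


# Hypothetical calibration set: columns are P (bar), T (C), SiO2, CaO (wt%).
calib = np.array([
    [1000, 1100, 50.0, 10.0],
    [2000, 1150, 55.0, 9.0],
    [3000, 1200, 60.0, 6.0],
    [4000, 1000, 70.0, 2.0],
])
samples = [[2000, 1150, 55.0, 9.0],   # identical to a calibration point
           [5000, 1300, 75.0, 25.0]]  # far from every calibration point
print(distance_to_calibration(calib, samples))
```

A nearest-neighbour distance is easy to explain to users, but it does not distinguish a sample sitting between calibration points from one sitting beyond them; that distinction is what a hull-based test adds.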

The Convex Hull

Ideally, we would have a complete calibration dataset containing the P, T, and X of every sample used in the calibration. We could then plot each calibration sample as a point in n-dimensional space (where n is the number of parameters: P, T, and each oxide) and calculate the convex hull, the smallest convex shape in that space that encloses all of these points. The distance from the user's data to the hull, in each dimension, gives some gauge of how far their data lie outside (or inside) the calibrated range.
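One way to sketch this without the ENKI code is SciPy's `scipy.spatial.ConvexHull`, a wrapper around Qhull that works in n dimensions. The helper below is hypothetical, not existing VESIcal functionality: it standardizes each column, builds the hull, and scores each sample by its largest facet-plane value, returning a single signed score rather than per-dimension distances. Two caveats apply. If every oxide is included and the compositions are normalized to 100 wt%, the points lie on a flat subspace and Qhull raises an error, so one oxide (or a subset) would need to be dropped. Hull construction also becomes expensive quickly as the number of dimensions grows.

```python
import numpy as np
from scipy.spatial import ConvexHull


def hull_signed_distance(calib, samples):
    """Score each sample against the convex hull of the calibration points.

    calib   : (n_calib, n_dims) array; n_calib must exceed n_dims and the
              points must not all lie on a flat subspace.
    samples : (n_samples, n_dims) array with the same column order.

    Returns negative values inside the hull (magnitude = distance to the
    nearest facet) and positive values outside (a lower bound on the true
    distance to the hull), in units of calibration standard deviations.
    """
    calib = np.asarray(calib, dtype=float)
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    std = calib.std(axis=0)
    hull = ConvexHull(calib / std)
    # Each row of hull.equations is [unit outward normal, offset];
    # normal . x + offset <= 0 for points on the inner side of that facet.
    normals, offsets = hull.equations[:, :-1], hull.equations[:, -1]
    return ((samples / std) @ normals.T + offsets).max(axis=1)


# Hypothetical calibration set: columns are P (bar), T (C), SiO2, CaO (wt%).
rng = np.random.default_rng(0)
calib = rng.normal(size=(200, 4)) * [500, 50, 5, 2] + [2000, 1150, 55, 8]

inside = calib.mean(axis=0)                           # centroid: inside the hull
outside = calib.max(axis=0) + 10 * calib.std(axis=0)  # beyond every point
print(hull_signed_distance(calib, [inside, outside]))
```

Taking the maximum over facet planes is cheap and gives an exact depth for inside points; an exact outside distance would instead require projecting onto the hull, for example by solving a small quadratic program.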

Mark and Aaron over at ENKI have implemented this technique to define an omni-component phase in a system. If we could borrow their implementation, it might be a good solution to this issue.

simonwmatthews commented 2 years ago

Just browsing the VESIcal repo and saw this. I think it is a great idea! The Shapely Python library has tools for calculating convex hulls, so this may be quite straightforward to implement (without having to borrow the more complex ENKI code).
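For a two-dimensional slice such as the SiO2-CaO example from the original scenario, a Shapely version might look like the sketch below. The calibration points and the `check` helper are hypothetical. Shapely performs planar geometry and ignores z coordinates, so this approach only covers 2D projections; the full P-T-X case would need an n-dimensional tool such as SciPy's `ConvexHull`.

```python
from shapely.geometry import MultiPoint, Point

# Hypothetical calibration compositions projected onto (SiO2, CaO) in wt%.
# Their maxima match the scenario: 75 wt% SiO2 and 30 wt% CaO.
calib_sio2_cao = [(45.0, 12.0), (48.0, 5.0), (50.0, 30.0),
                  (60.0, 20.0), (68.0, 8.0), (75.0, 2.0)]

# Smallest convex polygon enclosing all calibration points.
hull = MultiPoint(calib_sio2_cao).convex_hull


def check(sio2, cao):
    """Return (inside_hull, distance_to_hull_boundary) for one sample."""
    p = Point(sio2, cao)
    # hull.distance(p) is 0 for any covered point, so measure against the
    # exterior ring to get a depth for inside points as well.
    return hull.covers(p), hull.exterior.distance(p)


print(check(55.0, 15.0))  # inside the calibrated region
print(check(75.0, 25.0))  # within both per-oxide maxima, but outside the hull
```

Unlike independent per-oxide limits, the hull rejects the 75 wt% SiO2 / 25 wt% CaO sample here because no hypothetical calibration point combines high SiO2 with high CaO.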