soil-metamodel / stan

Stan models and associated R code

Model respecting non-negative constraint on concentrations #7

Closed bob-carpenter closed 10 years ago

bob-carpenter commented 10 years ago

There's a physical positivity constraint on concentrations, which suggests that the noise could be modeled as lognormal instead of normal. That makes the error model multiplicative rather than additive, which makes sense if the error is proportional to the measurement in the experimental range we have (0.3 to 7.5 eCO2mean in the SoilR data set).
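A minimal sketch of the difference between the two noise models, in plain Python rather than Stan. The noise scale of 0.2 and the function names are assumptions chosen for illustration; only the 0.3 to 7.5 range comes from the data set.

```python
import math
import random


def simulate_additive(true_value, sigma, n, rng):
    """Normal (additive) noise: y = true_value + eps, eps ~ Normal(0, sigma)."""
    return [true_value + rng.gauss(0.0, sigma) for _ in range(n)]


def simulate_multiplicative(true_value, sigma, n, rng):
    """Lognormal (multiplicative) noise: log y ~ Normal(log true_value, sigma)."""
    return [true_value * math.exp(rng.gauss(0.0, sigma)) for _ in range(n)]


rng = random.Random(2014)
sigma = 0.2  # assumed noise scale, not estimated from any data

for true_value in (0.3, 7.5):  # ends of the eCO2mean range
    additive = simulate_additive(true_value, sigma, 10_000, rng)
    multiplicative = simulate_multiplicative(true_value, sigma, 10_000, rng)
    # Additive noise can push small concentrations below zero;
    # multiplicative noise cannot, because exp(.) is always positive.
    print(true_value, min(additive), min(multiplicative))
```

With additive noise the spread is the same at both ends of the range, so simulated values near 0.3 can go negative. With lognormal noise the spread scales with the true value and every simulated measurement stays positive.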

Alternatively, it may make sense to allow negative measurements, because the initial carbon is itself only an estimate and a measurement might indicate that more carbon was evolved than was in the system initially. You see this kind of inconsistency in the AK_T25 data, which has some points at which the measured evolved carbon exceeds the initial carbon.

ktoddbrown commented 10 years ago

Dirt diggers, feel free to chime in here. But my understanding is that soil density has a log-normal measurement error associated with it (?), whereas the % carbon has a normal distribution. That makes the resulting initial soil organic carbon density not exactly normal. Does that answer your question?

My brain is breaking trying to think of negative carbon. I don't think this is possible.

You really shouldn't see more carbon released than initially measured; some (likely most) of the carbon will always stick around in an incubation. This may indicate a problem with the data set, possibly in the conversion between CO2 concentration and the amount of C this represents relative to the initial SOC.

bob-carpenter commented 10 years ago

Thanks for all the quick feedback!

Is soil density the initial carbon?

Is % carbon the concentration of evolved carbon?

In reply to Kathe Todd-Brown, Nov 26, 2014, 4:00 PM.

ktoddbrown commented 10 years ago

My understanding is that initial carbon is generally expressed as a density [kg-C m^-3]. It is the product of two separate measurements: the soil density [kg-soil m^-3] and the carbon % [kg-C kg-soil^-1]. Frequently evolved carbon is normalized to the initial carbon of the incubation, but it's measured as a molar % [mmol-CO2 mmol-air^-1 day^-1]. It has to be converted to a carbon mass and then related to the soil carbon stock based on the ratio of head space volume to soil volume.
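The first half of that calculation can be sketched as a unit check. The function name and the example values (1300 kg-soil m^-3 and a carbon mass fraction of 0.02) are hypothetical and not taken from the SoilR data. The head space conversion for evolved carbon is left out because it depends on details not given here.

```python
def initial_carbon_density(soil_density_kg_m3, carbon_fraction):
    """Initial carbon [kg-C m^-3] = soil density [kg-soil m^-3]
    times carbon mass fraction [kg-C kg-soil^-1]."""
    if soil_density_kg_m3 <= 0:
        raise ValueError("soil density must be positive")
    if not 0.0 <= carbon_fraction <= 1.0:
        raise ValueError("carbon fraction must be a fraction in [0, 1], not a percentage")
    return soil_density_kg_m3 * carbon_fraction


# Hypothetical values, for illustration only.
example = initial_carbon_density(1300.0, 0.02)
print(f"initial carbon: {example:.1f} kg-C m^-3")
```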

Does that make sense?

bob-carpenter commented 10 years ago

Thanks --- that was just the kind of info I was curious about.

I understand all the units, but don't understand why there's a day^-1 term in the evolved carbon units. Is what's really being measured the carbon evolved since the last measurement? It's going to be related back to an instantaneous rate in the equations, right?

I haven't done chemistry in a while. Is mmol just a milli-mole? Not that it matters here because it'll cancel.

I think it would make sense to express the measurement errors on their own scales and convert back, but unless we have independent data for all these measurements, that won't make sense and we'll have to work from the aggregate.

That is, instead of putting an error on measuring initial carbon, you can have a measurement error model for soil density, with a true soil density being a latent parameter and the measurement coming with some error. Similarly for the carbon %. Density will be positive, carbon% will be in (0,1), so we can have some appropriate error models. Then the errors combine in the obvious way --- initial carbon is a function of the latent parameters and the errors compound.

That way, you can model the two measurement processes --- soil density and carbon%. This only really makes sense if there are replicated measurements to get some estimate of the errors, or if the measurement error has a known form we can assume based on calibrating the devices.
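A forward simulation of that idea, as a rough sketch: a lognormal error on soil density, a normal error on the carbon fraction restricted to (0,1), and initial carbon as their product. The true values and error scales below are invented for illustration. In a Stan model the true density and true carbon fraction would be latent parameters to estimate rather than known inputs.

```python
import math
import random
import statistics


def measure_density(true_density, sigma_log, rng):
    """Positive measurement with lognormal (multiplicative) error."""
    return true_density * math.exp(rng.gauss(0.0, sigma_log))


def measure_carbon_fraction(true_fraction, sigma, rng):
    """Normal error, redrawn until the measurement lies in (0, 1)."""
    while True:
        draw = rng.gauss(true_fraction, sigma)
        if 0.0 < draw < 1.0:
            return draw


def simulate_initial_carbon(true_density, true_fraction,
                            sigma_log_density, sigma_fraction, n, rng):
    """Initial carbon computed from two noisy measurements; the errors compound."""
    return [
        measure_density(true_density, sigma_log_density, rng)
        * measure_carbon_fraction(true_fraction, sigma_fraction, rng)
        for _ in range(n)
    ]


rng = random.Random(7)
# All four numbers below are assumptions, not values from the data set.
draws = simulate_initial_carbon(1300.0, 0.02, 0.1, 0.002, 20_000, rng)
print("mean  ", statistics.mean(draws))
print("median", statistics.median(draws))
print("sd    ", statistics.stdev(draws))
```

With roughly 10% relative error on each input, the relative error on the product comes out near 14%, which is the compounding described above.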

In reply to Kathe Todd-Brown, Nov 26, 2014, 4:46 PM.

crlsierra commented 10 years ago

Bob, you point out an important issue with this dataset. It is indeed impossible to respire more carbon than the initial amount present. Usually in these incubations, the respired C is a very small proportion of the initial carbon. There is indeed an error in our data. I just looked at the scripts from the student who did the work and found a bug in her calculations; fixing it will probably produce much smaller numbers for the respired C. Although this bug makes the numbers unrealistic by a constant proportion, it shouldn't matter much for the algorithm development you are doing. While I fix the bug and produce a new dataset with the appropriate units, I would recommend multiplying the respiration data by a small proportion; a factor of around 0.0001 will probably give you realistic numbers. This way you don't have to worry too much about the problem of negative C numbers and the distribution of the error.

bob-carpenter commented 10 years ago

Answered, thanks.

bob-carpenter commented 10 years ago

It's not just measurement error?

Definitely not a problem for algorithm development. I produced a data simulator I can fit and this is just a hello world kind of example to show people how to use Stan --- the same role as your vignette, actually, with some more background for non-biogeochemists on what is being modeled.

The real question is whether error in respired C measurement is likely to be proportional to the amount of respired C or more likely to be constant throughout the range of measurements.
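One way to look at that question, assuming replicated measurements exist at several levels of respired C (not confirmed for this data set), is to compare the standard deviation and the coefficient of variation across levels. The sketch below uses simulated replicates only; the levels, noise scale, and function name are illustrative.

```python
import math
import random
import statistics


def spread_by_level(replicates_by_level):
    """Return (mean, sd, cv) for each group of replicate measurements."""
    summary = []
    for values in replicates_by_level:
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        summary.append((mean, sd, sd / mean))
    return summary


rng = random.Random(42)
levels = [0.3, 1.0, 3.0, 7.5]  # assumed true values spanning the measured range
n_rep = 2000                   # far more replicates than a real experiment would have

# Constant error: the same absolute noise at every level.
constant = [[mu + rng.gauss(0.0, 0.05) for _ in range(n_rep)] for mu in levels]
# Proportional error: noise scales with the level (lognormal).
proportional = [[mu * math.exp(rng.gauss(0.0, 0.05)) for _ in range(n_rep)]
                for mu in levels]

for name, data in (("constant", constant), ("proportional", proportional)):
    for mean, sd, cv in spread_by_level(data):
        print(f"{name:12s} mean={mean:.3f} sd={sd:.4f} cv={cv:.4f}")
```

Under constant error the standard deviation is flat across levels and the coefficient of variation shrinks as the level grows. Under proportional error the coefficient of variation is flat and the standard deviation grows with the level.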

In reply to Carlos A. Sierra, Nov 27, 2014, 7:30 AM.