soil-metamodel / decomposition_datasets

Reorg: It lives! #9

Open ktoddbrown opened 7 years ago

ktoddbrown commented 7 years ago

I'm reviving this project. Looking at things with fresh eyes I propose the following reorg:

  1. Move data sets to separate study 'type' folders (i.e., incubation vs. litterbag).
  2. Flatten ProcessedEnvironment and ProcessedSubstrate to [id, variable, unit, value, uncertainty.type, uncertainty.value].
  3. Simplify "ProcessedData" to [sample_id, study_id, substrate_id, environment_id, cap_time, measure_time, time_unit, variable, value, uncertainty.type, uncertainty.value].
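To make the flattening in item 2 concrete, here is a minimal Python sketch of turning a wide table (one column per variable) into the proposed long layout. Only the six column names come from the proposal; the function name `flatten`, the example ids, variables, units, and values are all hypothetical, and the project's real tables may be organized differently.

```python
# Column order proposed in the issue for the flattened tables.
COLUMNS = ["id", "variable", "unit", "value", "uncertainty.type", "uncertainty.value"]


def flatten(wide_rows, units, uncertainty=None):
    """Turn wide rows (one column per variable) into long rows (one row per id/variable).

    wide_rows:   list of dicts, each with an "id" key plus one key per variable
    units:       dict mapping variable name -> unit string
    uncertainty: optional dict mapping (id, variable) -> (type, value)
    """
    uncertainty = uncertainty or {}
    long_rows = []
    for row in wide_rows:
        row_id = row["id"]
        for variable, value in row.items():
            # Skip the id column itself and any missing measurements.
            if variable == "id" or value is None:
                continue
            u_type, u_value = uncertainty.get((row_id, variable), (None, None))
            long_rows.append({
                "id": row_id,
                "variable": variable,
                "unit": units.get(variable),
                "value": value,
                "uncertainty.type": u_type,
                "uncertainty.value": u_value,
            })
    return long_rows


# Hypothetical example values, for illustration only.
wide = [
    {"id": "env1", "temperature": 20.0, "moisture": 0.3},
    {"id": "env2", "temperature": 25.0, "moisture": None},
]
units = {"temperature": "degC", "moisture": "g g-1"}
uncertainty = {("env1", "temperature"): ("sd", 0.5)}

long_rows = flatten(wide, units, uncertainty)
for r in long_rows:
    print([r[c] for c in COLUMNS])
```

One consequence of the long layout is that a new variable becomes a new row rather than a new column, so studies reporting different measurements can share one table.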

If I don't hear any response by 10 December 2016 I'll go forward. -Kathe

natasjaVgestel commented 7 years ago

Hi Kathe, Thanks for picking this back up again!

Natasja

*** Please note the change in my contact info

Natasja van Gestel, PhD Research Assistant Professor Texas Tech University Climate Science Center South Central Climate Science Center Mailstop 3131 Lubbock, TX 79409-3131 Tel: +1 806 742 2715

Research Affiliate Center for Ecosystem Science and Society (Ecoss) Northern Arizona University

www.nvangestel.com

.. you can fail at what you don’t want, so you might as well take a chance on doing what you love

I forget what I was taught, I only remember what I’ve learned.

ecadair commented 7 years ago

Hi Kathe,

I'm teaching a new course this spring, but happy to help where I can.

Carol

crlsierra commented 7 years ago

Hi Kathe, Great to know you want to revive this project. Let me know how I can help. I already have code for model optimization that can be used for this project. You may also want to look at the code that Bob implemented for this type of data using Stan: http://mc-stan.org/documentation/case-studies/soil-knit.html

Also, I'm putting together a larger dataset of incubation studies (https://github.com/SoilBGC-Datashare/sidb). Part of my idea was to test different model structures with these data; maybe we can find ways to work on this together. Carlos

wwieder commented 7 years ago

Fine w/ me, Kathe.

Please let me know if/how I can help.

-- Will Wieder Project Scientist CGD, NCAR 303-497-1352

ktoddbrown commented 7 years ago

Thanks everyone for the support!

@crlsierra Yes, I think we should definitely talk. I'm with @bob-carpenter and @milkha right now and we are sorting out a first-order linear model power analysis using Stan. It looks like sidb is much further along; should we look at combining efforts? Do you have time to talk on Thursday or Friday, or at AGU (if you are going)?

-Kathe

natasjaVgestel commented 7 years ago

Kathe, I will also be at AGU. Let me know if you’d like to meet. Thursday would work better for me as I’m leaving on Friday.

dlebauer commented 7 years ago

I'm interested.

bpbond commented 7 years ago

Hi @ktoddbrown - great! As everyone above said, let me know if/how I can help: data, code, ideas.

crlsierra commented 7 years ago

Hi Kathe, yes, we should try to find ways to combine efforts. I won’t be at AGU this year. I can talk by Skype either today, or the week after AGU. Best, Carlos

dlebauer commented 7 years ago

@crlsierra It seems sensible to start with the sidb format rather than creating a bespoke format. Which of these (https://github.com/soil-metamodel/decomposition_datasets/tree/master/incubation) are missing from sidb?

bob-carpenter commented 7 years ago

Milad Kharratzadeh has taken this much further than the initial model I had in that case study. He's basically recreated the design matrix approach for the rates in Carlos's SoilR package.

Kathe's been here at Columbia for a day and we're also putting together some nonlinear biomass and enzyme-based models with many more compartments and parameters. But the first thing we want to get going is a power study of how the data collection rate, duration, process, and noise affect the ability to recover the parameters of simple multi-compartment models. We'll be looking at complete pooling (assume pool exchange rates are identical across replicates), no pooling (assume each replicate is fit independently), and hierarchical partial pooling (letting the data decide where on that spectrum the estimates should be).
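The three pooling choices can be illustrated with a small Python sketch on simulated data. This is not the Stan model discussed in this thread: it assumes a single-pool first-order decay with a known starting fraction of 1, Gaussian noise on the log scale with known `sigma`, and a crude method-of-moments shrinkage in place of a real hierarchical fit. All rates, sampling times, and function names are made up for illustration.

```python
import random
import statistics


def simulate(ks, times, sigma, rng):
    """Log fraction remaining under first-order decay, log C(t) = -k t, plus Gaussian noise."""
    return [[-k * t + rng.gauss(0.0, sigma) for t in times] for k in ks]


def fit_rate(times, logs):
    """Least-squares slope through the origin for log C = -k t; returns the estimate of k."""
    return -sum(t * y for t, y in zip(times, logs)) / sum(t * t for t in times)


def fit_all(times, data, sigma):
    # No pooling: each replicate gets its own independently fitted rate.
    no_pool = [fit_rate(times, logs) for logs in data]

    # Complete pooling: a single rate fitted to all replicates at once.
    all_times = times * len(data)
    all_logs = [y for logs in data for y in logs]
    complete = fit_rate(all_times, all_logs)

    # Partial pooling (crude stand-in for a hierarchical model): shrink each
    # replicate estimate toward the pooled one. The weight compares the
    # estimated between-replicate variance (tau2) with the sampling variance
    # of a single replicate's estimate (se2).
    se2 = sigma ** 2 / sum(t * t for t in times)
    tau2 = max(statistics.variance(no_pool) - se2, 0.0)
    weight = tau2 / (tau2 + se2)
    partial = [complete + weight * (k - complete) for k in no_pool]
    return no_pool, complete, partial, weight


rng = random.Random(1)
true_ks = [0.010, 0.012, 0.008, 0.011, 0.009]  # hypothetical per-replicate rates (1/day)
times = [7, 14, 28, 56, 112]                   # hypothetical sampling days
sigma = 0.1                                    # assumed log-scale noise sd

data = simulate(true_ks, times, sigma, rng)
no_pool, complete, partial, weight = fit_all(times, data, sigma)
print("no pooling:      ", [round(k, 4) for k in no_pool])
print("complete pooling:", round(complete, 4))
print("partial pooling: ", [round(k, 4) for k in partial])
```

Rerunning the simulation over many seeds while varying `times` or `sigma`, and recording how far each estimate lands from the true rates, gives a rough version of the power study described above.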

Then we'll look at fitting all these real data sets, because we'll have a bunch of models ready to go.

bob-carpenter commented 7 years ago

> Also, I'm putting together a larger dataset of incubation studies https://github.com/SoilBGC-Datashare/sidb Part of my idea was to test different model structures with these data, maybe we can find ways to work on this together.

Thanks. That looks great. Fitting 20 data sets presents a process management and presentation issue, but we can start with a few of them. Is there any reason not to just use this sidb data set rather than creating an alternative one?

crlsierra commented 7 years ago

Bob, Kathe, and Dave, it’d be great if you used sidb and also contributed new datasets that are not already there. You would need to follow the specific format for this database, which has all metadata in a .yaml file and the actual incubation time series in a .csv file. The repository is well documented and you can use other entries as examples.

@Bob, I have an analysis (http://www.sciencedirect.com/science/article/pii/S0038071715002813) that is relatively similar to what you mentioned. It would be good to compare it with what you find in the power analysis. Best, Carlos

bob-carpenter commented 7 years ago

Thanks for the pointer to the paper.

I saw the DB markup required. I like the idea of using something consistent. Is there a way to parse that metadata in R or is it just meant to be read by humans?

crlsierra commented 7 years ago

The yaml file is both human and machine readable. You can use the R function yaml.load_file from the yaml package, which loads the file as an R list. See the file loadEntries.R in the scripts folder of the sidb repo for how to parse both data and metadata into a single list. Check also the Summary.Rmd file in the Demo folder.

ktoddbrown commented 7 years ago

@crlsierra Do you have the scripts to process the original raw data into the common data format or was that hand curated?

crlsierra commented 7 years ago

Kathe, We are adding entries by hand, taking the information directly from the papers. We have no scripts to automate the translation of the metadata. Carlos
