mmaelicke / scikit-gstat

Geostatistical variogram estimation expansion in the scipy style
https://mmaelicke.github.io/scikit-gstat/
MIT License
225 stars, 53 forks

Support for Spatial-Temporal variograms #126

Closed Iram-stack closed 2 years ago

Iram-stack commented 2 years ago

I have piezometers at different points on the same site. I want to find their correlation with each other and also with rainfall, i.e. I want to do a spatio-temporal correlation analysis. I am unable to find any code related to this. Can anyone help me with this?

At the moment I know about the spatial variogram, but not the spatio-temporal variogram.

mmaelicke commented 2 years ago

Hey,

Unfortunately, there is no full step-by-step tutorial for spatio-temporal variogram estimation. There are some samples scattered across the docs:

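import skgstat as skg

# coords: (n, 2) array of locations; vals: (n, timestamps) array, thinned here to every 6th time step
STV = skg.SpaceTimeVariogram(coords, vals[:,::6], x_lags=20, t_lags=20, model='product-sum')
print(STV)

The coordinates are passed just like for the spatial variogram; the values are expected to be a 2D numpy array of shape (len(coords), timestamps).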
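To make the expected input shapes concrete, here is a minimal sketch with synthetic data. The station count, the number of time steps and the random values are assumptions for illustration only; nothing here calls scikit-gstat itself.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: 12 stations, 120 time steps each
n_stations, n_steps = 12, 120

# One row per station: (x, y)
coords = rng.uniform(0, 100, size=(n_stations, 2))

# One row per station, one column per time step
vals = rng.normal(size=(n_stations, n_steps))

# vals[:, ::6] keeps every 6th time step (columns 0, 6, 12, ...),
# which reduces the number of space-time pairs to evaluate
vals_thinned = vals[:, ::6]

# The first axis of vals has to match the number of coordinates
assert vals.shape[0] == len(coords)

print(coords.shape, vals.shape, vals_thinned.shape)
# (12, 2) (12, 120) (12, 20)
```

Arrays shaped like `coords` and `vals_thinned` are what the `SpaceTimeVariogram` call above expects as its first two arguments.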

Please also note that no space-time kriging is available. Contributions welcome... ;)

If you have any more specific questions, please go ahead and ask.

Best, Mirko


Fersch, Benjamin, et al. “A dense network of cosmic-ray neutron sensors for soil moisture observation in a pre-alpine headwater catchment in Germany.” Earth System Science Data Discussions 2020 (2020): 1-35.

Iram-stack commented 2 years ago

Hi mmaelicke!

Thank you for sharing. I want to know the difference between a marginal variogram and a variogram. I only know about the variogram.

Best regards, Iram



mmaelicke commented 2 years ago

OK. There is a lot of literature out there. I personally enjoyed Montero's book on geostatistical modeling a lot at the time: https://bcs.wiley.com/he-bcs/Books?action=index&bcsId=9800&itemId=1118413180

Some papers, but I am not sure how detailed the description of the spatio-temporal variogram method itself is:


In a nutshell: The classical, spatial variogram is based on distance lags only. The spatio-temporal variogram adds the temporal lags as another dimension, so you end up with a meshgrid (or a surface in the continuous case) for the variogram function. To build this model, the family of separable models needs the marginal variograms. These are conceptually similar to the marginal distributions of a 2D histogram.
For the spatial one, you take only those point pairs into account that have the temporal lag set to zero; for the temporal one, you set the spatial lag to zero. Then you calculate the variogram as before, just with much more data.

Note that the marginal spatial variogram is different from the spatial variogram, as it is still calculated for the full spatio-temporal dataset. It is like a second-order moment, but at any time. (Not sure if this is mathematically speaking strictly true, but I like to think of it this way... :) )
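The idea of the two marginal variograms can be sketched in plain numpy. This is only an illustration of the concept described above, not scikit-gstat's internal implementation; the function names, the binning and the synthetic data are assumptions made for this example.

```python
import numpy as np

def marginal_spatial(coords, vals, bin_edges):
    """Spatial marginal: pairs of different locations at the SAME time step
    (temporal lag zero), pooled over all time steps."""
    i, j = np.triu_indices(len(coords), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)  # (n_pairs,)
    sqdiff = (vals[i] - vals[j]) ** 2                      # (n_pairs, n_steps)
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(bin_edges) - 1):
        mask = (dist > bin_edges[b]) & (dist <= bin_edges[b + 1])
        if mask.any():
            # Matheron estimator: half the mean squared difference
            gamma[b] = 0.5 * sqdiff[mask].mean()
    return gamma

def marginal_temporal(vals, max_lag):
    """Temporal marginal: the SAME location (spatial lag zero) at time steps
    k apart, pooled over all locations."""
    return np.array([
        0.5 * np.mean((vals[:, k:] - vals[:, :-k]) ** 2)
        for k in range(1, max_lag + 1)
    ])

# Hypothetical data: 12 locations, 50 time steps
rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(12, 2))
vals = rng.normal(size=(12, 50))

print(marginal_spatial(coords, vals, np.linspace(0, 150, 6)))
print(marginal_temporal(vals, max_lag=5))
```

Both functions use every time step and every location, which is why the marginal spatial variogram is based on far more pairs than a spatial variogram of a single snapshot.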

mmaelicke commented 2 years ago

And of course there is also the work by Christakos and De Cesare

https://books.google.com/books?hl=de&lr=&id=6CmJk7yPMbwC&oi=fnd&pg=PP10&ots=ekH0OJrjX6&sig=79MBdxEj_4_cNQ6Yi0TGw8nHs-s

https://www.sciencedirect.com/science/article/pii/S0167715200001310?casa_token=NZ_i7ryAzHoAAAAA:_rnSnOkdJaZ8zDLtVh1Tf4dORWm1XP8K8xtq3SHNndCq3RifKbBSxFVGQpdKwCRG6VHrgQC-6w

Iram-stack commented 2 years ago

Hi Mirko Malicke!

I hope you are in good health. I am confused about how to arrange the data set for the spatio-temporal variogram analysis.

I have 12 piezometers, which means I have 12 points with X, Y coordinates, and each point has 2 years of data.

I want to find the relationship between these piezometers with respect to space and time. How should I arrange the data? For example, I am thinking of arranging the data in Excel like this:

col 1 = X coordinates
col 2 = Y coordinates
col 3 = Piezometer name
col 4 = Groundwater level
col 5 = Time (dd/mm/yy)

In my data only 12 spatial points are available; there is much more data with respect to time.

Would it be better to analyze the data separately, i.e. to make a data set for the spatial analysis first and then one for the temporal analysis?

Best regards, Iram



mmaelicke commented 2 years ago

Hey,

I just saw that the docs for SpaceTimeVariogram are not great. I just updated the docs and the parameters are now all described.

The coordinates have to be passed just like for the Variogram instance, i.e. the points (0, 1), (1, 2), (2, 3) would be passed like:

import numpy as np
# one row per point, columns are x and y
np.array([
  [0, 1],
  [1, 2],
  [2, 3]
])

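These would have to be built from your columns 1 and 2. Assuming you have the data in a pandas.DataFrame, you can do the following (dropping duplicate rows, because a table with one row per measurement repeats each location once per timestamp):

coords = df.iloc[:,:2].drop_duplicates().values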

For the values, a bit more data processing is needed. Assuming your time series are X long, the values array needs to have the shape (12, X) for your data. So, for each coordinate you need a row of groundwater levels. Given the info about your data, I guess that the primary key for a data point is the combination of columns 3 and 5. My strategy here is to group the data by column 3 and extract column 4 for each group. Then you can np.vstack the single arrays into a matrix of shape (12, X) (np.column_stack would give you the transposed shape (X, 12)). Make sure the rows end up in the same order as the coordinates. Maybe it's easier to do the preprocessing in Excel. That's up to you.
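One possible way to do this reshaping with pandas is sketched below. The column names (`x`, `y`, `name`, `level`, `date`) and the three-piezometer sample table are hypothetical stand-ins for the five columns described in the question, and the sketch assumes that every piezometer has exactly one reading per timestamp.

```python
import pandas as pd

# Hypothetical long-format table: one row per piezometer and date
df = pd.DataFrame({
    "x": [0.0] * 3 + [5.0] * 3 + [9.0] * 3,
    "y": [0.0] * 3 + [2.0] * 3 + [7.0] * 3,
    "name": ["P1"] * 3 + ["P2"] * 3 + ["P3"] * 3,
    "level": [10.1, 10.3, 10.2, 8.7, 8.9, 8.8, 12.0, 12.4, 12.1],
    "date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"] * 3),
})

# One row per piezometer, one column per date
wide = df.pivot(index="name", columns="date", values="level")
vals = wide.to_numpy()  # shape (n_piezometers, n_dates)

# One coordinate pair per piezometer, in the same row order as `wide`
coords = df.groupby("name")[["x", "y"]].first().loc[wide.index].to_numpy()

print(coords.shape, vals.shape)
# (3, 2) (3, 3)
```

Missing readings would show up as NaN in `vals` after the pivot, so gaps in the time series have to be handled (for example by resampling to a common time step) before the array is passed on.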

As a final thought: Before you dive deep into data processing and building complex spatio-temporal variograms, I would check if 12 observation locations are enough to capture a spatial correlation at all. Can you build enough spatial lags? Can you find a clear sill? And most important: How big is the empirical nugget/sill ratio? My personal concern would be that groundwater levels show a substantial temporal auto-correlation, which might make it quite complicated to capture the spatial counterpart based on only 12 locations.
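A quick way to get a feeling for the first question is to count how many point pairs fall into each spatial lag class. The sketch below uses random, purely hypothetical coordinates and an arbitrary choice of 6 evenly spaced lag classes; with real data you would plug in the actual piezometer locations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coordinates for 12 piezometers (replace with the real ones)
coords = rng.uniform(0, 500, size=(12, 2))

# All unique pairs of locations: n * (n - 1) / 2 = 66 for n = 12
i, j = np.triu_indices(len(coords), k=1)
dist = np.linalg.norm(coords[i] - coords[j], axis=1)

# Split the separating distances into evenly spaced lag classes
n_lags = 6
edges = np.linspace(0, dist.max(), n_lags + 1)
counts, _ = np.histogram(dist, bins=edges)

print("point pairs in total:", dist.size)
for lower, upper, count in zip(edges[:-1], edges[1:], counts):
    print(f"{lower:7.1f} to {upper:7.1f}: {count} pairs")
```

With only 66 pairs in total, each additional lag class leaves fewer pairs per class, so the empirical variogram of a single time step gets noisy quickly.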

Hope this helps...

Iram-stack commented 2 years ago

Dear Mirko Malicke,

I agree with you that before doing a spatio-temporal analysis it is better to check whether the data is suitable for a spatial variogram at all.

I have attached an image, which should give you some idea.

Best regards, Iram



mmaelicke commented 2 years ago

Hey @Iram-stack,

There is no image attached. Maybe it got lost when replying to the GitHub issue notification by mail. You can also drag & drop images directly into the editor on GitHub.

I have the feeling that we are already way beyond technical support for scikit-gstat. This is more like geostatistical consulting right now. Please go ahead and post that image, but please understand that this is not my top priority among all the voluntary work related to scikit-gstat.

If there is something scientifically interesting about your study or you are a student and use this for any kind of assignment, you can also reach out to my university email address at KIT: mirko.maelicke@kit.edu and maybe there is an opportunity to collaborate within the scope of my role as a researcher and educator. In case you need this for any kind of engineering task in a company, you can also reach out to my company email: mirko@hydrocode.de. In this scope, we actually do professional data-science related consulting.

Best, Mirko