Open miniufo opened 3 years ago
This is a great idea.
I would start by reviewing the CF conventions on Trajectory Data. Probably just having data that all conforms to that would be a great start.
Tagging @selipot, who has been thinking about this for GDP data.
Thank you for tagging me here. I have been thinking about this, in the sense that I wrote and submitted a proposal to the NSF EarthCube program to do just that: define a common Lagrangian data structure for the GDP and others. I am hoping to hear back in the fall. You can see the extent of the metadata available for the GDP on its ERDDAP server.
Shane, do you think the CF trajectory data / metadata conventions are enough? Or is something more needed?
That is what (or close to what) the GDP uses right now and what the OSM ERDDAP server returns. I am not using these files because I like to have markers for "data gaps", that is, interruption markers in what are otherwise regular-interval time series.
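To make the idea of gap markers concrete, here is a minimal sketch that flags interruptions in an otherwise regular time series. The function name find_gaps, the tolerance argument, and the 6-hour interval are illustrative assumptions; nothing here is taken from the GDP files or the ERDDAP server.

```python
from datetime import datetime, timedelta


def find_gaps(times, interval, tolerance=timedelta(0)):
    """Return (start, end) pairs where consecutive samples are further
    apart than interval + tolerance. `times` must be sorted ascending."""
    gaps = []
    for previous, current in zip(times, times[1:]):
        if current - previous > interval + tolerance:
            gaps.append((previous, current))
    return gaps


t0 = datetime(2020, 1, 1)
step = timedelta(hours=6)  # assumed nominal sampling interval
# Samples at 0, 6, 12 hours, then a jump to 30 and 36 hours: one interruption.
times = [t0 + k * step for k in (0, 1, 2, 5, 6)]
gaps = find_gaps(times, step)
```

Each returned pair brackets one interruption, so a marker could be stored alongside the track rather than padding the series with fill values.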
Glad you both brought this information to my attention. I hope @selipot gets the funding so that we can start the Python implementation.
I had not noticed the CF conventions, but they indeed address many of my concerns. Also, I have some experience using both GDP data and tropical cyclone data. I have tried to abstract the Lagrangian data model as Particle here, where you can also find its subclasses TC or Drifter (or profiling Float). For a set of Particles, I defined a ParticleSet that is equivalent to xarray.Dataset.
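As a rough illustration of the Particle / ParticleSet relationship described above, here is a minimal standard-library sketch. It is not the linked implementation: the field names, the (time, lon, lat) record layout, and the of_type helper are assumptions made for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class Particle:
    """One Lagrangian object: identity plus (time, lon, lat) records."""
    id: str
    name: str = ""
    records: list = field(default_factory=list)  # assumed layout: [(time, lon, lat), ...]


class Drifter(Particle):
    """Surface drifter; placeholder for drifter-specific attributes."""


class TC(Particle):
    """Tropical cyclone; placeholder for TC-specific attributes."""


class ParticleSet:
    """A collection of particles keyed by ID (hypothetical container)."""

    def __init__(self, particles=()):
        self._particles = {p.id: p for p in particles}

    def __len__(self):
        return len(self._particles)

    def __getitem__(self, particle_id):
        return self._particles[particle_id]

    def __iter__(self):
        return iter(self._particles.values())

    def of_type(self, cls):
        """Return a new ParticleSet holding only instances of `cls`."""
        return ParticleSet(p for p in self if isinstance(p, cls))


pset = ParticleSet([
    Drifter("d1", records=[(0, 150.0, 10.0), (6, 150.2, 10.1)]),
    TC("tc1", name="Example", records=[(0, 130.0, 15.0)]),
])
drifters = pset.of_type(Drifter)
```

Keeping the set keyed by ID makes per-particle lookup cheap, while of_type gives a simple way to select one kind of particle from a mixed collection.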
Here is a schematic plot:
I hope that I am on the right track, and that all these concerns can be merged to shape the Lagrangian data model.
A further thought: one may want to analyze the 3D structure of a mesoscale eddy (or tropical cyclone) in a translating cylindrical coordinate system. I hope the Lagrangian model could simplify this kind of analysis. Specifically, given the information about an eddy, I could get the quasi-Lagrangian view of its 3D structure.
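A minimal sketch of what a translating cylindrical view could look like at a single point, assuming local Cartesian coordinates in meters and a known center position and translation velocity. The function name and argument layout are hypothetical; a 3D analysis would apply the same transform at every depth level.

```python
import math


def to_translating_cylindrical(x, y, u, v, center, center_velocity):
    """Express a point and its velocity relative to a moving center.

    Returns (r, theta, v_radial, v_tangential), with theta measured
    counterclockwise from the x-axis and velocities taken in the frame
    that moves with the center.
    """
    xc, yc = center
    uc, vc = center_velocity
    dx, dy = x - xc, y - yc
    r = math.hypot(dx, dy)
    theta = math.atan2(dy, dx)
    # Remove the translation of the center before projecting.
    u_rel, v_rel = u - uc, v - vc
    v_radial = u_rel * math.cos(theta) + v_rel * math.sin(theta)
    v_tangential = -u_rel * math.sin(theta) + v_rel * math.cos(theta)
    return r, theta, v_radial, v_tangential


# A point 10 km east of a center that translates westward at 0.1 m/s;
# the flow at the point is 0.5 m/s northward plus the same translation.
r, theta, v_rad, v_tan = to_translating_cylindrical(
    x=10_000.0, y=0.0, u=-0.1, v=0.5,
    center=(0.0, 0.0), center_velocity=(-0.1, 0.0),
)
```

In this example the translation cancels out, leaving a purely tangential (counterclockwise) flow around the center.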
I am not sure whether this is the right place to discuss this. I hope to see a repo for it. Or maybe we could start a session in Pangeo so that I could be updated regularly.
I have been wondering whether we need a common Lagrangian data structure, like xarray for coordinate-labeled n-dimensional datasets, to describe large numbers of Lagrangian particles. These data generally consist of a time series of positions and associated data along the Lagrangian tracks. Examples are the simulated Lagrangian trajectories here, the GDP drifter dataset, the Argo float dataset, as well as quasi-Lagrangian tropical cyclone best-track datasets and mesoscale eddy datasets.
As far as I know, pandas.DataFrame is used to represent such data, with at least three columns: time, x_pos, and y_pos. This is indeed efficient and clear. However, sometimes we need to attach extra information to the DataFrame, such as ID, name, type, status, etc. So I think we can design a common Lagrangian data structure with which all these (quasi-)Lagrangian data and associated datasets can be described, accessed, stored, and manipulated efficiently.
A first sketch is to define a class Particle, with ID, name, and records as its fields. Its records field is a pandas.DataFrame that stores the Lagrangian data. By overloading some of the operators of Particle, we can make a Particle as simple to use as a pandas.DataFrame. Through extends (inheritance), we can further define Drifter, Float, and TropicalCyclone subclasses that are more appropriate for each case.
Do you have any comments on this?
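The Particle idea described above could be sketched roughly as follows. This is only one possible reading of the proposal: the delegated operators, the drogued attribute, and the wind column are assumptions made for illustration, and the Float subclass is omitted for brevity.

```python
import pandas as pd


class Particle:
    """Identity metadata plus a DataFrame of along-track records."""

    def __init__(self, id, name, records):
        self.id = id
        self.name = name
        self.records = records  # pandas.DataFrame with time, x_pos, y_pos, ...

    def __len__(self):
        return len(self.records)

    def __getitem__(self, key):
        # Delegate indexing so particle["x_pos"] behaves like a DataFrame.
        return self.records[key]

    def __repr__(self):
        return f"{type(self).__name__}(id={self.id!r}, name={self.name!r}, n={len(self)})"


class Drifter(Particle):
    def __init__(self, id, name, records, drogued=True):
        super().__init__(id, name, records)
        self.drogued = drogued  # hypothetical drifter-specific field


class TropicalCyclone(Particle):
    def max_wind(self):
        # Assumes the records contain a "wind" column.
        return self.records["wind"].max()


records = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-01 00:00", "2020-01-01 06:00"]),
    "x_pos": [130.0, 129.5],
    "y_pos": [15.0, 15.4],
    "wind": [35.0, 40.0],
})
tc = TropicalCyclone(id="tc1", name="Example", records=records)
```

Delegating only a few operators keeps the wrapper thin; anything not covered remains reachable through tc.records.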