ocean-transport / floater

For working with lagrangian float data
http://floater.readthedocs.io

common Lagrangian data structure #78

Open miniufo opened 3 years ago

miniufo commented 3 years ago

I have been wondering whether we need a common Lagrangian data structure, analogous to xarray for labeled n-dimensional datasets, to describe large numbers of Lagrangian particles. Such data generally consist of a time series of positions, with associated data along each Lagrangian track. Examples include the simulated Lagrangian trajectories here, the GDP drifter dataset, the Argo float dataset, and quasi-Lagrangian datasets such as tropical cyclone best tracks and mesoscale eddy tracks.

As far as I know, pandas.DataFrame is commonly used to hold such data, with at least three columns: time, x_pos, and y_pos. This is indeed efficient and clear. However, sometimes we need to attach extra information to the DataFrame, such as an ID, name, type, or status. So I think we could design a common Lagrangian data structure in which all these (quasi-)Lagrangian data and associated datasets can be described, accessed, stored, and manipulated efficiently.

A first sketch is to define a Particle class with ID, name, and records as its fields. Its records field is a pandas.DataFrame that stores the Lagrangian data. By overloading some of Particle's operators, we can make a Particle as simple to use as a pandas.DataFrame. Through subclassing, we can further define Drifter, Float, and TropicalCyclone subclasses tailored to each case.

Do you guys have any comments on this?
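To make the idea concrete, here is a minimal sketch of what such a class could look like. Everything in it is an assumption for illustration only: the attribute names (`pid`, `name`, `records`), the `drogued` field on `Drifter`, and the made-up positions are not taken from any existing package.

```python
import pandas as pd


class Particle:
    """Hypothetical container: identifying metadata plus a DataFrame of records."""

    def __init__(self, pid, name, records):
        self.pid = pid
        self.name = name
        self.records = records  # assumed columns: time, x_pos, y_pos

    def __getitem__(self, key):
        # Delegate indexing to the underlying DataFrame
        return self.records[key]

    def __len__(self):
        # Number of records along the track
        return len(self.records)

    def __repr__(self):
        return (f"{type(self).__name__}(id={self.pid!r}, "
                f"name={self.name!r}, n_records={len(self)})")


class Drifter(Particle):
    """Hypothetical subclass carrying one drifter-specific field."""

    def __init__(self, pid, name, records, drogued=True):
        super().__init__(pid, name, records)
        self.drogued = drogued


# Made-up example track, for illustration only
records = pd.DataFrame({
    "time": pd.date_range("2000-01-01", periods=3, freq="D"),
    "x_pos": [150.0, 150.5, 151.0],
    "y_pos": [30.0, 30.2, 30.4],
})
drifter = Drifter(pid=1, name="demo", records=records)
```

With this, `drifter["x_pos"]` returns the same Series as `drifter.records["x_pos"]`, so column access looks the same as on a plain DataFrame while the ID and name travel with the data.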

rabernat commented 3 years ago

This is a great idea.

I would start by reviewing the CF conventions on Trajectory Data. Probably just having data that all conforms to that would be a great start.

Tagging @selipot, who has been thinking about this for GDP data.

selipot commented 3 years ago

Thank you for tagging me here. I have been thinking about this, meaning that I wrote and submitted a proposal to the NSF EarthCube program to do just that: define a common Lagrangian data structure for the GDP and other datasets. I am hoping to hear back in the fall. You can see the extent of the metadata available for the GDP at its ERDDAP server.

rabernat commented 3 years ago

Shane, do you think the CF trajectory data / metadata conventions are enough? Or is something more needed?

selipot commented 3 years ago

That is what (or close to what) is used right now by the GDP and returned by the OSM ERDDAP server. I am not using these files because I like to have markers for "data gaps", i.e. interruption markers in what are otherwise regular-interval time series.
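As an illustration of the gap-marker idea, one possible approach (a sketch only, not the format used by the GDP) is to reindex a track onto its regular time grid so that missing fixes become NaN rows, and to flag them explicitly. The 6-hour interval, the column names, and the positions below are assumptions made up for the example.

```python
import pandas as pd

# Made-up 6-hourly positions with the 12:00 fix missing
times = pd.to_datetime(
    ["2000-01-01 00:00", "2000-01-01 06:00", "2000-01-01 18:00"]
)
track = pd.DataFrame(
    {"lon": [150.0, 150.1, 150.4], "lat": [30.0, 30.1, 30.3]},
    index=times,
)

# Build the regular grid the series is supposed to follow
step = pd.Timedelta(hours=6)
regular = pd.date_range(track.index[0], track.index[-1], freq=step)

# Reindexing inserts NaN rows where no fix exists
gappy = track.reindex(regular)

# Explicit boolean marker for the interrupted samples
gappy["gap"] = gappy["lon"].isna()
```

The result keeps the time axis regular, and the `gap` column records where the series was interrupted rather than silently dropping those times.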

miniufo commented 3 years ago

Glad you guys brought this information to my attention. I hope @selipot gets the funding so that we can start the Python implementation.

I hadn't noticed the CF conventions, but they indeed address many of my concerns. Also, I have some experience using both GDP data and tropical cyclone data. I have tried to abstract the Lagrangian data model as Particle here, where you can also find its subclasses, such as TC or Drifter (or profiling Float). For a set of Particles, I defined a ParticleSet that plays a role equivalent to xarray.Dataset.
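For readers who have not followed the link, a rough sketch of what a collection of tracks might look like is given below. It is not the linked implementation: the `ParticleSet` here is a hypothetical stand-in that simply keeps per-particle DataFrames in a dict and can stack them into one table, and the IDs and positions are made up.

```python
import pandas as pd


class ParticleSet:
    """Hypothetical collection of tracks keyed by particle ID."""

    def __init__(self, tracks):
        # tracks: mapping of particle ID -> DataFrame of records
        self.tracks = dict(tracks)

    def __len__(self):
        # Number of particles, not number of records
        return len(self.tracks)

    def __getitem__(self, pid):
        return self.tracks[pid]

    def to_frame(self):
        # Stack all tracks into one table indexed by (id, obs)
        return pd.concat(self.tracks, names=["id", "obs"])


# Made-up tracks of different lengths
tc1 = pd.DataFrame({"x_pos": [130.0, 129.5, 129.0], "y_pos": [15.0, 15.5, 16.2]})
tc2 = pd.DataFrame({"x_pos": [140.0, 139.2], "y_pos": [12.0, 12.8]})
pset = ParticleSet({"tc1": tc1, "tc2": tc2})
frame = pset.to_frame()
```

Stacking with an `(id, obs)` index handles tracks of unequal length without padding, which is one of the reasons a ragged layout is attractive for this kind of data.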

Here is a schematic plot: [image: myaa_temp_screendump]

I hope that I am on the right track and that all these concerns can be merged to shape the Lagrangian data model.

A further thought: one may want to analyze the 3D structure of a mesoscale eddy (or tropical cyclone) in a translating cylindrical coordinate system. I hope the Lagrangian model could simplify this kind of analysis. Specifically, given an eddy's track information, I could get a quasi-Lagrangian view of its 3D structure.

Not sure if this is the right place to discuss this. I hope to see a repo for it. Or maybe we could start a session in Pangeo so that I can be kept updated regularly.
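As a rough sketch of what the translating cylindrical view involves at a single time and level, the function below moves the origin to the eddy center, removes the center's translation velocity, and rotates the remaining velocity into radial and azimuthal components. It assumes a flat Cartesian plane in metres (on the sphere the geometry would need more care), and the function name, arguments, and the idealized test flow are all made up for illustration.

```python
import numpy as np


def to_translating_cylindrical(x, y, u, v, xc, yc, uc, vc):
    """Hypothetical helper: fields relative to a center at (xc, yc) moving at (uc, vc)."""
    dx, dy = x - xc, y - yc
    r = np.hypot(dx, dy)        # distance from the center
    theta = np.arctan2(dy, dx)  # azimuth, counterclockwise from +x
    du, dv = u - uc, v - vc     # velocity seen from the moving center
    u_rad = du * np.cos(theta) + dv * np.sin(theta)
    u_azi = -du * np.sin(theta) + dv * np.cos(theta)
    return r, theta, u_rad, u_azi


# Idealized test flow: solid-body rotation about a translating center
x, y = np.meshgrid(np.linspace(-50e3, 50e3, 5), np.linspace(-50e3, 50e3, 5))
xc, yc = 10e3, -5e3   # center position (m)
uc, vc = 0.05, 0.02   # center translation velocity (m/s)
omega = 1e-5          # rotation rate (1/s)
u = uc - omega * (y - yc)
v = vc + omega * (x - xc)

r, theta, u_rad, u_azi = to_translating_cylindrical(x, y, u, v, xc, yc, uc, vc)
```

For this idealized flow the radial component vanishes and the azimuthal component equals `omega * r`. Repeating the transform for each time along the track, using that time's center position and velocity, would give the quasi-Lagrangian view described above.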