org-arl / UnderwaterAcoustics.jl

Julia toolbox for underwater acoustic modeling
MIT License

I/O proposal based on netcdf #36

Open apatlpo opened 3 years ago

apatlpo commented 3 years ago

This PR follows from #33 and aims at brainstorming what a netcdf based I/O implementation could look like.

Bear in mind these are my first attempts at coding in Julia.

I'll edit the PR header with a proper to-do list if we deem this effort relevant.

Basic usage would look like:

# store
env = UnderwaterEnvironment()
store(env, "myenv.nc", mode="c")

# load
env = load_environment("myenv.nc")

Properties the implementation should respect:

mchitre commented 3 years ago

I've not used the NetCDF format before. From what I see, it seems pretty generic, which also means there are other competitor formats that would warrant consideration. So a few questions to educate myself before I comment on the PR:

  1. Is it a common format in the geoscience field? Are there standard fields etc for specific types of data that are relevant to building an UnderwaterEnvironment?
  2. Do you envision storing the entire environment as a NetCDF file? Or specific fields only?
  3. What does mode = "c" in your proposed usage above stand for?

apatlpo commented 3 years ago

  1. NetCDF is overwhelmingly used in environmental sciences, as stated on the Wikipedia NetCDF page. The two main reasons, in my opinion, are self-description and multi-dimensional capability; the latter is an important reason why I believe it is also relevant for UnderwaterAcoustics. As a side note, xarray can be seen as an in-memory extension of netcdf-like data. There is such a thing as the Climate and Forecast (CF) metadata convention, which is broadly applied in environmental sciences; see this pdf for a more synthetic description. One could consider following the convention if you think this is reasonable. Following the convention may be done in a subsequent PR.

  2. Yes. I have only considered altimetry so far, in order to draft the implementation.

  3. The mode selects between creating a new netcdf file ("c") and appending to an existing one ("a"). See the NCDatasets.jl documentation. One may consider, for example, appending the environment to an existing file that may contain other types of information (model configuration, sources, receiver data, ...).
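
To make the two modes concrete, here is a minimal sketch that uses NCDatasets.jl directly rather than the `store` function proposed in this PR. The variable, dimension, and attribute names (`altimetry`, `x`, `receiver_depth`, `receiver`) and the values are hypothetical and only serve the illustration.

```julia
using NCDatasets

# Temporary file so the sketch does not overwrite anything.
path = joinpath(mktempdir(), "myenv.nc")

# "c": create a new file and write one field into it.
NCDataset(path, "c") do ds
    defDim(ds, "x", 3)
    v = defVar(ds, "altimetry", Float64, ("x",), attrib = Dict("units" => "m"))
    v[:] = [0.0, 0.1, -0.1]
end

# "a": reopen the same file and append other information to it.
NCDataset(path, "a") do ds
    defDim(ds, "receiver", 2)
    r = defVar(ds, "receiver_depth", Float64, ("receiver",), attrib = Dict("units" => "m"))
    r[:] = [10.0, 20.0]
end

# "r": read-only access; both variables now live in the same file.
NCDataset(path, "r") do ds
    println(keys(ds))
    println(ds["altimetry"][:])
end
```

The same file can then be opened from Python or Fortran without any Julia-specific knowledge, which is the point of the format.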

mchitre commented 3 years ago

The example usage suggests storing the whole environment description in a .nc file. I'm wondering if it might be better to store individual fields such as bathymetry, altimetry, etc as individual NetCDF files? If the goal is to have compatibility with existing files that the community uses, then it's reading/writing those files that one would need, and not the whole environment description, right? There's no other software that uses the full environment description that we use in this package.

Maybe I'm not fully clear on the use case ... could you walk me through a simple scenario where I might find this interop with NetCDF useful if I were a geoscientist?

Another thing to consider when reading/writing files in various formats is FileIO.jl, to see if it makes sense to use its API for this.

apatlpo commented 3 years ago

The most likely workflow we will adopt in the short term is the following one:

  1. Netcdf files output by a fluid flow numerical model are preprocessed in order to generate inputs for an acoustical simulation. This preprocessing will be performed in Python because all of our tools are written in it, and transcoding is out of the question in the short term.
  2. We call UA (UnderwaterAcoustics) in order to perform the acoustical simulation and store the results.
  3. Post-processing of the acoustical data is performed in Python. The reason for using Python here, again, is that the acoustical data will be crossed with other types of data using existing tools that we cannot afford to transcode.

One of the incentives behind using netcdf is that it is language agnostic. In step 1, for example, the fluid flow numerical code is most of the time written in Fortran; it reads netcdf files produced with Python and outputs netcdf files, which we post-process with Python.

It's probably a good idea to write I/O routines that are not too dependent on a given file format, and I should adjust this PR to meet this objective.
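
One possible shape for this, sketched very roughly: format-independent entry points that dispatch on a format tag guessed from the file extension. Everything below (`EnvFormat`, `guess_format`, `store_field`, `load_field`, the plain-text backend) is hypothetical and not part of UnderwaterAcoustics.jl or of this PR; a netcdf backend would add its own methods for `NetCDFFormat`.

```julia
# Hypothetical format tags; none of these names exist in the package.
abstract type EnvFormat end
struct TextFormat <: EnvFormat end
struct NetCDFFormat <: EnvFormat end

# Pick a format from the file extension.
function guess_format(path::AbstractString)
    ext = lowercase(splitext(path)[2])
    ext == ".nc"  && return NetCDFFormat()
    ext == ".txt" && return TextFormat()
    throw(ArgumentError("unsupported file extension: $ext"))
end

# Format-independent entry points: callers never name a format explicitly.
store_field(path, name, values) = store_field(guess_format(path), path, name, values)
load_field(path, name) = load_field(guess_format(path), path, name)

# Toy plain-text backend: field name on the first line, one value per line after.
function store_field(::TextFormat, path, name, values)
    open(path, "w") do io
        println(io, name)
        foreach(v -> println(io, v), values)
    end
end

function load_field(::TextFormat, path, name)
    lines = readlines(path)
    lines[1] == name || throw(KeyError(name))
    return parse.(Float64, lines[2:end])
end

# Usage: one field (here a made-up bathymetry profile) per file.
p = joinpath(mktempdir(), "bathymetry.txt")
store_field(p, "bathymetry", [20.0, 25.5, 30.0])
depths = load_field(p, "bathymetry")
```

With this layout, storing each field in its own file or several fields in one file becomes a decision of the backend rather than of the calling code.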

I like the idea behind FileIO, but netcdf is not among the formats it supports. I wonder whether there is a fundamental reason for that, btw.

I agree that we should allow for individual input fields (bathymetry, ...) being stored in separate files. I don't think it would be very difficult to adjust the code in order to do that.