lifewatch / pypam

Python Passive Acoustic Analysis tool for Passive Acoustic Monitoring (PAM)
GNU General Public License v3.0

netcdf output #19

Open ryjombari opened 1 year ago

ryjombari commented 1 year ago

To ensure netcdf output is written correctly, pypam needs to use netCDF4. The examples should therefore include: (1) pip install netCDF4, and (2) format="NETCDF4_CLASSIC" in the to_netcdf call. Without these, the netcdf file was written but could not be read by MATLAB or xarray.
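
One way to surface this problem earlier is to check for the dependency before writing. The following is only a minimal sketch: `ensure_installed` is a hypothetical helper, not part of pypam or xarray, and it only verifies that the module can be found, not that the resulting file is readable.

```python
import importlib.util


def ensure_installed(module_name="netCDF4"):
    """Raise a clear ImportError if `module_name` cannot be found.

    Hypothetical helper for illustration; not part of pypam.
    """
    # find_spec returns None when a top-level module is not installed
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is needed for netcdf output; "
            f"try `pip install {module_name}`."
        )
```

Calling `ensure_installed()` right before `to_netcdf(..., format="NETCDF4_CLASSIC")` would turn a silently unreadable file into an explicit error message.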

The netcdf output currently includes both the original 1-Hz psd and the hybrid millidecade bands. Ultimately, it should probably contain only the hybrid millidecade band data, which is the key result to be placed in data repositories.

cparcerisas commented 1 year ago

Hi @ryjombari, thank you for your issue. I propose that we add better error handling, or a wrapper to save the output of ASA or AcuFile so that it can be opened. I would keep the xarray output as it is; the user could choose to save only the millidecade bands by doing:

hm = asa.hybrid_millidecade_bands(db=True, method=method, band=band)
# Option 1: save the entire dataset, with both the original psd and the millidecade bands
hm.to_netcdf(path_to_your_file, format='NETCDF4_CLASSIC')
# Option 2: save only the millidecade bands
hm['millidecade_bands'].to_netcdf(path_to_your_file, format='NETCDF4_CLASSIC')

So the user can choose what to store. If we add a wrapper in utils.py to save the output, could we add an argument so the user decides whether to save everything or only certain variables?
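
A wrapper along these lines could look roughly like the sketch below. It is not existing pypam code: the function name `save_to_netcdf` and the arguments `variables` and `nc_format` are assumptions made for illustration. It relies only on the dataset supporting `in`, selection by a list of names, and `to_netcdf(path, format=...)`, as an xarray Dataset does.

```python
from pathlib import Path


def save_to_netcdf(ds, path, variables=None, nc_format="NETCDF4_CLASSIC"):
    """Save a dataset to netcdf, optionally keeping only some variables.

    Hypothetical wrapper for illustration; not part of pypam.
    `ds` is expected to behave like an xarray.Dataset.
    """
    if variables is not None:
        # Fail early with a readable message if a requested name is absent
        missing = [name for name in variables if name not in ds]
        if missing:
            raise KeyError(f"Variables not found in dataset: {missing}")
        # Selecting with a list of names keeps a dataset with only those variables
        ds = ds[list(variables)]
    path = Path(path)
    ds.to_netcdf(path, format=nc_format)
    return path
```

With the snippet above, `save_to_netcdf(hm, path_to_your_file, variables=['millidecade_bands'])` would store only the millidecade bands, while omitting `variables` would store everything.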

What do you think @carueda?

carueda commented 1 year ago

Thanks for asking, let me share some quick reactions.

In terms of the included examples, I agree that those additional comments would be useful, in particular the heads-up about potential issues if not using the appropriate output format.

In terms of API, the good thing is that it is already there (in particular, involving pypam and xarray), so the user can select what to output and in what format, as in your snippet above.

Now, perhaps pypam could support this "only hybrid millidecade bands and netcdf4 output format" use case more directly, so it's a bit more straightforward (and less error-prone) for the user. If so, yes, an additional wrapper (and/or some parameters/keywords in relevant pypam functions) would cover that use case.

ryjombari commented 1 year ago

Thanks, both. Great that we can already limit output results to the millidecade bands. I'll try that to get familiar. I think a wrapper that makes this easy to specify would be helpful.

Would it help to have an example of the expected netcdf content, including required metadata? I am working on that now with Carrie. The pypam metadata already look complete for the variables, but there may be some global attributes to add.
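
Until that example exists, a quick check of the file-level metadata could be sketched as follows. The list of names here is an assumption: it uses the global attributes recommended by the CF conventions as a stand-in, not the requirements of any particular data repository, and `missing_global_attrs` is a hypothetical helper. It only needs an object with a dict-like `attrs`, such as an xarray Dataset.

```python
# Global attributes recommended by the CF conventions; used here only as a
# placeholder for whatever the data repository ends up requiring.
CF_RECOMMENDED_GLOBAL_ATTRS = (
    "title",
    "institution",
    "source",
    "history",
    "references",
    "comment",
)


def missing_global_attrs(ds, required=CF_RECOMMENDED_GLOBAL_ATTRS):
    """Return the names in `required` that are absent or empty in ds.attrs.

    Hypothetical helper for illustration; not part of pypam.
    """
    return [name for name in required if not ds.attrs.get(name)]
```

Running this on the dataset before calling `to_netcdf` would list which global entries still need to be filled in through its `attrs` dictionary.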

cparcerisas commented 1 year ago

An example would be great! All the names of the metadata attrs can be changed to better ones so they are easier to understand.