lkilcher / dolfyn

A library for oceanographic doppler instruments such as Acoustic Doppler Profilers (ADPs, ADCPs) and Acoustic Doppler Velocimeters (ADVs).
BSD 3-Clause "New" or "Revised" License
42 stars 25 forks source link

Create PR for dolfyn-xarray! #67

Closed lkilcher closed 2 years ago

jmcvey3 commented 2 years ago

I'm curious why the .ad2cp files have changed at all?

Have they? I think I had some issue with them not getting stored in git lfs and had to reupload to fix that

jmcvey3 commented 2 years ago

Also, if you retract the latest 2 commits on your master branch, everything here should be able to merge. I attempted to pull those in a few months ago but had no success.

Those two commits, "Bugfix for reading timestamps" in your master I fixed through "TRDI VMDAS file read fix" back on Mar 19, and test files, well, they got updated with every code update.

lkilcher commented 2 years ago

Also, if you retract the latest 2 commits on your master branch, everything here should be able to merge. I attempted to pull those in a few months ago but had no success.

Those two commits, "Bugfix for reading timestamps" in your master I fixed through "TRDI VMDAS file read fix" back on Mar 19, and test files, well, they got updated with every code update.

OK. Good to know about the TRDI VMDAS file read fix. As for test files: I really want to walk through all of those before we merge this. I want to know: if any data in the test files changed, why did it change?

lkilcher commented 2 years ago

Also, I'm wondering: how are we handling timestamps? When I read a .ad2cp file, I'm seeing the following:

<xarray.DataArray 'time' (time: 683043)>
array([1.625166e+09, 1.625166e+09, 1.625166e+09, ..., 1.625336e+09,
       1.625336e+09, 1.625336e+09])
Coordinates:
  * time     (time) float64 1.625e+09 1.625e+09 ... 1.625e+09 1.625e+09
Attributes:
    description:  seconds since 1970-01-01 00:00:00

But that doesn't seem particularly convenient to me. How do I get these numbers into human-readable values?

lkilcher commented 2 years ago

I'm also not able to save Signature Data files as netcdf. For example:

>>> dat = dolfyn.read('signature_500_file.ad2cp')
>>> dat.to_netcdf('testfile.nc')

TypeError: Invalid value for attr 'filehead_config'

Removing filehead_config from dat.attrs gives the same error for burst_config, and then again for burst_config_b5. So, I think we need to figure out how to format the data in those attrs so that we don't get these errors.

... then after removing all of those attributes I get an error having to do with trying to write a list of strings. Even though this page says that xarray implemented writing lists of strings to attrs, I don't see it working!

lkilcher commented 2 years ago

I'm also not able to save Signature Data files as netcdf. For example:

>>> dat = dolfyn.read('signature_500_file.ad2cp')
>>> dat.to_netcdf('testfile.nc')

TypeError: Invalid value for attr 'filehead_config'

Removing filehead_config from dat.attrs gives the same error for burst_config, and then again for burst_config_b5. So, I think we need to figure out how to format the data in those attrs so that we don't get these errors.

... then after removing all of those attributes I get an error having to do with trying to write a list of strings. Even though this page says that xarray implemented writing lists of strings to attrs, I don't see it working!

OK. I figured out the last bug here. I just needed to be using the NETCDF4 format/engine (not scipy). But the original issue of not being able to write filehead_config, burst_config, and burst_config_b5 still exists.

lkilcher commented 2 years ago

...and I figured out the problem with the other entries. They are all dictionaries, which is not supported as attrs. So what are we going to do about this? Should we just leave this data as a big long ugly string? Or write it to a file somewhere? Or json-ify it?

At the moment I'm inclined to json-ify it. That way it is easy to recover if necessary, and it's not a problem now. Thoughts?

lkilcher commented 2 years ago

 Also: I need write access to this pr.

jmcvey3 commented 2 years ago

...and I figured out the problem with the other entries. They are all dictionaries, which is not supported as attrs. So what are we going to do about this? Should we just leave this data as a big long ugly string? Or write it to a file somewhere? Or json-ify it?

At the moment I'm inclined to json-ify it. That way it is easy to recover if necessary, and it's not a problem now. Thoughts?

That's why I wrote the dolfyn.save() function in the IO api. There are a number of things netcdf cannot save. Is there a time this week we can talk over things that works for you?

jmcvey3 commented 2 years ago

 Also: I need write access to this pr.

I think you created this PR? Not sure how to do that

jmcvey3 commented 2 years ago

Also, I'm wondering: how are we handling timestamps? When I read a .ad2cp file, I'm seeing the following:

<xarray.DataArray 'time' (time: 683043)>
array([1.625166e+09, 1.625166e+09, 1.625166e+09, ..., 1.625336e+09,
       1.625336e+09, 1.625336e+09])
Coordinates:
  * time     (time) float64 1.625e+09 1.625e+09 ... 1.625e+09 1.625e+09
Attributes:
    description:  seconds since 1970-01-01 00:00:00

But that doesn't seem particularly convenient to me. How do I get these numbers into human-readable values?

I know. Matlab datenum isn't either. https://dolfyn-xarray.readthedocs.io/en/latest/apidoc/dolfyn.time.html

lkilcher commented 2 years ago

Also: I need write access to this pr.

I think you created this PR? Not sure how to do that

I think you need to give me write access to whichever branch this is pulling in (your master in this case).

lkilcher commented 2 years ago

Also, I'm wondering: how are we handling timestamps? When I read a .ad2cp file, I'm seeing the following:

<xarray.DataArray 'time' (time: 683043)>
array([1.625166e+09, 1.625166e+09, 1.625166e+09, ..., 1.625336e+09,
       1.625336e+09, 1.625336e+09])
Coordinates:
  * time     (time) float64 1.625e+09 1.625e+09 ... 1.625e+09 1.625e+09
Attributes:
    description:  seconds since 1970-01-01 00:00:00

But that doesn't seem particularly convenient to me. How do I get these numbers into human-readable values?

I know. Matlab datenum isn't either. https://dolfyn-xarray.readthedocs.io/en/latest/apidoc/dolfyn.time.html

Agree. I was hoping that we could use the stuff shown here to get xarray to handle timedata: http://xarray.pydata.org/en/stable/user-guide/time-series.html

jmcvey3 commented 2 years ago

Also, I'm wondering: how are we handling timestamps? When I read a .ad2cp file, I'm seeing the following:

<xarray.DataArray 'time' (time: 683043)>
array([1.625166e+09, 1.625166e+09, 1.625166e+09, ..., 1.625336e+09,
       1.625336e+09, 1.625336e+09])
Coordinates:
  * time     (time) float64 1.625e+09 1.625e+09 ... 1.625e+09 1.625e+09
Attributes:
    description:  seconds since 1970-01-01 00:00:00

But that doesn't seem particularly convenient to me. How do I get these numbers into human-readable values?

I know. Matlab datenum isn't either. https://dolfyn-xarray.readthedocs.io/en/latest/apidoc/dolfyn.time.html

Agree. I was hoping that we could use the stuff shown here to get xarray to handle timedata: http://xarray.pydata.org/en/stable/user-guide/time-series.html

Yes, it was on my "this would be nice to have list". I spent probably at least 50% of my time on testing, which was more than I had planned, and just ran out of time.

lkilcher commented 2 years ago

OK. Rolled lkilcher/master back to d1ef15e so that this can merge cleanly. Now I still want to do some checking/testing to make sure that the test data is consistent.

jmcvey3 commented 2 years ago

OK. Rolled lkilcher/master back to d1ef15e so that this can merge cleanly. Now I still want to do some checking/testing to make sure that the test data is consistent.

Sounds good, and hopefully it's not too difficult, given that I changed variable names to be consistent across instruments. I'm pretty confident the test data is consistent, except for the Signature test data. That I have tested against Nortek mat files (which was the next thing on my todo list for the other instrument types before my time ran out, unfortunately)

lkilcher commented 2 years ago

@jmcvey3

I created this repo to walk through comparing data from dolfyn 0.12, and your xarray version. Do you have bandwidth to try setting it up?

Once you're setup, I was thinking we can talk about the discrepancies here. Most of the discrepancies at the moment are due to the variables you dropped. Let me know when that is fixed and I'll create an updated run_all-output.txt file.

lkilcher commented 2 years ago

💥