A103 xarray contains duplicate timestamps

glugeorge commented 1 year ago

After loading the xarray data for A103, we see that there is only one unique timestamp per day, using this command: ds_103.time.drop_duplicates(dim='time'). The other datasets keep the same number of time steps (~11000) under this command, whereas ds_103 drops from that to 127. This issue must arise before the conversion to zarr, and potentially even when loading the raw .DAT files. I can verify (from the hard drive with the original .DAT files) that the raw data themselves are not the problem. I will track down exactly where these duplicate dates come in.
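A quick way to quantify the problem is to compare the total number of time steps with the number of unique ones. The sketch below uses a small synthetic dataset as a stand-in for ds_103; the variable name `profile` and the timestamps are made up for illustration, and the timestamps are deliberately truncated to the day to mimic the symptom.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for ds_103 (assumption: not the real data).
# Six bursts, 8 hours apart, starting at midnight.
true_times = np.datetime64("2022-05-01T00:00", "ns") + np.arange(6) * np.timedelta64(8, "h")

# Mimic the symptom: keep only the day of each burst.
truncated = true_times.astype("datetime64[D]").astype("datetime64[ns]")

ds = xr.Dataset(
    {"profile": ("time", np.arange(6.0))},  # hypothetical data variable
    coords={"time": truncated},
)

# Total number of time steps versus the number that survive de-duplication.
n_total = ds.sizes["time"]
n_unique = ds.time.drop_duplicates(dim="time").sizes["time"]

# Number of entries that repeat an earlier timestamp.
n_duplicated = int(ds.get_index("time").duplicated().sum())

print(f"{n_total} time steps, {n_unique} unique, {n_duplicated} duplicated")
```

If `n_unique` is much smaller than `n_total`, as it is for ds_103 (127 versus ~11000), the time coordinate has lost its sub-daily resolution somewhere upstream.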

Resolving this is an important step in being able to properly plot profiles (and thus velocities) for all stations.

glugeorge commented 1 year ago

OK, I've isolated this and mostly solved it. A similar issue is described in https://github.com/pydata/xarray/issues/5969. From what I can tell, the encoding for the time array for A103 was set to 'days since __', with __ being the timestamp from the first burst. Then, when converting to zarr, only the day associated with each burst was preserved. I manually set the encoding for the date array in A103 to 'seconds since ' and it seems to be good now. The updated A103 data can be accessed under the sitename 'A103_fixed'. I'll also create a notebook that describes the modified process for saving A103 to zarr.
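A minimal sketch of that kind of fix is shown here, assuming the time coordinate is called `time`. The dataset, the reference date in the units string, and the store path are placeholders for illustration, not the values used in the actual conversion script.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the A103 dataset (assumption: not the real data).
times = np.datetime64("2022-02-10T12:00", "ns") + np.arange(4) * np.timedelta64(6, "h")
ds = xr.Dataset(
    {"profile": ("time", np.arange(4.0))},  # hypothetical data variable
    coords={"time": times},
)

# Explicitly request second resolution for the on-disk time encoding,
# instead of letting a 'days since ...' unit be used.
# The reference date here is an arbitrary placeholder.
ds["time"].encoding["units"] = "seconds since 1970-01-01 00:00:00"

print(ds["time"].encoding)

# The dataset would then be written as usual, for example:
# ds.to_zarr("A103_fixed.zarr", mode="w")  # hypothetical store path
```

Setting the units before writing means the choice no longer depends on whatever unit would otherwise be picked automatically.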

As for why this happened only with A103 and not the other two sites, I reckon it's because A103 has test data from February whereas the other two start in May. So I think when the A103 data was initially chunked, the time array was automatically encoded as days since the first burst rather than seconds, because there was a large gap between this February test data and the remaining data. This is just a hunch though: the other encodings all seem the same, and I'm not sure exactly how to verify it, especially since I've already fixed the issue by going into the zarr conversion script and manually setting the encoding for A103.
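Whatever caused the 'days since' unit to be chosen, its effect on sub-daily bursts can be illustrated with plain arithmetic. The sketch below is not xarray's actual encoder: it assumes each timestamp is stored as a whole number of units since a reference time, and the reference time and burst times are made up.

```python
from datetime import datetime, timedelta


def encode(times, reference, unit):
    """Store each time as a whole number of `unit`s since `reference`."""
    return [(t - reference) // unit for t in times]


def decode(counts, reference, unit):
    """Turn stored counts back into timestamps."""
    return [reference + n * unit for n in counts]


# Made-up reference (first burst) and bursts 0, 6, 12 and 30 hours later.
reference = datetime(2022, 2, 10)
bursts = [reference + timedelta(hours=h) for h in (0, 6, 12, 30)]

one_day = timedelta(days=1)
one_second = timedelta(seconds=1)

# Round trip through integer 'days since': sub-daily information is lost.
as_days = decode(encode(bursts, reference, one_day), reference, one_day)

# Round trip through integer 'seconds since': the bursts survive unchanged.
as_seconds = decode(encode(bursts, reference, one_second), reference, one_second)

print("unique after 'days since':   ", len(set(as_days)))
print("unique after 'seconds since':", len(set(as_seconds)))
```

Under this assumption, all bursts within one day collapse onto the same timestamp, which matches the one-unique-timestamp-per-day pattern seen in ds_103.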

jkingslake commented 1 year ago

OK, nice work. That explanation makes sense to me.

So, A101 and A104 were already in 'seconds since' format?

jkingslake commented 1 year ago

Were A101 and A104 already in 'seconds since' format? Or am I misunderstanding?

glugeorge commented 1 year ago

Yes, I believe that was the case. A101 and A104 were in the 'seconds since' format by default.

ldeo-glaciology / xapres

A103 xarray contains duplicate timestamps #13