powellb / seapy

State Estimation and Analysis in Python
MIT License

Question about observation netcdf file #66

Closed ymamoutos closed 2 years ago

ymamoutos commented 2 years ago

Greetings

Recently our group switched from MATLAB to Python, using the seapy package to pre-process the observations needed for the ROMS 4D-Var scheme (formerly PSAS, now RBL4DVAR). I am experiencing a weird issue with the Python-processed observation file: during the initialization of the first inner loop (outer = 001, inner = 001, not 000), although the obs are not rejected, ROMS does not read their values for the first 12 hours, and this gap (obviously) ends in a blowup. Using the MATLAB-processed file with the same observation datasets, the model runs smoothly. obs_error and obs_variance differ between the two files (I set fixed error values for the state variables in the MATLAB scripts), but the number of observations is also different, with the seapy-produced file having more than MATLAB's.

My experience with Python is limited (I am learning it now), and I am trying to understand why this is happening, but I have had no luck. Any help or insight will be highly appreciated.

I am also using the MATLAB scripts from https://www.myroms.org/svn/src/matlab to create the observation file.

Kind Regards

Giannis Mamoutos

powellb commented 2 years ago

I don't quite understand your description. You are saying that when you run the model, in the first nonlinear model run (when it samples the model at the observation locations and times), it ignores the observations for the first 12 hours? The nonlinear model does not care about obs_value or obs_error, per se; it just samples the variable at the proper time and location. So, there is nothing that should lead to a blowup.

One thing that I have seen happen is that you must pay careful attention to obs_time. The difference between consecutive obs_time values must be greater than the DT of your model. For instance, let's assume your DT=3600 s (1 hour), and your obs_time is in days from some reference. In this example, your model starts on day 1200 and runs until day 1205. You want observations that span 1200 to 1205, but with no difference between consecutive times of less than 1/24 = 0.042 days.

Imagine now you have observations of:

obs_time = [1200, 1200, 1201, 1201, 1201, 1204, 1205]

This is fine, but if you have:

obs_time = [1200, 1200, 1200.03, 1201, 1201, 1204, 1205]

it will fail because ROMS will load the obs at t=1200; the model then steps to time 1200.042, but it has passed the 1200.03 obs, and it is stuck: it will not load any additional obs after that point.
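A quick way to check for this (a minimal sketch, not part of seapy; the file name "obs.nc" and the DT value are placeholders) is to verify that consecutive distinct survey times are separated by more than one model DT:

```python
import netCDF4
import numpy as np

# Model DT in days (assuming DT = 3600 s, as in the example above)
dt_model = 3600.0 / 86400.0

# "obs.nc" is a placeholder for your observation file
with netCDF4.Dataset("obs.nc") as nc:
    times = np.unique(nc.variables["survey_time"][:])

# Flag any pair of consecutive survey times closer than one DT
gaps = np.diff(times)
bad = np.flatnonzero(gaps <= dt_model)
for i in bad:
    print("Too close: %.5f -> %.5f (gap %.5f days)"
          % (times[i], times[i + 1], gaps[i]))
if bad.size == 0:
    print("All survey-time gaps exceed the model DT.")
```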

If that isn't the issue, you just need to examine the resulting obs.nc file. Are the times, locations, etc. different? Is there an issue converting lat/lon to i,j? (It should be fine, but if you cross 0 or 180 longitude, are you consistent with positive or negative values?)

There is nothing that should lead to a blowup in the first NLM run of the model. If it blows up in the inner loop, there can be an issue with the obs_value or obs_error: if those are completely unreasonable, that could cause it.

Hope that helps.


ymamoutos commented 2 years ago

Greetings

First, many thanks for your answer. After checking the observation files, the differences are in the survey_time variable, and now I am confused: my model's DT is 240 s, and when I use seapy.roms.obsgen.aviso_sla_track I set dt = 240./(24.*3600.). For example, for March 25, 2022, the survey_time values are

19664.1436001211, 19664.2791167416, 19664.3537274272, 19664.672197579

which correspond to

'25-Mar-2022 03:26:47' '25-Mar-2022 06:41:55' '25-Mar-2022 08:29:22' '25-Mar-2022 16:07:57'
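For reference, these dates follow from the 1968-05-23 reftime passed to aviso_sla_track below; a minimal sketch of the conversion:

```python
import datetime

# Days since the 1968-05-23 reference used in the aviso_sla_track call below
reftime = datetime.datetime(1968, 5, 23)
survey_time = [19664.1436001211, 19664.2791167416,
               19664.3537274272, 19664.672197579]
for t in survey_time:
    print(reftime + datetime.timedelta(days=t))
```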

The observation error is the default provided by seapy for each dataset; e.g., for SLA tracks I am using:

```python
altika = seapy.roms.obsgen.aviso_sla_track(
    mygrid, dt,
    reftime=datetime.datetime(1968, 5, 23, 0, 0, 0),
    ssh_error=seapy.roms.obsgen._aviso_sla_errors['SSH_AVISO_ALTIKA'],
    ssh_mean=zam,
    repeat=0,
    provenance='SSH_AVISO_ALTIKA')
```

For the SLA obs the error is between 0.0025 and 0.005 m; for SST I am using OSTIA L4 observations, with errors from 0.4 °C to 1.4 °C. Concerning the obs_values, I didn't find anything unreasonable.

I have tried a bigger dt (360 s, 480 s), but no luck.

Any other suggestions will be appreciated.

Many thanks again

Giannis

powellb commented 2 years ago

So, you process sla, and you have one file (e.g., obs_sla_track.nc). Then you process sst, and you have one file (e.g., obs_ostia.nc). Then, you merge those files with obs_merge and write out a new file (e.g., obs.nc)? You run the model with obs.nc and it "blows up"?
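If that is the workflow, a quick sanity check (a minimal sketch, not part of seapy; the file names are the hypothetical ones above) is to compare the observation counts per type across the three files:

```python
import netCDF4
import numpy as np

# Count observations per ROMS obs_type in each file; the merged file
# should contain the union of the two sources
for fname in ("obs_sla_track.nc", "obs_ostia.nc", "obs.nc"):
    with netCDF4.Dataset(fname) as nc:
        types, counts = np.unique(nc.variables["obs_type"][:],
                                  return_counts=True)
    print(fname, dict(zip(types.tolist(), counts.tolist())))
```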

What if you just run with one of the files, such as sla or sst?

I am still trying to understand how it could cause ROMS to "blow up". That doesn't make sense to me. A blow up is when the model's time-stepping solution goes unstable (e.g., velocities of 30 m s**-1, etc.). The observation file is only used for calculating adjustments and if the values are proper, you would not have unstable increments.

Cheers, Brian


ivicajan commented 2 years ago

An easy way to check would be to run with the VERIFICATION option, which will sample the NLM at the obs times/locations.

Cheers,

Ivica



ymamoutos commented 2 years ago

Dear Brian and Ivica

Many thanks for your answers. After several test runs using each observation dataset alone and in combination, I found that the problem occurs when I use the T/S ARGO profiles. To make it clear: if the observation file has only SLA and SST, either individually or combined, the model runs smoothly, but if the T/S ARGO data is included, it blows up (I am attaching the log so you can see what is happening). I tried running with a different error range than the default for ARGO temperature and salinity, but the problem persists. Here I must clarify that the ARGO data I use with the MATLAB scripts come from the CMEMS service, while for Python they come from Ifremer's FTP, so that I can use the seapy package directly. I am still investigating and trying to modify/rewrite the seapy argo_ctd function to read the same data.

Again many thanks for your answers

Kind Regards

Giannis

an_220405.txt

powellb commented 2 years ago

Your log shows a number of issues:

  1. You have rejected observations every time they load. You have BGQC enabled, so there are two possible ways they can be rejected: i) the value is extreme (the threshold is set in s4dvar.in); or ii) the location is not correct (on land, too deep, too shallow, etc.). Basically, if you process and analyze the obs, there should not be any reason for them to be rejected.

Further, the strangest item is that, for your Argo data, you must have coincident temp and salt observations. So why are some of them rejected (e.g., salt) but not the others (temp)? This suggests that something is wrong with the values.

We use the argo routine extensively to make observations but have never had that occur. I have to admit that we don't use BGQC, so it may be that there are bad salt values in the argo profile that you processed.

  2. Your run is in the third inner loop, but the minimization is failing. Notice that you have NaN values for the Lanczos vectors. This means that in the next inner loop (in this case, the third) it is going to go NaN and blow up.

It seems that there is indeed something amiss with your observations (the SST and SSH are also wrong if they are being rejected), and it manifests as an inability to determine the "downhill" direction, which then blows up.

My guess is that BGQC is rejecting obs, but the question is why the values are so bad that they are being rejected (i.e., it is doing its job properly). You should process a single day of SST, a single day of SSH, and a single profile of Argo, and examine the obs carefully.
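For that inspection, something like the following (a minimal sketch, not part of seapy; "obs.nc" is a placeholder) prints the value and error range per observation type so that extreme values stand out:

```python
import netCDF4
import numpy as np

# Print the range of values and errors per ROMS observation type
# to spot anything that BGQC might legitimately reject
with netCDF4.Dataset("obs.nc") as nc:
    obs_type = nc.variables["obs_type"][:]
    obs_value = nc.variables["obs_value"][:]
    obs_error = nc.variables["obs_error"][:]

for t in np.unique(obs_type):
    sel = obs_type == t
    print("type %d: n=%d  value [%g, %g]  error [%g, %g]"
          % (t, sel.sum(),
             obs_value[sel].min(), obs_value[sel].max(),
             obs_error[sel].min(), obs_error[sel].max()))
```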

As an example, I processed SST and Argo without issue (over 3 years of data) via:

```python
import seapy

###################################################
# OPTIONS
###################################################
grid = seapy.model.asgrid('../grid/fleat_outer_grid.nc')
dt = 10 / 1440

###################################################
# SST
###################################################
obsdir = "obs/sst_navomap/global/"                   # Raw observation file directory
files = seapy.list_files(obsdir, regex='201[456]')   # Raw observation files
outfiles = 'sst-navo/obs-navo-#.nc'                  # Output file name
factory = seapy.roms.obsgen.navo_sst_map(grid, dt)   # Define the object
factory.batch_files(files, outfiles)                 # Process files

###################################################
# Argo
###################################################
obsdir = "obs/ctd_argo/"                             # Raw observation file directory
files = seapy.list_files(obsdir, regex='201[456]')   # Raw observation files
outfiles = 'ctd-argo/obs-argo-#.nc'                  # Output file name
factory = seapy.roms.obsgen.argo_ctd(grid, dt)       # Define the object
factory.batch_files(files, outfiles, clobber=False)  # Process files
```
Cheers, Brian


ymamoutos commented 2 years ago

Dear Brian

Many thanks once more for your answer and comments; they were again very helpful. I followed your suggestion and ran a one-day-window analysis using the dt you proposed (dt = 10/1440), with the ARGO data only. The obs file values and errors look normal, with nothing dubious. The model, fortunately, ran smoothly with no errors (with the BGQC option both enabled and disabled), and now I am running the typical 3-day-window analysis for our group's forecast. I am still investigating why this happened, but so far I don't have a solid answer...

I am not sure whether I should close this issue/thread or leave it open.

Kind Regards and again many thanks for your help.

Giannis Mamoutos

powellb commented 2 years ago

Not having heard any more on this, I assume that you have worked through the issue and all is fine.