srlabUsask / crhmcode

GNU General Public License v3.0
8 stars 4 forks source link

Problem with results from release 1.1 #10

Closed KevinShook closed 3 years ago

KevinShook commented 3 years ago

I was able to run release 1.1. The execution time was excellent - about 1/3 of the old Windows version. The executable file is also very small, which is great to see Unfortunately, there are very large errors in some of the values. I had noticed similar issues in release 1.0, as well.

As you can see from the figure, the annual depth of annual discharge is orders of magnitude too large in the crhmcode release 1.1 model. This would appear to be because the actual evaporation depths are too small, as the depths of precipitation are identical. windows_crhmcode_annual_flux_depths This model uses 2 different methods for calculating evaporation. Penman-Monteith is used by HRUs 2-4, while HRU 11 uses Priestley-Taylor. Both are giving incorrect results. I checked the snow accumulation from the module PBSM, and it appears to be giving the same results in both models. windows_crhmcode_annual_peak_swe

Therefore, the issue appears to be either in the module evap_Resist or within one of the modules that it uses.

jhs507 commented 3 years ago

Thanks for bringing this to our attention Kevin,

Currently one of our focuses is attempting to prove that our code gives equivalent results to the previous version. We had noticed a 5-10% variation on certain outputs and are trying to determine what the causes are.

Could you please send me the project file you used to produce this output and the output files you produced?

KevinShook commented 3 years ago

Here is the model .prj file , the forcing files, and the output from the crhmcode release 1.1 model crhmcode11_water_balance.zip

This is the output from the same .prj file, using the same forcing .obs files Windows_CRHM_output.zip

jhs507 commented 3 years ago

Hi Kevin,

Could you run a md5 checksum on your windows crhm executable. I just want to be sure we are referring to the same version.

KevinShook commented 3 years ago
$ md5sum CRHM.exe
52023495a9367d9a987b81b40794f025  CRHM.exe
jhs507 commented 3 years ago

Interesting I have a version with the sum 3152984ec44968be74209c8e94fa9baa That I downloaded from the CRHM website.

I have another version with the sum e5a2e03ba9678c0158d54d673358e99b That I received from Peter.

Could share your CRHM.exe with me in some way?

KevinShook commented 3 years ago

There are quite a few versions of CRHM around. This is one of the newer ones - dated 07/02/20 (I think that means July 2, 2020). Here is this version. CRHM.zip

KevinShook commented 3 years ago

Peter's code is quite different. I am working on a project with others, so I am using the stock version of CRHM

jhs507 commented 3 years ago

Perfect thank you.

jhs507 commented 3 years ago

One think I have noticed from my initial look at this is that the basinflow output is quite off between the two outputs. I am unfamiliar with what this physically represents but could it be related?

KevinShook commented 3 years ago

The basinflow is the outflow from the basin in m3 in each time step. The discharge in the figure above is just the basinflow converted to an depth over the basin in mm, and summed for each year. The reason for doing this is to compare with other fluxes which are expressed as mm over the basin.

The first figure shows that the the discharge/basinflow is off by more than an order of magnitude in the crhmcode release 1.1. It also shows that the cause is not error in the precipitation (same in both models).

The second figure shows that the snow accumulation is the same in both models, so the error is not within the PBSM module, or any of the modules that it uses.

The first figure shows that the crhmcode release 1.1 is grossly underestimating evaporation, both from vegetation (HRUs 2,3,4) and from water (HRU 11). This is likely the cause of the overestimation of basinflow.

jhs507 commented 3 years ago

On your advice I followed the calculation of evaporation in the evap_resist#1 module. The inputs of that module are observation Qsi (W/m^2), variable QsiS_Var (W/m^2)(slope) from Annandale, and Uses variable QsiA_Var (W/m^2)(horizontal) from Annandale.

Following that I focused on the Annandale module to see if it was producing correct results.

Upon getting the output for the Annandale module from HRUs 1, 2, 3, 4, and 11 with the windows code and our gcc code compared them.

I found that the variables QsiA_Var, and QsiD_Var had matching values for all HRUs

The variables hru_SunAct and QsiS_Var matched for HRU 11 but not for HRUs 1, 2, 3, or 4.

Digging into the code the value for hru_SunAct seems to be calculated by the following code do you see anything wrong with it?

WtoMJ_D expands to 86400/1E6

    double Ratio;
    double Temp =  QsiDT*WtoMJ_D - QdfoD[hh]; // observed direct incoming  - QdfoD[hh]
    if(Temp <= 0.0 || QdflatD[hh] - QdfoD[hh] <= 0.0)
      Ratio = 0.0;
    else{
      Ratio = Temp/(QdflatD[hh] - QdfoD[hh]); // incoming clear sky direct only
      if(Ratio > 1.0)
        Ratio = 1.0;
    }
    hru_SunAct[hh] = Ratio*SunMax[hh];

QsiS_Var is calculated with the following code. Does anything seem amiss with it?

if(nstep == 1 || Global::Freq == 1) { // beginning of every day

      if(hru_tmax[hh] - hru_tmin[hh] < 0.00001)
        TauAT = 0.0;
      else
        TauAT = krs[hh]*(1.0 + 2.7E-5*hru_elev[hh])*pow(hru_tmax[hh] - hru_tmin[hh], 0.5f);

      QsiDT = QdroDext[hh]*TauAT*MJ_DtoW;

    } // if

    if(variation == VARIATION_ORG || variation == VARIATION_1){
      if(QdflatE[hh] > 1.0){
        QsiAT = QdflatE[hh]*TauAT;
        QsiST = (Qdro[hh] + Qdfo[hh])*QsiDT/(MJ_DtoW)/QdflatD[hh]; // on slope simplified
      }
      else{
        QsiAT = 0.0;
        QsiST = 0.0;
      }
    }

Both of these sections of code are from the ClassAnnan.cpp file if you need to see them in context.

I can't find anything wrong with the code inherently but I do not know the full science behind the calculations.

KevinShook commented 3 years ago

Thanks for looking into this I did a diff of the Annandale module in the two versions and couldn't see any difference in the code, so I suspect that the differences are in the inputs.

Since QsiS_Var differs between the versions, but QsiA_Var, and QsiD_Var did not, the difference must be seen in something that is only found in QsiS_Var. It is also likely to be found in hru_SunAct. One clue is that "QsiS_Var matched for HRU 11 but not for HRUs 1, 2, 3, or 4." The difference between HRU 11 and the others is that HRU 11 is covered by water, so it is flat. So, the differences are probably related to how solar radiation is adjusted for the effects of slope.

I think that it would be a good idea to check the values of Qdro, Qdfo, and QdflatD to see if they are differing between the two models.

jhs507 commented 3 years ago

What did you use as the other side of your diff? I don't think have the source code for the windows version in question.

KevinShook commented 3 years ago

Sorry - I forgot that you don't have the source of that version. Here it is: CRHM070220_NewModules.cpp.zip

jhs507 commented 3 years ago

I checked the variables that you mentioned for HRU 1

Qdfo and QdfoD are not matching for the two versions.

QdflatD is matching in that HRU

The only time Qdfo is matching is on time steps where the value is zero in both versions.

It appears that the gcc version is always lower than the borland/windows version.

KevinShook commented 3 years ago

Very interesting. I just had a quick look at the ClassGlobal In the old CRHM, line 629 is diffuse = diffuse*sqr(cos(hru_GSL[hh]/2.0*DEGtoRAD)); // on slope

in ClassGlobal.cpp line 291 is diffuse = diffuse*sqr(cos(hru_GSL[hh]/2.0)); // on slope

This might explain it

jhs507 commented 3 years ago

That certainly would cause a difference I assume the implementation with *DEGtoRAD is correct?

KevinShook commented 3 years ago

Yes - the value of hru_GSL is in degrees, so it needs to be converted before all the trig functions are used.

jhs507 commented 3 years ago

That change caused the Qdfo and QdfoD to become within tolerance of the borland/windows output.

I will see if that brings more things into tolerance and give you an update.

KevinShook commented 3 years ago

Excellent. I've been doing some more diffing to see if there are some other potential issues. Most things look OK, but I found this one: In the original CRHM NewModules.cpp ClassfrozenAyers lines 14765-14769

//              if(cumsnowinfil[hh] - cumsnowinfil_0[hh] > INF[hh]) {
//                cummeltrunoff[hh] += (cumsnowinfil[hh] - cumsnowinfil_0[hh] - INF[hh]);
//                cumsnowinfil[hh] = (cumsnowinfil_0[hh] + INF[hh]);
//                infiltype[hh] = RESTRICTED;
//              } 03/30/2020 commented outas not suitable for multiple year simulation

In ClassFrozenAyers.cpp lines 263-267

              if(cumsnowinfil[hh] - cumsnowinfil_0[hh] > INF[hh]) {
                  cummeltrunoff[hh] += (cumsnowinfil[hh] - cumsnowinfil_0[hh] - INF[hh]);
                  cumsnowinfil[hh] += INF[hh];
                  infiltype[hh] = RESTRICTED;
              }

There are 2 separate issues

  1. In the original CRHM code the lines are commented out. This looks to be a recent addition, so I'm not surprised that you don't have it, and
  2. the original line cumsnowinfil[hh] = (cumsnowinfil_0[hh] + INF[hh]) is not the same as the line in ClassFrozenAyers.cpp cumsnowinfil[hh] += INF[hh];
jhs507 commented 3 years ago

Is this windows/borland source code in a repository somewhere? I would like to take a look at it.

KevinShook commented 3 years ago

Unfortunately, no. These are all the files that I have. CRHM_070220.zip

jhs507 commented 3 years ago

It is interesting that you found this difference in frozen Ayers as it is another one of the area's I was finding incorrect output specifically the cumsonwinfil variable this could be the cause.

KevinShook commented 3 years ago

I'm sure it didn't help! I found a bunch of tabs around line 83 in ClassBasin.cpp In ClassCrack.cpp, line 162 is infil_index(fallstat[hh]/100.0, SWE[hh], Xinfil[0][hh], Xinfil[1][hh]);

In the original code, line 3974 infil_index(fallstat[hh]/100.0, SWE[hh], Xinfil[0][hh], Xinfil[1][hh], infDays[hh]);

In ClassCrack.cpp, line 178 is if(crackstat[hh] > 6)

In the original code, line 3990 is if(crackstat[hh] > infDays[hh])

KevinShook commented 3 years ago

Found some another difference In ClassEvap.cpp line 105 if(inhibit_evap[hh] || inhibit_evap_User[hh]){

The Windows CRHM line 3217 is if(inhibit_evap[hh]){

KevinShook commented 3 years ago

Another update In Windows line 16887 f4 = 5000/rcs[hh]; // 05/21/20 correcting f4 = 5000/50; In ClassevapD_resist.cpp f4 = 5000/50;

KevinShook commented 3 years ago

Missed this one in ClassCrack - now needed

void infil_index(float Theta, float SWE, float & Index, float & Pot, const float infDays) {

  Pot=5*(1-Theta)*pow(SWE, 0.584f);
  Index=Pot/SWE;
  if (Index > 1.0) Index=1;
  Pot=Pot/infDays;
}
jhs507 commented 3 years ago

By implementing just the change to the diffuse calculation it has brought the SWE outputs into tolerance. The basinflow is also much closer than before.

The sd values are almost in tolerance except for hru 11.

Here are my comparison summaries. My initial comparison used a large time period and my revised comparison used a much smaller time scale but I think the results still are indicative of a huge improvement.

comparison_inital.xlsx

comparison_revised.xlsx

I will continue to resolve the differences that you have found and see how close we can bring these two versions.

KevinShook commented 3 years ago

Good to see! It looks like the other issues are probably within the evaporation. Not only does it affect the actet values, but the sd and soil_moist will also be affected. As these values are storages, short-duration errors in the evaporation will lead to long duration errors in the storages, i.e. with more time steps being out of tolerance.

jhs507 commented 3 years ago

I have resolved all the issues listed here but will keep this issue open until we get the input to within tolerance of what we expect.

KevinShook commented 3 years ago

I'll give the new changes a try

jhs507 commented 3 years ago

You are free to give it a try but you will have to checkout my branch. I won't push the changes to master until we have verified their correctness.

KevinShook commented 3 years ago

I did checkout your branch (justin, right?). However, when I compiled and ran it, I got pretty much the same results as before. Either I didn't get the changes, or something else is wrong.

KevinShook commented 3 years ago

I double-checked to make sure that my local repo is up to date

 $ git remote show origin
* remote origin
  Fetch URL: git@github.com:srlabUsask/crhmcode.git
  Push  URL: git@github.com:srlabUsask/crhmcode.git
  HEAD branch: master
  Remote branches:
    Banani         tracked
    justin         tracked
    manishankar    tracked
    master         tracked
    multigroup_fix tracked
  Local branches configured for 'git pull':
    justin merges with remote justin
    master merges with remote master
  Local refs configured for 'git push':
    justin pushes to justin (up to date)
    master pushes to master (up to date)

and that I am on the justin branch

 $ git branch
error: cannot run most: No such file or directory
* justin
  master

As I said, when I compile and run the branch, I get essentially the same results as before:

windows_crhmcode_annual_flux_depths

jhs507 commented 3 years ago

I did see improvement when just analyzing the Qdfo and QdfoD variables. After I implement a few more changes I hope the outputs will grow closer.

KevinShook commented 3 years ago

Good to know. Thanks

KevinShook commented 3 years ago

Actually, comparing the two sets of plots, there is an improvement. The new values of actual et are greater and the discharges are reduced. So, it looks like we are moving in the right direction.

jhs507 commented 3 years ago

I have run a test to see how much we have improved over our initial outputs and summarized my findings in this spreadsheet.

Improvement_data.xlsx

It appears to me that we are moving in the right direction for the most part. Oddly for soil_moist(11) and hru_actet(11) we have actually drifted further apart.

I am planing on investigating each module in this project to see if there are more source code differences.

KevinShook commented 3 years ago

Thanks very much. I'm digging into the results myself. The value of soil_moist is not that useful on its own as it is a storage term, and is therefore affected by all fluxes. The value of hru_actet is affected by the applied evaporation and the soil mositure as it is the evaporation which occurs as the applied evporation is decreased by the soil being unsaturated. You can see that the applied evaporation depth is consistently too low in the new version:

windows_crhmcode_annual_evap_depths

I'm looking into what might be causing this, starting with the solar radiation calculations.

KevinShook commented 3 years ago

There are still some issues with the interval solar radiation calculations: windows_crhmcode_interval_Qn_scatter However, the differences largely even out over time, so cannot be the cause of the consistent underestimation of evaporation: windows_crhmcode_annual_Qn_scatter

I checked the evaporation over short time periods, and you can see that the crhmcode version causes zero evporation, when it should be quite large: Borland_crhmcode_daily_HRU2_evap_1972

Digging into Classevap_Resist.cpp, I found several discrepancies: Classevap_Resist line 202

           p = soilproperties[soil_type[hh]][AIRENT]*
             pow(soilproperties[soil_type[hh]][PORE]/Soil_Moist, soilproperties[soil_type[hh]][PORESZ]);

Borland NewModules.cpp line 16617

          p = soilproperties[soil_type[hh]][AIRENT]*
             pow(1.0/Soil_Moist, soilproperties[soil_type[hh]][PORESZ]); // 05/21/20 correcting pow(soilproperties[soil_type[hh]][PORE]/Soil_Moist, soilproperties[soil_type[hh]][PORESZ]);

Classevap_Resist line254

            f4 = 5000/50;

Borland NewModules.cpp line 16668

          f4 = 5000/rcs[hh]; // 05/21/20 correcting f4 = 5000/50;
jhs507 commented 3 years ago

I added the changes in Classevap_Resist but after running a simulation for 2 years I did not find a significant improvement. I will continue looking for differences in the modules.

KevinShook commented 3 years ago

Thanks. I'll try running for longer.

KevinShook commented 3 years ago

I also found no real change in the output. The evaporation is still zero in the new version, when it should not be. I looked at the forcings, and there was no obvious cause of the zero values. The non-zero evaporation values look to be good. I think we'll have to debug it to find what is going on in some intervals

jhs507 commented 3 years ago

I found no unaccounted for differences between the modules involved in this project. I will investigate the evaporation modules more closely to see If I can find something that doesn't make sense.

KevinShook commented 3 years ago

I agree - I found no differences either. However, I noticed a) that the days where there were discrepancies were where there was precipitation and b) that the precipitation was usually zero on those days - not just different. You can see the effects in this figure.

HRU_11_evap_vs_precip

It doesn't look like evap_Resist checks the precipitation - could this be done in another module, i.e. disabling evaporation where p >0?

jhs507 commented 3 years ago

I will take a look for that.

I stepped through the code in VS's debugger and it appears that evap_Resist is "turned off" for all hru's except for 0, 5, and 56. On every other hru inhibit_evap[hh] (line 157 Classevap_Resist.cpp) is set causing the calculation to be skipped. Is this intentional?

KevinShook commented 3 years ago

That doesn't seem right. I had wondered if it was being set incorrectly. According to the documentation

inhibit_evap (flag) - 0/1 enable/inhibit.

So if it's set to zero, the evap should occur as usual. Looking at the .prj file I see that all of the values are zero.

Shared inhibit_evap <0 to 1>
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 

So, it looks like something is erroneously setting the flag

jhs507 commented 3 years ago

Alright I will see if I can track that down.

jhs507 commented 3 years ago

Should the values in this array be changing at all throughout the run or should it always be what they are set as initially?

KevinShook commented 3 years ago

These are control flag parameters so should not be changing once they are read in from the .prj file.