Closed KevinShook closed 3 years ago
Thanks for bringing this to our attention Kevin,
Currently one of our focuses is attempting to prove that our code gives equivalent results to the previous version. We had noticed a 5-10% variation on certain outputs and are trying to determine what the causes are.
Could you please send me the project file you used to produce this output and the output files you produced?
Here is the model .prj file , the forcing files, and the output from the crhmcode release 1.1 model crhmcode11_water_balance.zip
This is the output from the same .prj file, using the same forcing .obs files Windows_CRHM_output.zip
Hi Kevin,
Could you run a md5 checksum on your windows crhm executable. I just want to be sure we are referring to the same version.
$ md5sum CRHM.exe
52023495a9367d9a987b81b40794f025 CRHM.exe
Interesting I have a version with the sum 3152984ec44968be74209c8e94fa9baa That I downloaded from the CRHM website.
I have another version with the sum e5a2e03ba9678c0158d54d673358e99b That I received from Peter.
Could share your CRHM.exe with me in some way?
There are quite a few versions of CRHM around. This is one of the newer ones - dated 07/02/20 (I think that means July 2, 2020). Here is this version. CRHM.zip
Peter's code is quite different. I am working on a project with others, so I am using the stock version of CRHM
Perfect thank you.
One think I have noticed from my initial look at this is that the basinflow output is quite off between the two outputs. I am unfamiliar with what this physically represents but could it be related?
The basinflow is the outflow from the basin in m3 in each time step. The discharge in the figure above is just the basinflow converted to an depth over the basin in mm, and summed for each year. The reason for doing this is to compare with other fluxes which are expressed as mm over the basin.
The first figure shows that the the discharge/basinflow is off by more than an order of magnitude in the crhmcode release 1.1. It also shows that the cause is not error in the precipitation (same in both models).
The second figure shows that the snow accumulation is the same in both models, so the error is not within the PBSM module, or any of the modules that it uses.
The first figure shows that the crhmcode release 1.1 is grossly underestimating evaporation, both from vegetation (HRUs 2,3,4) and from water (HRU 11). This is likely the cause of the overestimation of basinflow.
On your advice I followed the calculation of evaporation in the evap_resist#1 module. The inputs of that module are observation Qsi (W/m^2), variable QsiS_Var (W/m^2)(slope) from Annandale, and Uses variable QsiA_Var (W/m^2)(horizontal) from Annandale.
Following that I focused on the Annandale module to see if it was producing correct results.
Upon getting the output for the Annandale module from HRUs 1, 2, 3, 4, and 11 with the windows code and our gcc code compared them.
I found that the variables QsiA_Var, and QsiD_Var had matching values for all HRUs
The variables hru_SunAct and QsiS_Var matched for HRU 11 but not for HRUs 1, 2, 3, or 4.
Digging into the code the value for hru_SunAct seems to be calculated by the following code do you see anything wrong with it?
WtoMJ_D expands to 86400/1E6
double Ratio;
double Temp = QsiDT*WtoMJ_D - QdfoD[hh]; // observed direct incoming - QdfoD[hh]
if(Temp <= 0.0 || QdflatD[hh] - QdfoD[hh] <= 0.0)
Ratio = 0.0;
else{
Ratio = Temp/(QdflatD[hh] - QdfoD[hh]); // incoming clear sky direct only
if(Ratio > 1.0)
Ratio = 1.0;
}
hru_SunAct[hh] = Ratio*SunMax[hh];
QsiS_Var is calculated with the following code. Does anything seem amiss with it?
if(nstep == 1 || Global::Freq == 1) { // beginning of every day
if(hru_tmax[hh] - hru_tmin[hh] < 0.00001)
TauAT = 0.0;
else
TauAT = krs[hh]*(1.0 + 2.7E-5*hru_elev[hh])*pow(hru_tmax[hh] - hru_tmin[hh], 0.5f);
QsiDT = QdroDext[hh]*TauAT*MJ_DtoW;
} // if
if(variation == VARIATION_ORG || variation == VARIATION_1){
if(QdflatE[hh] > 1.0){
QsiAT = QdflatE[hh]*TauAT;
QsiST = (Qdro[hh] + Qdfo[hh])*QsiDT/(MJ_DtoW)/QdflatD[hh]; // on slope simplified
}
else{
QsiAT = 0.0;
QsiST = 0.0;
}
}
Both of these sections of code are from the ClassAnnan.cpp file if you need to see them in context.
I can't find anything wrong with the code inherently but I do not know the full science behind the calculations.
Thanks for looking into this I did a diff of the Annandale module in the two versions and couldn't see any difference in the code, so I suspect that the differences are in the inputs.
Since QsiS_Var differs between the versions, but QsiA_Var, and QsiD_Var did not, the difference must be seen in something that is only found in QsiS_Var. It is also likely to be found in hru_SunAct. One clue is that "QsiS_Var matched for HRU 11 but not for HRUs 1, 2, 3, or 4." The difference between HRU 11 and the others is that HRU 11 is covered by water, so it is flat. So, the differences are probably related to how solar radiation is adjusted for the effects of slope.
I think that it would be a good idea to check the values of Qdro, Qdfo, and QdflatD to see if they are differing between the two models.
What did you use as the other side of your diff? I don't think have the source code for the windows version in question.
Sorry - I forgot that you don't have the source of that version. Here it is: CRHM070220_NewModules.cpp.zip
I checked the variables that you mentioned for HRU 1
Qdfo and QdfoD are not matching for the two versions.
QdflatD is matching in that HRU
The only time Qdfo is matching is on time steps where the value is zero in both versions.
It appears that the gcc version is always lower than the borland/windows version.
Very interesting.
I just had a quick look at the ClassGlobal
In the old CRHM, line 629 is
diffuse = diffuse*sqr(cos(hru_GSL[hh]/2.0*DEGtoRAD)); // on slope
in ClassGlobal.cpp line 291 is
diffuse = diffuse*sqr(cos(hru_GSL[hh]/2.0)); // on slope
This might explain it
That certainly would cause a difference I assume the implementation with *DEGtoRAD is correct?
Yes - the value of hru_GSL is in degrees, so it needs to be converted before all the trig functions are used.
That change caused the Qdfo and QdfoD to become within tolerance of the borland/windows output.
I will see if that brings more things into tolerance and give you an update.
Excellent. I've been doing some more diffing to see if there are some other potential issues. Most things look OK, but I found this one: In the original CRHM NewModules.cpp ClassfrozenAyers lines 14765-14769
// if(cumsnowinfil[hh] - cumsnowinfil_0[hh] > INF[hh]) {
// cummeltrunoff[hh] += (cumsnowinfil[hh] - cumsnowinfil_0[hh] - INF[hh]);
// cumsnowinfil[hh] = (cumsnowinfil_0[hh] + INF[hh]);
// infiltype[hh] = RESTRICTED;
// } 03/30/2020 commented outas not suitable for multiple year simulation
In ClassFrozenAyers.cpp lines 263-267
if(cumsnowinfil[hh] - cumsnowinfil_0[hh] > INF[hh]) {
cummeltrunoff[hh] += (cumsnowinfil[hh] - cumsnowinfil_0[hh] - INF[hh]);
cumsnowinfil[hh] += INF[hh];
infiltype[hh] = RESTRICTED;
}
There are 2 separate issues
cumsnowinfil[hh] = (cumsnowinfil_0[hh] + INF[hh])
is not the same as the line in ClassFrozenAyers.cpp
cumsnowinfil[hh] += INF[hh];
Is this windows/borland source code in a repository somewhere? I would like to take a look at it.
Unfortunately, no. These are all the files that I have. CRHM_070220.zip
It is interesting that you found this difference in frozen Ayers as it is another one of the area's I was finding incorrect output specifically the cumsonwinfil variable this could be the cause.
I'm sure it didn't help!
I found a bunch of tabs around line 83 in ClassBasin.cpp
In ClassCrack.cpp, line 162 is
infil_index(fallstat[hh]/100.0, SWE[hh], Xinfil[0][hh], Xinfil[1][hh]);
In the original code, line 3974
infil_index(fallstat[hh]/100.0, SWE[hh], Xinfil[0][hh], Xinfil[1][hh], infDays[hh]);
In ClassCrack.cpp, line 178 is
if(crackstat[hh] > 6)
In the original code, line 3990 is
if(crackstat[hh] > infDays[hh])
Found some another difference
In ClassEvap.cpp line 105
if(inhibit_evap[hh] || inhibit_evap_User[hh]){
The Windows CRHM line 3217 is
if(inhibit_evap[hh]){
Another update
In Windows line 16887
f4 = 5000/rcs[hh]; // 05/21/20 correcting f4 = 5000/50;
In ClassevapD_resist.cpp
f4 = 5000/50;
Missed this one in ClassCrack - now needed
void infil_index(float Theta, float SWE, float & Index, float & Pot, const float infDays) {
Pot=5*(1-Theta)*pow(SWE, 0.584f);
Index=Pot/SWE;
if (Index > 1.0) Index=1;
Pot=Pot/infDays;
}
By implementing just the change to the diffuse calculation it has brought the SWE outputs into tolerance. The basinflow is also much closer than before.
The sd values are almost in tolerance except for hru 11.
Here are my comparison summaries. My initial comparison used a large time period and my revised comparison used a much smaller time scale but I think the results still are indicative of a huge improvement.
I will continue to resolve the differences that you have found and see how close we can bring these two versions.
Good to see! It looks like the other issues are probably within the evaporation. Not only does it affect the actet values, but the sd and soil_moist will also be affected. As these values are storages, short-duration errors in the evaporation will lead to long duration errors in the storages, i.e. with more time steps being out of tolerance.
I have resolved all the issues listed here but will keep this issue open until we get the input to within tolerance of what we expect.
I'll give the new changes a try
You are free to give it a try but you will have to checkout my branch. I won't push the changes to master until we have verified their correctness.
I did checkout your branch (justin, right?). However, when I compiled and ran it, I got pretty much the same results as before. Either I didn't get the changes, or something else is wrong.
I double-checked to make sure that my local repo is up to date
$ git remote show origin
* remote origin
Fetch URL: git@github.com:srlabUsask/crhmcode.git
Push URL: git@github.com:srlabUsask/crhmcode.git
HEAD branch: master
Remote branches:
Banani tracked
justin tracked
manishankar tracked
master tracked
multigroup_fix tracked
Local branches configured for 'git pull':
justin merges with remote justin
master merges with remote master
Local refs configured for 'git push':
justin pushes to justin (up to date)
master pushes to master (up to date)
and that I am on the justin branch
$ git branch
error: cannot run most: No such file or directory
* justin
master
As I said, when I compile and run the branch, I get essentially the same results as before:
I did see improvement when just analyzing the Qdfo and QdfoD variables. After I implement a few more changes I hope the outputs will grow closer.
Good to know. Thanks
Actually, comparing the two sets of plots, there is an improvement. The new values of actual et are greater and the discharges are reduced. So, it looks like we are moving in the right direction.
I have run a test to see how much we have improved over our initial outputs and summarized my findings in this spreadsheet.
It appears to me that we are moving in the right direction for the most part. Oddly for soil_moist(11) and hru_actet(11) we have actually drifted further apart.
I am planing on investigating each module in this project to see if there are more source code differences.
Thanks very much. I'm digging into the results myself. The value of soil_moist is not that useful on its own as it is a storage term, and is therefore affected by all fluxes. The value of hru_actet is affected by the applied evaporation and the soil mositure as it is the evaporation which occurs as the applied evporation is decreased by the soil being unsaturated. You can see that the applied evaporation depth is consistently too low in the new version:
I'm looking into what might be causing this, starting with the solar radiation calculations.
There are still some issues with the interval solar radiation calculations: However, the differences largely even out over time, so cannot be the cause of the consistent underestimation of evaporation:
I checked the evaporation over short time periods, and you can see that the crhmcode version causes zero evporation, when it should be quite large:
Digging into Classevap_Resist.cpp, I found several discrepancies: Classevap_Resist line 202
p = soilproperties[soil_type[hh]][AIRENT]*
pow(soilproperties[soil_type[hh]][PORE]/Soil_Moist, soilproperties[soil_type[hh]][PORESZ]);
Borland NewModules.cpp line 16617
p = soilproperties[soil_type[hh]][AIRENT]*
pow(1.0/Soil_Moist, soilproperties[soil_type[hh]][PORESZ]); // 05/21/20 correcting pow(soilproperties[soil_type[hh]][PORE]/Soil_Moist, soilproperties[soil_type[hh]][PORESZ]);
Classevap_Resist line254
f4 = 5000/50;
Borland NewModules.cpp line 16668
f4 = 5000/rcs[hh]; // 05/21/20 correcting f4 = 5000/50;
I added the changes in Classevap_Resist but after running a simulation for 2 years I did not find a significant improvement. I will continue looking for differences in the modules.
Thanks. I'll try running for longer.
I also found no real change in the output. The evaporation is still zero in the new version, when it should not be. I looked at the forcings, and there was no obvious cause of the zero values. The non-zero evaporation values look to be good. I think we'll have to debug it to find what is going on in some intervals
I found no unaccounted for differences between the modules involved in this project. I will investigate the evaporation modules more closely to see If I can find something that doesn't make sense.
I agree - I found no differences either. However, I noticed a) that the days where there were discrepancies were where there was precipitation and b) that the precipitation was usually zero on those days - not just different. You can see the effects in this figure.
It doesn't look like evap_Resist checks the precipitation - could this be done in another module, i.e. disabling evaporation where p >0?
I will take a look for that.
I stepped through the code in VS's debugger and it appears that evap_Resist is "turned off" for all hru's except for 0, 5, and 56. On every other hru inhibit_evap[hh] (line 157 Classevap_Resist.cpp) is set causing the calculation to be skipped. Is this intentional?
That doesn't seem right. I had wondered if it was being set incorrectly. According to the documentation
inhibit_evap (flag) - 0/1 enable/inhibit.
So if it's set to zero, the evap should occur as usual. Looking at the .prj file I see that all of the values are zero.
Shared inhibit_evap <0 to 1>
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
So, it looks like something is erroneously setting the flag
Alright I will see if I can track that down.
Should the values in this array be changing at all throughout the run or should it always be what they are set as initially?
These are control flag parameters so should not be changing once they are read in from the .prj file.
I was able to run release 1.1. The execution time was excellent - about 1/3 of the old Windows version. The executable file is also very small, which is great to see Unfortunately, there are very large errors in some of the values. I had noticed similar issues in release 1.0, as well.
As you can see from the figure, the annual depth of annual discharge is orders of magnitude too large in the crhmcode release 1.1 model. This would appear to be because the actual evaporation depths are too small, as the depths of precipitation are identical. This model uses 2 different methods for calculating evaporation. Penman-Monteith is used by HRUs 2-4, while HRU 11 uses Priestley-Taylor. Both are giving incorrect results. I checked the snow accumulation from the module PBSM, and it appears to be giving the same results in both models.
Therefore, the issue appears to be either in the module evap_Resist or within one of the modules that it uses.