pypest / pyemu

python modules for model-independent uncertainty analyses, data-worth analyses, and interfacing with PEST(++)
BSD 3-Clause "New" or "Revised" License

Standard SVD fails due to memory allocation issue #429

Closed Paszka1 closed 1 year ago

Paszka1 commented 1 year ago

I was trying to do a base SVD of a relatively large Jacobian matrix (97,069 observations, mainly time series, and 782 parameters) to see how many initial super parameters I should expect during the first sequence of a sequential base and super parameter inversion. Unfortunately, I am unable to do that because pyemu (pyemu.la.LinearAnalysis.qhalf.s) cannot allocate the memory required for such a large array, so it fails. I wonder whether pest++glm would still perform correctly, or whether it would run into trouble when re-calculating the number of super parameters during the sequential base and super parameter iterations? I assume it will.

I have also tried using SUPCALC to get an idea of the number of singular values, but it fails because of the long names in the control file. Even if it worked, it wouldn't solve the core problem.

Any suggestion or workaround will be highly appreciated.

mnfienen commented 1 year ago

Hey @Paszka1 - that is, indeed, a large matrix. I believe pestpp-glm should be able to handle it.

In any case, using the stability criteria in pestpp-glm is a good route to follow, since the appropriate number of singular vectors/super parameters can differ depending on the current parameter values. So, I would give it a try and confirm pestpp-glm can handle it. If it works, the reporting of the number of singular values used may be helpful after the fact.
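
For reference, something along these lines sets up the sequential base/super run from pyemu (an untested sketch; the file names and the values of the ++ options are placeholders you'd adjust for your problem):

```python
import pyemu

# load the existing control file (file name is a placeholder)
pst = pyemu.Pst("pest.pst")

# let pestpp-glm manage the base/super sequence and the truncation itself
pst.pestpp_options["n_iter_base"] = 1           # base-parameter iteration(s) per sequence
pst.pestpp_options["n_iter_super"] = 3          # super-parameter iterations per sequence
pst.pestpp_options["max_n_super"] = 400         # cap on super parameters (illustrative value)
pst.pestpp_options["super_eigthresh"] = 1.0e-6  # eigenvalue-ratio truncation threshold (illustrative value)

pst.write("pest_glm.pst")
```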

However, if you want to calculate the SVD at the start, I would operate on pyemu.la.LinearAnalysis.xtqx, which is square with dimensions equal to the number of parameters, and 782 x 782 should be well within memory.
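
If you go that route, a minimal sketch would look something like this (file names are placeholders, and the 1e-7 cutoff is purely illustrative, not a recommendation):

```python
import numpy as np
import pyemu

# load the Jacobian and control file (names are placeholders)
la = pyemu.LinearAnalysis(jco="pest.jcb", pst="pest.pst")

# xtqx is NPAR x NPAR (782 x 782 here), so the SVD is cheap
xtqx = la.xtqx.x  # underlying numpy array
sing_vals = np.linalg.svd(xtqx, compute_uv=False)

# rough look at how many singular values stand above an (illustrative) cutoff
ratios = sing_vals / sing_vals.max()
print("singular values above 1e-7 of the largest:", int((ratios > 1.0e-7).sum()))
```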

Paszka1 commented 1 year ago

Hey Mike,

Thanks for your reply and the suggestion to use the xtqx approach, which worked. I don't fully understand why it needs less memory than the qhalf approach, though.

I have yet another question, not directly related to the above. I have generated an ensemble of parameters from a prior. I have one group of grid-type parameters for which I didn't specify a geostatistical structure during parameterization, because spatial correlation doesn't really make sense (they are independent pumping rates), and I didn't specify temporal correlation either. As a result, every realization in the ensemble was generated with the initial value of that parameter, which is of course logical. Is it still possible to draw an ensemble that includes variance for that parameter, or should I specify a structure during parameterization?

mnfienen commented 1 year ago

Hi @Paszka1 -

On point 1, cool! Glad that worked. The reason it did is that forming XT x Q x X reduces the matrix dimensions from NOBS x NPAR to NPAR x NPAR. And anyway, it's really the singular vectors of xtqx that you are interested in (its singular values are simply the squares of those of Q^(1/2)X).
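
If it helps to see the reduction, here is a toy numpy check (random matrices, sizes made up) showing that xtqx is NPAR x NPAR and that its singular values are the squares of those of Q^(1/2)X:

```python
import numpy as np

# toy stand-ins for NOBS x NPAR (sizes are made up)
nobs, npar = 500, 40
rng = np.random.default_rng(0)
X = rng.standard_normal((nobs, npar))   # Jacobian
w = rng.uniform(0.5, 2.0, size=nobs)    # observation weights
Q = np.diag(w ** 2)                     # Q = diag(weights^2)

qhalfx = np.diag(w) @ X                 # Q^(1/2) X : NOBS x NPAR
xtqx = X.T @ Q @ X                      # X^T Q X   : NPAR x NPAR

s_qhalfx = np.linalg.svd(qhalfx, compute_uv=False)
s_xtqx = np.linalg.svd(xtqx, compute_uv=False)

print(qhalfx.shape, xtqx.shape)              # (500, 40) (40, 40)
print(np.allclose(s_xtqx, s_qhalfx ** 2))    # True
```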

On the second question: if you are generating your ensemble in PESTPP, it will use the bounds to approximate a variance around the initial parameter value for all the grid parameters. It should be similar in pyemu if you don't specify a covariance. So, if they are all showing up at the starting value, maybe you have them set as fixed in the PST file?

In any case, when you initially sample the ensemble (whether externally through pyemu or your own code, or through PESTPP), that's the only time the covariance/geostructure plays a role, so it's worth making it robust at that point.
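
As a sketch of what I mean (untested; the file names, the group name "welrate", and the transform are placeholders for your own setup):

```python
import pyemu

pst = pyemu.Pst("pest.pst")  # placeholder file name

# parameters marked "fixed" are never perturbed when an ensemble is drawn
par = pst.parameter_data
print(par.loc[par.partrans == "fixed", "parnme"])

# free up the pumping-rate group (group name and transform are hypothetical)
par.loc[par.pargp == "welrate", "partrans"] = "none"

# diagonal prior built from the bounds -- no spatial/temporal correlation implied
cov = pyemu.Cov.from_parameter_data(pst, sigma_range=4.0)
pe = pyemu.ParameterEnsemble.from_gaussian_draw(pst=pst, cov=cov, num_reals=100)
pe.enforce()  # reset realizations that fall outside the bounds
pe.to_csv("prior_ensemble.csv")
```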

Paszka1 commented 1 year ago

Hi @mnfienen,

Should've thought of that (ref. point one).

Point two. Indeed, I fixed those parameters and simply forgot.

I greatly appreciate your replies. They were very helpful, though some extra reading and thinking on my end would probably have gotten me to the answers.

mnfienen commented 1 year ago

No worries @Paszka1 - it's a lot to take in!

I'm closing this issue. Cheers