jonghyunharrylee / pyPCGA

pyPCGA: fast and scalable inverse modeling approach
BSD 3-Clause "New" or "Revised" License

[ENH] Get reproducible results with PCGA (set the initial vector used by scipy.sparse.linalg.eigsh ...) #18

Open antoinecollet5 opened 1 year ago

antoinecollet5 commented 1 year ago

Up to now, when running pyPCGA twice or more with the same parameters, the results are slightly different each time.

I attribute this to the low-rank approximation of the covariance matrix, which relies on scipy.sparse.linalg.eigsh: because the initial vector v0 is not provided, it is chosen randomly.

The solution would be to let the user set a seed (aka random_state) to generate a reproducible v0.

See: https://stackoverflow.com/a/52403508
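
To illustrate the idea, here is a minimal sketch (not the actual pyPCGA code) of passing a seeded v0 to eigsh. The helper name `seeded_eigsh` and the test matrix are made up for this example; only `eigsh` and its `k` and `v0` arguments come from SciPy.

```python
import numpy as np
from scipy.sparse.linalg import eigsh


def seeded_eigsh(A, k, random_state=None):
    """Hypothetical helper: call eigsh with a reproducible starting vector."""
    rng = np.random.default_rng(random_state)
    # Draw v0 from a seeded generator instead of letting eigsh pick one at random.
    v0 = rng.uniform(-1.0, 1.0, size=A.shape[0])
    return eigsh(A, k=k, v0=v0)


# Small symmetric positive semi-definite matrix standing in for a covariance matrix.
B = np.random.default_rng(0).standard_normal((50, 50))
A = B @ B.T

w1, v1 = seeded_eigsh(A, k=5, random_state=42)
w2, v2 = seeded_eigsh(A, k=5, random_state=42)
same_values = np.allclose(w1, w2)
same_vectors = np.allclose(v1, v2)
```

With the same seed, both calls start from the same v0, so the eigenvalues and eigenvectors (including their signs) should match between runs.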

jonghyunharrylee commented 1 year ago

Thank you, Antoine. You are right that the low-rank approximation will not give users unique vectors. I think I implemented it with an oversampling parameter (that is, k + p eigenvectors are computed, where p is the oversampling parameter, and only k eigenmodes are kept afterwards; this technique is commonly used in randomized low-rank approximation) so that users see less variability in the results, but I am not sure. A user-specified random seed would be a great option for reproducible results. I will take a look at it and merge your PR. Happy holidays!

Best, Harry
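
For readers unfamiliar with oversampling, a rough sketch of the k + p idea described above could look like the following. This is an assumption about the general technique, not a copy of what pyPCGA does; the function name `truncated_eigsh` and the default `p=5` are invented for the example.

```python
import numpy as np
from scipy.sparse.linalg import eigsh


def truncated_eigsh(A, k, p=5, random_state=None):
    """Hypothetical helper: compute k + p eigenpairs, keep the k largest."""
    rng = np.random.default_rng(random_state)
    v0 = rng.uniform(-1.0, 1.0, size=A.shape[0])
    # Ask for p extra eigenpairs (largest magnitude) on top of the k wanted.
    w, V = eigsh(A, k=k + p, which="LM", v0=v0)
    # Sort eigenvalues in descending order and keep only the first k modes.
    order = np.argsort(w)[::-1][:k]
    return w[order], V[:, order]


# Small symmetric positive semi-definite matrix standing in for a covariance matrix.
B = np.random.default_rng(0).standard_normal((50, 50))
A = B @ B.T

w, V = truncated_eigsh(A, k=5, p=5, random_state=42)
# Reference: the 5 largest eigenvalues from a dense solver, in descending order.
w_ref = np.linalg.eigvalsh(A)[::-1][:5]
```

The extra p modes are discarded after the solve, so the caller still gets exactly k eigenpairs.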

antoinecollet5 commented 1 year ago

Hi Harry,

Happy new year and best wishes for 2023.

I just corrected a last bug in the changes I've made this morning. I tested and everything seems to work fine now.

Cheers Antoine

antoinecollet5 commented 1 year ago

Hi @jonghyunharrylee,

Any chance you will have time to look at it?

Best regards

Antoine