Open Baboon61 opened 4 years ago
This is the first time I hear about this. We can try to figure out what is going on.
From your last post on biostars it looks like the model_pars matrices are already different (this comes before ksmooth is called). It also looks like only the theta column is different, which suggests that MASS::theta.ml is giving different results. That function has as a default epsilon parameter eps = .Machine$double.eps^0.25
- perhaps that leads to a different number of iterations between machines and as a result you get slightly different theta estimates.
The only solutions I can think of right now require modifications to sctransform (hard-code eps or make it a user-exposed parameter)
I cannot find the MASS::theta.ml
function you are talking about in the get_model_pars
function or before in the vst
function
In that one maybe :
if (method == 'nb_theta_given') {
theta_given_bin_worker <- theta_given_bin[indices]
return(fit_nb_theta_given(umi = umi_bin_worker, model_str = model_str, data = data_step1, theta_given = theta_given_bin_worker))
}
Hi, I'm testing reproducibility across platforms too, and we also see this inconsistency across platforms.
I tested a transform on BIOWULF, FRCE, an AWS instance with Ubuntu 20.02, and on RStudio Workbench. Interestingly the results from BIOWULF and FRCE are identical, but different from RSW and AWS. RSW and AWS results have same TSNE plot, but different UMAP. We also observe some sort of 180 degree rotation of UMAP plot between cluster results and others.
I checked the ".Machine$double.eps^0.25" just now, and RSW and AWS both reports 0.0001220703, so this may not be the case for me.
Can you let me know if there are other potential causes? Thanks.
Hi, due to Covid19 situation I am working part time at the lab and at home. At the lab I am working under an HP workstation linux environment and at home on MSI with windows 10 and Rstudio.
As input to the SCTransform function I use the same RDS object, same parameters, same R version, same Seurat and SCTransform versions, but I get slightly different residual (mean and variance) results after a SCTransform which lead to different UMAPs shapes.
As I detailed it in this biostars post, my hint is that this unconsistancy comes from how each machine is handling digits. Do you have more information on how SCTransform is dealing with a long list of digits ? As I mentioned in my post I was not able to get into the
C_ksmooth
function to continue my tracking as the function calls a C feature I have no access to.Thanks a lot !