Open ashishjain1988 opened 4 years ago
You can download the simulated data @ https://github.com/ashishjain1988/machine-learning-examples/blob/master/simulated_data2_1.txt
Some of the estimates of the uniqueness occur at the boundary. I tried running from multiple starting values and all of the final estimates seemed to be very close (within 3-4 decimal places).
Also I tried estimating using EM algorithm using the cate package. The final solution from the EM had lower log-likelihood. In fact, the EM algorithm gives suboptimal answer because the first order conditions are not met:
BiocManager::install("cate")
h = cate::factor.analysis(Y=scale(d),r=4)
Sigma = tcrossprod(h$Gamma) + diag(h$Sigma)
Under the first order optimality, all the diagonals of Sigma must be 1 because scaled(d)
has been used, but you can check all entries of diag(Sigma)
are less than 1.
Conclusion: I'd think it's safe to ignore the warning message.
Do you find similar problem if you generate the noise from normal distribution instead of gamma?
Thank you for the help! I am getting the same warning if I use noise from a normal distribution for K=1 to 4.
Can you provide a little more insight on how the data was generated? Is it like below?
library(MixSim)
n = 100
p = 1125
q = 5 # number of factors
nclust = 4
overlap = 0.01
cluster.object <- MixGOM(goMega = overlap, K = nclust, p = q, sph = T)
z <- simdataset(n = n,Pi = cluster.object$Pi,
Mu = cluster.object$Mu, S = cluster.object$S)$X
L <- matrix(rnorm(p*q,2,2),p, q)
Y <- tcrossprod(z,L) + matrix(rexp(n*p,rate = 1) - 1, n, p)
In our case, we are simulating the data following some assumptions that we made in our tool. Below is the code to generate the dataset.
library(MixSim)
##MixSim Simulations
#Q <- MixSim(MaxOmega = 0.20, BarOmega = 0.05, K = 4, p = 10,hom = TRUE,sph = TRUE)
L<-4 #Clusters
K<-5 #Factors
N<-100 #Samples
Q <- MixGOM(goMega = 0.01, K = L-1, p = K, sph = TRUE,hom = TRUE)
#Rescale Mu by dividing with Sigma values
rescaledNu<-Q$Mu
for(i in c(1:(L-1)))
{
rescaledNu[i,] <- Q$Mu[i,]/diag(sqrt(Q$S[,,i]))
}
#Rescaling Pi
ProbablityOfLastCluster<-Q$Pi[1]
rescalePi<-c(Q$Pi,ProbablityOfLastCluster)/(sum(Q$Pi)+ProbablityOfLastCluster)
##Getting Mu of final Cluster
finalCluster<-c()
for(i in c(1:K))
{
finalCluster<-c(finalCluster, -sum(rescaledNu[,i]*rescalePi[1:(L-1)]/rescalePi[L]))
}
rescaledNu<-rbind(rescaledNu,finalCluster)
#Making Covariance matrix equal to Identity
newS<-array(0, c(K, K, L))
for(i in c(1:L))
{
m<-matrix(0,ncol = 5,nrow = 5)
diag(m)<-rep(1,5)
newS[,,i]<-m
}
FactorClusters <- simdataset(n = N, Pi = rescalePi, Mu = rescaledNu, S = newS)
simulate_weights <- function(k,## factors
j, ## biomark
params=list(c(mean=1,sd=1)), ## parameter distribution
J, ## all biomarkers
labelName="gene"
){
#print(params)
m <- params["mean"]
sd <- params["sd"]
c= matrix(rnorm(k*j,mean=m,sd=sd),ncol=j,nrow=k)
c= cbind(c, matrix(0,nrow=k, ncol=J-j))
idx <- sample(1:ncol(c))
positive <- sapply(1:j, function (ll) which(ll== idx))
c<-c[,idx]
colnames(c) <- sprintf("gene%s", 1:ncol(c))
#print(head(positive))
positive <- paste0("gene",positive)
return(list(C=c,positive=positive))
}
GammaNoise<-function(noOfSamples,noOfFeatures,rate)
{
sdForFeatures<-rgamma(noOfFeatures,1)
lNoise<-lapply(1:noOfFeatures,function(x){
rnorm(noOfSamples, mean=0, sd=sdForFeatures[x])
})
return(t(do.call("rbind", lNoise)))
}
expWeights<-simulate_weights(K,200,(c(mean=2,sd=1)),1000,"gene")
probesWeights<-simulate_weights(K,250,(c(mean=2,sd=2)),1125,"probe")
#Add Noise for gene expression data
dataExpression<- FactorClusters$X %*% expWeights$C
dataExpression <- dataExpression + GammaNoise(nrow(dataExpression),ncol(dataExpression),1)
#dataExpression <- dataExpression + matrix(rnorm(N*ncol(dataExpression), mean=0, sd=0.1),nrow=N, ncol=ncol(dataExpression))
#Add Noise for DNAMethylation data
dataMethylation<- FactorClusters$X %*% probesWeights$C
dataMethylation <- dataMethylation + GammaNoise(nrow(dataMethylation),ncol(dataMethylation),1)
Hi,
I am receiving the following warning message when I am trying to run fad on a simulated from MixSim Package.
Warning message: In fad(scale(d), 4) : Algorithm may not have converged. Try another starting value.
Can you please comment on what steps should I take to rectify this?
Steps to simulate dataset I am first simulating the factor matrix using the MixGOM function. The details of the simulated data matrix are:
factors: 5
samples: 100
Clusters:4
GeneralizedOverlap: 0.01
After that, we simulated the weight matrix (loadings matrix) using Gaussian Distribution (N(mean=2,sd=2)) with a total number of features as 1125. We further multiplied the simulated factor matrix and loading matrix to generate our dataset (100 X 1125). We then added noise to that dataset using Gamma distribution (rate = 1).
Note: fad is not giving warning if I simulate weight matrix using Gaussian Distribution (N(mean=2,sd=1)) with a total number of features as 1000.
Thanks, Ashish Jain