somakd / fad

Factor Analysis for Data in R

No optimal initial #factor for the simulated data #3

Closed: ashishjain1988 closed this issue 4 years ago

ashishjain1988 commented 4 years ago

I am using the simulation from the Pierre-Jean et al. paper (https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbz138/5645549) to generate datasets (from a Gaussian distribution) with pre-defined clusters and features linked to those clusters. After checking the BIC for K in 1 to 15, I get an optimal K=0. Does this mean there is no optimal number of factors? The details of the simulation are:

Gaussian Distribution parameters: N(Mean=2, S.D.=1)

Clusters: 4

Features: 1000

Features linked to clusters: 100

Samples: 100

Added Noise (from Gaussian distribution): N(Mean=0, S.D.=4)

The simulated data are available at https://github.com/ashishjain1988/machine-learning-examples/blob/master/simulated_data1_1.txt
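For readers without the linked file, the setup above could be approximated with a minimal base R sketch like the one below. It is not the code from the Pierre-Jean et al. paper. The equal cluster sizes, the even split of the 100 linked features into blocks of 25 per cluster, and the choice to add the N(2, 1) signal on top of the N(0, 4) noise are all assumptions, and the variable names are hypothetical.

```r
# Minimal sketch of the simulation described above (base R only).
# ASSUMPTIONS (not stated in the issue): equal cluster sizes, 25 linked
# features per cluster, and signal added on top of the noise.
set.seed(1)

n_samples  <- 100   # samples
n_features <- 1000  # features
n_clusters <- 4     # clusters
n_linked   <- 100   # features linked to clusters (assumed split evenly)

# Assumed cluster labels: 25 samples per cluster.
cluster <- rep(seq_len(n_clusters), each = n_samples / n_clusters)

# Noise for every entry: N(mean = 0, sd = 4).
X <- matrix(rnorm(n_samples * n_features, mean = 0, sd = 4),
            nrow = n_samples, ncol = n_features)

# Assumed layout: cluster k is linked to its own block of 25 features.
block_size <- n_linked / n_clusters
for (k in seq_len(n_clusters)) {
  rows <- which(cluster == k)
  cols <- ((k - 1) * block_size + 1):(k * block_size)
  # Signal for linked features within the cluster: N(mean = 2, sd = 1).
  X[rows, cols] <- X[rows, cols] +
    matrix(rnorm(length(rows) * length(cols), mean = 2, sd = 1),
           nrow = length(rows))
}

# Compare the first linked block inside and outside cluster 1.
mean(X[cluster == 1, 1:block_size])
mean(X[cluster != 1, 1:block_size])
```

Under these assumptions the rows of `X` are drawn independently, but the mean of a linked feature depends on which cluster the sample belongs to.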

somakd commented 4 years ago

Yes, according to BIC at least. But the factor model is wrong for this data: the observations are not iid (they are independent but not identically distributed, since their distribution depends on the cluster), and BIC might not be consistent if the model is misspecified.
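For context, the textbook form of the BIC for a factor model with $K$ factors, $p$ features and $n$ samples is sketched below. This is the generic definition, and the exact scaling and parameter count used by fad are assumed rather than confirmed here. The symbols $\ell$, $\hat\theta_K$ and $d_K$ are notation introduced for this sketch.

$$
\mathrm{BIC}(K) = -2\,\ell(\hat\theta_K) + d_K \log n,
\qquad
d_K = pK + p - \frac{K(K-1)}{2}
$$

Here $\ell(\hat\theta_K)$ is the maximized log-likelihood, and $d_K$ counts the $pK$ loadings and $p$ uniquenesses minus the $K(K-1)/2$ rotational constraints (mean parameters are left out because they do not depend on $K$). With $K=0$ the model reduces to a diagonal covariance matrix with $d_0 = p$, so an optimum at $K=0$ says that no common factor improves the likelihood enough to offset the penalty.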