ashishjain1988 closed 4 years ago
Yes, according to BIC at least. But the factor model is wrong for this data: the observations are not i.i.d. (they are independent but not identically distributed), and BIC might not be consistent if the model is misspecified.
I am using the simulation from the Pierre-Jean et al. paper (https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbz138/5645549) to generate datasets (from a Gaussian distribution) with pre-defined clusters and features linked to those clusters. After checking the BIC for K in 1 to 15, I am getting an optimal K=0. Does that mean there is no optimal number of factors? The details of the simulation are:
Gaussian Distribution parameters: N(Mean=2, S.D.=1)
Clusters: 4
Features: 1000
Features linked to clusters: 100
Samples: 100
Added Noise (from Gaussian distribution): N(Mean=0, S.D.=4)
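For anyone who wants to reproduce a dataset of this shape, here is a rough NumPy sketch of the settings listed above. It is not the code from the Pierre-Jean et al. paper. The function name `simulate_clusters`, the even split of samples across clusters, and the even split of the 100 linked features into one disjoint block per cluster are all my assumptions, since the list does not specify them.

```python
import numpy as np

def simulate_clusters(n_samples=100, n_features=1000, n_clusters=4,
                      n_linked=100, signal_mean=2.0, signal_sd=1.0,
                      noise_sd=4.0, seed=0):
    """Hypothetical helper: Gaussian data with cluster-specific feature blocks."""
    rng = np.random.default_rng(seed)
    # Assumption: samples are assigned to clusters as evenly as possible.
    labels = np.sort(np.arange(n_samples) % n_clusters)
    # Assumption: linked features are split evenly, one disjoint block per cluster.
    per_cluster = n_linked // n_clusters
    X = np.zeros((n_samples, n_features))
    for c in range(n_clusters):
        rows = np.flatnonzero(labels == c)
        cols = np.arange(c * per_cluster, (c + 1) * per_cluster)
        # Signal N(signal_mean, signal_sd) only where a cluster meets its own features.
        X[np.ix_(rows, cols)] = rng.normal(signal_mean, signal_sd,
                                           size=(rows.size, cols.size))
    # Added Gaussian noise N(0, noise_sd) on every entry.
    X += rng.normal(0.0, noise_sd, size=X.shape)
    return X, labels

X, labels = simulate_clusters()
print(X.shape, np.bincount(labels))
```

Note that with these settings the noise S.D. (4) is twice the mean shift (2), so the per-feature signal is weak relative to the noise.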
You can find the simulated data at https://github.com/ashishjain1988/machine-learning-examples/blob/master/simulated_data1_1.txt
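To make the BIC scan over K concrete, here is a minimal sketch that scores K = 0 to 15. It swaps in probabilistic PCA (isotropic Gaussian noise, closed-form maximum likelihood) as a stand-in, so it is not the factor model used in this thread and the numbers will not match its output. The function name `ppca_bic`, the parameter count, and the toy demo data are my assumptions for illustration.

```python
import numpy as np

def ppca_bic(X, max_k=15):
    """Hypothetical helper: BIC of probabilistic PCA for K = 0..max_k.

    Assumes max_k is smaller than the rank of the centered data.
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    # Eigenvalues of the ML covariance (divide by n), via singular values.
    s = np.linalg.svd(Xc, compute_uv=False)
    eig = np.zeros(d)
    eig[:s.size] = s ** 2 / n
    bic = {}
    for k in range(max_k + 1):
        # ML noise variance: mean of the discarded eigenvalues.
        sigma2 = eig[k:].mean()
        loglik = -0.5 * n * (d * np.log(2 * np.pi)
                             + np.log(eig[:k]).sum()
                             + (d - k) * np.log(sigma2)
                             + d)
        # Free parameters: loadings (up to rotation), noise variance, mean.
        n_params = d * k - k * (k - 1) / 2 + 1 + d
        bic[k] = -2.0 * loglik + n_params * np.log(n)
    return bic

# Toy demo data (not the thread's dataset): 3 strong factors plus unit noise.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 3)) @ (3 * rng.normal(size=(3, 20)))
X_demo += rng.normal(size=(200, 20))
bic = ppca_bic(X_demo, max_k=5)
best_k = min(bic, key=bic.get)  # K with the lowest BIC
print(best_k)
```

Including K=0 in the scan makes it explicit that "no factors" is one of the candidates being compared. Each extra factor adds roughly one parameter per feature, so with many features and few samples the penalty term grows quickly.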