Open tuanle618 opened 6 years ago
I dont think the problem is which seed set but that a seed is set at all. This makes the everything deterministic.
I will check if adding set.seed(Sys.time())
before the random id generation will solve this.
Why is in makePCAReport a seed?
@MiGraber just in case if the user calls this function instead the S3-Method. If you dont set a seed it still happens. You can try it
Ok I think i got it. It seems to be ggscatter function. I removed all set seeds in our function. -> As you said the issue is still there Then I tracked the .Random.seed variable and looked when it changed. -> I dont know why but ggscatter seems to set a seed. Adding set.seed(Sys.time()) after ggscatter fixes the bug
And refering to the seed in makePCAReport. Why is there a seed in the S3-Method.
here an example of ggscatter:
library(ggpubr)
# Load data
data("mtcars")
df <- mtcars
df$cyl <- as.factor(df$cyl)
head(df[, c("wt", "mpg", "cyl")], 3)
for (i in 1:2) {
mds.plot = ggscatter(df, x = "wt", y = "mpg",
label = rownames(df),
size = 1,
repel = FALSE) + theme_classic(base_size = 10) + ggtitle("title")
print(runif(1))
}
returns
[1] 0.2875775
[1] 0.2875775
Thank you @MiGraber . I will push a small change in a test to pass travis, could you then please add the set.seed(Sys.time()) in the needed functions in order to fix the bug? Please remove the set.seed(89) in the 2 makePCAReport functions in tle_vignette branch.
I added in both functions a seed just in case the user either calls makeReport(pca.result)
or makePCAReport(pca.result)
There seem to be some more functions which do this. For example fviz_nbclust. I think will take another apraoch to fix this.
Alright, great. If you need support let me know. Besides that there are additionally plot function which might cause a setting seed:
1) getMDSAnalysis
calls ggscatter
as well:
https://github.com/ptl93/AEDA/blob/3f1685af61dcc620d98933672d2f4c91f9ef7bde/R/getMDSAnalysis.R#L55
2) getClusterAnalysis
with all its methods for plotting mostly fviz_nbclust
[this is for the analysis to get the optimal no. of cluster],
fviz_cluster
[for plotting the cluster result]
fviz_dist
[for plotting the distance in case of hierarchical clustering]
fviz_dend
[in case of plotting the result of hierarchical clustering]
fviz_silhouette
[plotting silhouette plot for analysis] ....
there might be more, just scam the code and everything with fviz_ might be problematic?
https://github.com/ptl93/AEDA/blob/tle_vignette/R/getClusterAnalysis.R
3) makePCA
calls fviz_eigen()
, autoplot()
and fviz_pca_ind()
https://github.com/ptl93/AEDA/blob/3f1685af61dcc620d98933672d2f4c91f9ef7bde/R/makePCA.R#L40-L49
You might just set a random seed depending of Sys.time right before the corresponding function 3 get-functions end.
After I found out that more functions set.seeds I concentrated on reportID(). But the solution is not very nice. If you want to test it... I dont get this bug anymore.
When trying to run fastReport I noticed, that the ID for PCA and MDS Reports are the same. I believe that those functions (for pca
prcomp()
and for mdscmdscale()
but alsoisoMDS
and maybe the other methods inmakeMDSTask()
might as well set the seed after execution to the same seed as the pca. Because when callingmakeReport(pca.result)
andmakeReport(mds.result)
both report.ids are the same. I investigated this further and found out that when applying another report between mds and pca, like numsum for example and then after pca another report like catsum, the id for numsum and catsum are the same. This I believe confirms my believe, that somehow after themakeMDS
andmakePCA
which are right before themakeReport
step set the seed to the same number.Reproducible error:
As of now I set the seed to 89 in makeReport.PCAObj and makePCAReport to manually set another seed and fix the issue.