Open dosshra opened 2 years ago
Hi there,
Can you provide a reproducible example? I suspect it is to do with the Y variables here. The warnings you get are common during the tuning process - it means for some tuning parameters in the models there wasn't enough signal to calculate r2 properly. Usually these warnings aren't a huge issue.
Cheers,
Nick
Thank you for the reply. Please see the attached RData file with X and Y. Y is a matrix of minor allele proportions in 110 populations, and X holds the environmental variables of the populations. mrIML.zip: https://github.com/nfj1380/mrIML/files/8281359/mrIML.zip
I couldn't recreate your error - mrIMLperformance worked fine for me with your data. When did you install the package?
I found a problem with my Y matrix. It is running OK now. Thank you
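The thread does not say what the problem with the Y matrix was. As a minimal sketch only, the check below flags response columns that commonly cause trouble for per-response regression metrics (non-numeric columns, missing values, zero variance). `check_response_matrix` and `Y_toy` are hypothetical names for illustration and are not part of mrIML.

```r
# Hypothetical helper (not part of mrIML): flag response columns that
# commonly break per-response regression metrics such as R-squared.
check_response_matrix <- function(Y) {
  Y <- as.data.frame(Y)
  data.frame(
    response    = names(Y),
    non_numeric = unname(!vapply(Y, is.numeric, logical(1))),
    has_na      = unname(vapply(Y, anyNA, logical(1))),
    constant    = unname(vapply(
      Y,
      function(col) length(unique(col[!is.na(col)])) <= 1,
      logical(1)
    ))
  )
}

# Toy data standing in for the real allele-proportion matrix.
Y_toy <- data.frame(
  snp1 = c(0.10, 0.25, 0.40, 0.05),
  snp2 = c(0.50, 0.50, 0.50, 0.50),  # constant: no variance to explain
  snp3 = c(0.20, NA, 0.30, 0.10)     # contains a missing value
)

report  <- check_response_matrix(Y_toy)
flagged <- report[report$non_numeric | report$has_na | report$constant, ]
print(flagged)
```

Running the same helper on the real Y before calling mrIMLpredicts gives a quick list of columns worth inspecting or dropping.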
Hi
Using the data that I uploaded and the code in the original message, it takes 4 hours with 20 CPUs to complete. Y is only a 110x618 matrix, X includes 12 variables, and I run only 10 trees.
The mrIML installation is from last week on R version 3.6.3. Trying with R 4.1.2 with 7 CPUs also took a very long time. What could be the problem?
Thank you
You could try reducing the tuning grid size (i.e. test fewer hyperparameter combinations). If you are comparing to gradient forests, this is why it takes longer: GF doesn't do any model tuning. That said, it took overnight on my machine using 6 cores, which is longer than I thought it would given the data dimensions. You could try other algorithms too. SVM could be faster if you are on a time crunch. See the parsnip list of models: https://parsnip.tidymodels.org/articles/articles/Models.html
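To see why the tuning grid dominates the run time, here is a rough back-of-the-envelope count of model fits. It is a sketch under assumptions: `count_model_fits` is a hypothetical helper, the number of resampling folds (`n_folds = 10`) is a guess rather than a documented mrIML value, and one final refit per response is assumed.

```r
# Rough count of model fits in a per-response tuning run.
# n_folds is an assumption; check the resampling scheme your mrIML version uses.
count_model_fits <- function(n_responses, grid_size, n_folds, n_trees) {
  tuning_fits <- n_responses * grid_size * n_folds
  final_fits  <- n_responses  # assumed: one refit per response with the best parameters
  total_fits  <- tuning_fits + final_fits
  c(
    tuning_fits = tuning_fits,
    final_fits  = final_fits,
    total_fits  = total_fits,
    total_trees = total_fits * n_trees
  )
}

# Dimensions from this thread: 618 responses, tune_grid_size = 5, 10 trees.
fits_grid5 <- count_model_fits(n_responses = 618, grid_size = 5, n_folds = 10, n_trees = 10)
fits_grid2 <- count_model_fits(n_responses = 618, grid_size = 2, n_folds = 10, n_trees = 10)
print(rbind(grid5 = fits_grid5, grid2 = fits_grid2))
```

Under these assumptions the number of fits scales linearly with the grid size, so the grid size is the cheapest setting to reduce when the number of responses is fixed.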
From: dosshra @.> Sent: Wednesday, March 23, 2022 2:34 PM To: nfj1380/mrIML @.> Cc: Nicholas Fountain-Jones @.>; Comment @.> Subject: Re: [nfj1380/mrIML] Error in data.frame(sp, mod_name, rmse, rsq): arguments imply differing number of rows: 0, 1 (Issue #4)
This email is confidential, and is for the intended recipient only. Access, disclosure, copying, distribution, or reliance on any of it by anyone outside the intended recipient organisation is prohibited and may be a criminal offence. Please delete if obtained in error and email confirmation to the sender. The views expressed in this email are not necessarily the views of the University of Tasmania, unless clearly intended otherwise.
I just tried with xgb and it worked much faster than the rf model for some reason. It might also be worth a try.
Try this:
library(tidymodels)  # provides boost_tree(), tune(), set_engine(), set_mode() and %>%

model_xgb <- boost_tree(
  trees = 100,
  tree_depth = tune(), min_n = tune(),
  loss_reduction = tune(),              ## first three: model complexity
  sample_size = tune(), mtry = tune(),  ## randomness
  learn_rate = tune()                   ## step size
) %>%
  set_engine("xgboost") %>%
  set_mode("regression")
Thank you for the suggestion, xgb does run faster. However, trying to continue exploring the data using this code:
yhats_rf <- mrIMLpredicts(X=X, Y=Y, Model=model_rf, balance_data='no', mode='regression', tune_grid_size=5, seed = sample.int(1e8, 1))
ModelPerf <- mrIMLperformance(yhats_rf, Model=model_rf, Y, mode='regression')
VI <- mrVip(yhats=yhats_rf, X=X)
resulted in error.
Hi,
I was able to run the RF regression for the whole dataset of >60k SNPs using mtry=4, trees = 10.
It took 6 hrs with 20 CPUs and more than 130 GB of RAM.
However, the flashlight analysis:
flashlightObj <- mrFlashlight(yhats_rf, X, Y, response = "multi", mode='regression')
profileData_pd <- light_profile(flashlightObj, v = "november") #partial dependencies
mrProfileplot(profileData_pd , sdthresh =0.1)
Did not finish after 24 hours.
Thank you
I think that would be close to a record for the number of SNPs in a mrIML analysis. I'm glad it worked, even if it was intensive.
Can you provide more information on the VI problem? Also, I have updated that code recently, so try reinstalling from GitHub.
Cheers,
Nick
Hello, I tried to run mrIML using this script:
cl <- parallel::makeCluster(20)
future::plan(cluster, workers=cl)
X<-as.data.frame(gfdf[,2:13])
Y<-gfdf[,seq(14,61822,100)]
model_rf <- rand_forest(trees = 10, mode = "regression", mtry = tune(), min_n = tune()) %>% set_engine("randomForest")
yhats_rf <- mrIMLpredicts(X=X,Y=Y, Model=model_rf, balance_data='no', mode='regression', tune_grid_size=5, seed = sample.int(1e8, 1) )
I get a lot of warnings during the run. When I run the following code:
ModelPerf <- mrIMLperformance(yhats_rf, Model=model_rf, Y, mode='regression')
I get this error: Error in data.frame(sp, mod_name, rmse, rsq): arguments imply differing number of rows: 0, 1
Thank you
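For readers hitting the same message: base R's `data.frame()` raises it whenever its arguments have incompatible lengths, and it lists the distinct lengths in argument order. The sketch below reproduces the message outside mrIML with made-up values. It only illustrates the mechanism and makes no claim about what mrIMLperformance does internally. A leading `0` means the first argument (here `sp`) was empty.

```r
# Made-up values for illustration; only the lengths matter.
sp       <- character(0)   # empty: e.g. a response name that could not be found
mod_name <- "rand_forest"
rmse     <- 0.12
rsq      <- 0.30

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    data.frame(sp, mod_name, rmse, rsq)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Checking the length of each object that feeds the failing call, for example with `lengths(list(sp, mod_name, rmse, rsq))`, shows which one came back empty.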