ykhdrew / qMSM_tutorial

A tutorial for the quasi-Markov State Model (qMSM), developed by the Huang Group, Dept. of Chemistry, UW-Madison
Creative Commons Attribution 4.0 International

Performing GMRQ #1

Open drdoppio opened 1 year ago

drdoppio commented 1 year ago

At the Gmrq.ipynb stage of your tutorial, I get the following errors:

Input In [1], in <cell line: 87>()
     84 no_clusters=[700,800,900]
     85 tica_lagtime=[2,4,6]
---> 87 GMRQ(no_of_features,no_of_components,no_clusters,tica_lagtime)

Input In [1], in GMRQ(no_of_features, no_of_components, no_clusters, tica_lagtime, parameter, gmrq_dir, clustering_dir)
     56 else:
     57     print('Running the {}th cycle'.format(i+1))
---> 58     cv = KFold(len(trajectories), n_folds=6, shuffle=True) #split the dataset into training set and test set
     59     results = []
     60     print_results = [] #train score, test score

TypeError: __init__() got an unexpected keyword argument 'n_folds'

Any suggestions? Thanks, Sandor

drdoppio commented 1 year ago

sklearn.cross_validation is not working, so I have changed it to sklearn.model_selection.
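Changing the import is only half of the fix, because the relocated class also has a different constructor. A minimal sketch, assuming scikit-learn 0.20 or newer is installed, that lists the parameters the new KFold accepts; it shows n_splits and no n_folds, which is why the original call raised "unexpected keyword argument 'n_folds'". The helper name kfold_parameters is hypothetical.

```python
import inspect

# In scikit-learn >= 0.20 the cross-validation splitters live in
# sklearn.model_selection; the old sklearn.cross_validation module is gone.
from sklearn.model_selection import KFold


def kfold_parameters():
    """Return the parameter names accepted by the KFold constructor."""
    # inspect.signature on a class reports __init__ without 'self'.
    return list(inspect.signature(KFold).parameters)


if __name__ == "__main__":
    # Lists the constructor parameters, starting with n_splits
    print(kfold_parameters())
```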

ykhdrew commented 1 year ago

Hi Sandor

Were you able to solve the issue? I tried a newly created conda environment built from my environment file (conda env create -n msmbuilder -f environment.yml), and GMRQ.ipynb works fine on my end.

This tutorial uses a very old version of scikit-learn, and things may be quite different in newer versions.
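One way to tell which API an environment provides is to check the installed scikit-learn version. The sklearn.cross_validation module was deprecated in 0.18 and removed in 0.20, so only versions older than 0.20 still accept the tutorial's KFold(n, n_folds=...) call. A small sketch; the function name has_legacy_cross_validation is hypothetical.

```python
import re

import sklearn


def has_legacy_cross_validation(version: str) -> bool:
    """True if this scikit-learn version still ships sklearn.cross_validation.

    The module was deprecated in 0.18 and removed in 0.20, so only
    versions older than 0.20 provide the KFold(n, n_folds=...) signature.
    """
    # Read only the leading "major.minor" digits; ignores suffixes like rc1/dev0.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (0, 20)


if __name__ == "__main__":
    installed = sklearn.__version__
    if has_legacy_cross_validation(installed):
        print(f"scikit-learn {installed}: the tutorial's original KFold call should work")
    else:
        print(f"scikit-learn {installed}: use sklearn.model_selection.KFold(n_splits=...)")
```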

drdoppio commented 1 year ago

I was not able to resolve the problem by changing n_folds to n_splits; I still get an error:

Input In [1], in <cell line: 87>()
     84 no_clusters=[700,800,900]
     85 tica_lagtime=[2,4,6]
---> 87 GMRQ(no_of_features,no_of_components,no_clusters,tica_lagtime)

Input In [1], in GMRQ(no_of_features, no_of_components, no_clusters, tica_lagtime, parameter, gmrq_dir, clustering_dir)
     56 else:
     57     print('Running the {}th cycle'.format(i+1))
---> 58     cv = KFold(len(trajectories), n_splits=6, shuffle=True) #split the dataset into training set and test set
     59     results = []
     60     print_results = [] #train score, test score

TypeError: __init__() got multiple values for argument 'n_splits'
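This second error comes from positional binding: in the new API the first positional parameter of KFold is n_splits itself, so len(trajectories) is bound to n_splits and the explicit n_splits=6 then supplies a second value for the same argument. A minimal reproduction, assuming scikit-learn 0.20 or newer; the 12-element trajectories list and the helper name are hypothetical stand-ins.

```python
from sklearn.model_selection import KFold

# Hypothetical stand-in for the tutorial's list of trajectories.
trajectories = [None] * 12


def reproduce_multiple_values_error():
    """Call KFold the old way and return the TypeError message, if any."""
    try:
        # Old-style call: the sample count is passed positionally, which the
        # new KFold binds to n_splits, clashing with the keyword n_splits=6.
        KFold(len(trajectories), n_splits=6, shuffle=True)
    except TypeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(reproduce_multiple_values_error())
```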

--


Dr. Sandor Lovas, Professor; Associate Editor, Frontiers in Chemical Biology

Department of Biomedical Sciences
Creighton University, School of Medicine
2500 California Plaza
Omaha, NE 68178, USA
Phone: 402-280-5753
Fax: 402-280-2690


msinclair-py commented 1 year ago

KFold no longer takes the number of samples as its first positional argument, as it did in this tutorial. The KFold line should read: kf = KFold(n_splits=6, shuffle=True)

Then, in the for loop:

for fold, (train_index, test_index) in enumerate(kf.split(list(range(len(trajectories))))):
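Putting the two lines together, a minimal sketch of the corrected splitting step might look like the following. It assumes scikit-learn 0.20 or newer and NumPy; the random trajectories are hypothetical placeholders for the tutorial's featurized data, the helper split_trajectories is hypothetical, and the model fitting and GMRQ scoring done inside the tutorial's loop are deliberately omitted. random_state=0 is added only to make the folds reproducible.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical stand-in data: 12 "trajectories" of different lengths,
# each an (n_frames, n_features) array, held in a plain list.
rng = np.random.default_rng(0)
trajectories = [rng.random((50 + 10 * i, 3)) for i in range(12)]

# New-style KFold: no sample count in the constructor, only the number of splits.
kf = KFold(n_splits=6, shuffle=True, random_state=0)


def split_trajectories(trajs, splitter):
    """Yield (fold, train_trajs, test_trajs), splitting by whole trajectories."""
    # Split over trajectory indices, as in the suggested for loop.
    indices = list(range(len(trajs)))
    for fold, (train_index, test_index) in enumerate(splitter.split(indices)):
        train = [trajs[i] for i in train_index]
        test = [trajs[i] for i in test_index]
        yield fold, train, test


if __name__ == "__main__":
    for fold, train, test in split_trajectories(trajectories, kf):
        print(f"fold {fold}: {len(train)} training, {len(test)} test trajectories")
```

Splitting over trajectory indices rather than frames keeps every trajectory whole, so no trajectory contributes frames to both the training and test sets of a fold.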