mllg / batchtools

Tools for computation on batch systems
https://mllg.github.io/batchtools/
GNU Lesser General Public License v3.0

Execution problem #247

Closed: chim3y closed this issue 4 years ago

chim3y commented 4 years ago

I would be extremely grateful for some guidance on the following benchmark. Recently, I have been trying to benchmark three learners (rpart, logreg and random forest) with batchtools. For the small datasets, with ids 1:208 and 209:252, it executes, but for ids 253:319, whose training times are over 3000000, the program keeps running up to 45% and never goes beyond that. This has been happening for the past 4 days. I cannot figure out what the underlying problem is. I am really new to this, so any suggestion would mean a lot.

Benchmark.zip

mllg commented 4 years ago

This is impossible for me to debug. I don't know which ids you are referring to: OML dataset ids or job ids on the local batch system? Why do you think the training time is over 3000000? Which program is running until 45%: submitJobs(), the download, or is this the progress of a single job?

Sorry, there is really nothing I can do here. You need to boil the problem down to something reproducible in a few lines of code, or find someone more familiar with your batch system to debug this with you.

chim3y commented 4 years ago

Hello sir, thank you so much for your response. I have been running this program for the past 2 months, based on "https://github.com/DklRaf/Benchmark_RF-LR_OpenML". I have tried contacting the author, but I have received no response. I apologize for being unclear before.

The file I am referring to is "benchmark-batchtools.R" in the /benchmark folder of the zip file. When I execute submitJobs(ids = 253:319, reg = regis), it gets stuck as soon as it reaches job id 286, whose mlr task id is 189778 and data id is 4135. It neither finishes the execution nor stops.

The 48% is the progress over the jobs above (253:319), around 67 jobs, with the tasks listed in the attached clas_time_big.xlsx.

This is how the progress bar appears: [screenshot of the progress bar]

The log in my folder says:

[bt]: This is batchtools v0.9.11
[bt]: Starting calculation of 1 jobs
[bt]: Setting working directory to 'C:/Users/Comm/Documents/BENCHMARKv18/Data/Results/Batchtools'
[bt]: Memory measurement disabled
[bt]: Starting job [batchtools job.id=285]
[bt]: Generating problem instance for problem '4135' ...
[bt]: Applying algorithm 'eval' on problem '4135' for job 285 (seed = 286) ...
Data '4135' file 'description.xml' found in cache.
Data '4135' file 'dataset.arff' found in cache.
Task: Amazon_employee_access, Learner: classif.logreg
Resampling: repeated cross-validation
Measures: acc brier auc timetrain

Any suggestions would be really valuable to me.
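One way to follow the advice above and boil the hang down to something reproducible is to run only the stalled job in the current R session instead of through submitJobs(). The sketch below is a hedged illustration, not a confirmed fix: because the thread's registry `regis` is not available, it builds a throwaway registry with a toy function (x^2) as a stand-in, then uses the batchtools functions getStatus() and testJob(). The commented lines at the end show how the same idea might be applied to `regis`; the job id 285 there is taken from the log (the comment mentions 286) and is only illustrative.

```r
library(batchtools)

# Stand-in registry in a temporary directory (file.dir = NA).
# Assumption: with the thread's registry you would skip this block and use `regis`.
reg <- makeRegistry(file.dir = NA, seed = 1)
batchMap(function(x) x^2, x = 1:3, reg = reg)

# Overview of job states (defined, submitted, started, done, error, ...).
print(getStatus(reg = reg))

# Execute a single job directly in this R session, bypassing the scheduler,
# so that errors, warnings or a long-running step show up in the console.
res <- testJob(id = 2, reg = reg)
print(res)

# Illustrative only, against the registry from the thread:
# findRunning(reg = regis)   # jobs still marked as running
# getLog(285, reg = regis)   # log lines of the job shown in the log above
# testJob(285, reg = regis)  # re-run that single job interactively
```

If testJob() on the stalled job also hangs, the problem lies in the job itself (for example the learner on that particular task) rather than in the batch system.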