Closed gregy4 closed 3 years ago
Your question sounds similar to https://github.com/google/or-tools/issues/1420, but I'm not 100% sure if it still works like that
Most of the time is spent presolving the model, which is sequential.
Solving is fast and uses all threads.
I0919 12:00:42.190474 185435 sat_runner.cc:168] Reading a CpModelProto.
I0919 12:00:43.612012 185435 cp_model_solver.cc:2251] Parameters: num_search_workers: 8
I0919 12:00:43.753345 185435 cp_model_solver.cc:2265] Satisfaction model '':
I0919 12:00:43.877019 185435 cp_model_solver.cc:2308] *** starting model presolve at 0.27s
I0919 12:01:02.079699 185435 cp_model_presolve.cc:3873] - 0 affine relations were detected.
I0919 12:01:02.079766 185435 cp_model_presolve.cc:3875] - 0 variable equivalence relations were detected.
I0919 12:01:02.079798 185435 cp_model_presolve.cc:3883] - rule 'bool_or: implications' was applied 1631700 times.
I0919 12:01:02.079804 185435 cp_model_presolve.cc:3883] - rule 'linear: empty' was applied 37 times.
I0919 12:01:02.079809 185435 cp_model_presolve.cc:3883] - rule 'linear: negative clause' was applied 1631700 times.
I0919 12:01:02.079813 185435 cp_model_presolve.cc:3883] - rule 'linear: positive equal one' was applied 999 times.
I0919 12:01:02.079818 185435 cp_model_presolve.cc:3883] - rule 'linear: singleton column' was applied 37 times.
I0919 12:01:02.079825 185435 cp_model_presolve.cc:3883] - rule 'presolve: iteration' was applied 2 times.
I0919 12:01:02.137112 185435 cp_model_solver.cc:2333] Satisfaction model '':
I0919 12:01:02.223026 185435 cp_model_solver.cc:2196] *** starting Search at 18.61s with 8 workers and strategies: [ auto, less encoding, no_lp, max_lp, random, random_1, random_2, random_3 ]
I0919 12:01:03.448972 185547 synchronization.cc:110] #1 19.84s no_lp num_bool:115500
I0919 12:01:03.451136 185552 synchronization.cc:110] #2 19.84s less encoding num_bool:115500
I0919 12:01:05.104082 185435 cp_model_solver.cc:2454] CpSolverResponse:
status: FEASIBLE
objective: NA
best_bound: NA
booleans: 115500
conflicts: 0
branches: 250032
propagations: 8889668
integer_propagations: 458400
walltime: 21.4315
usertime: 21.4315
deterministic_time: 2.02935
primal_integral: 0
Sorry, I have to disagree. Of course I know that presolve uses only a single thread; I am talking about the next phase, the search. I don't think this issue is related to #1420: the additional search workers there are used for LNS, and when no solution exists yet it doesn't make sense to run them.
The attached proto file encodes an optimization problem, not a feasibility problem. Here is my code:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

File initialFile = new File("c:/temp/test2.proto");
InputStream targetStream = new FileInputStream(initialFile);
CpModel model = new CpModel();
model.getBuilder().mergeFrom(targetStream);
CpSolver cpSolver = new CpSolver();
cpSolver.getParameters().setLogSearchProgress(true);
cpSolver.getParameters().setNumSearchWorkers(8);
CpSolverSolutionCallback solutionCallBack = new CpSolverSolutionCallback() {
    @Override
    public void onSolutionCallback() {
        System.out.println("solution found");
    }
};
cpSolver.solveWithSolutionCallback(model, solutionCallBack);
Sorry, too many files from support with the same names
Here is what I get on master:
216428 lperron 20 0 5177696 4.2g 19720 S 792.1 2.2 11:57.87 sat_runner
It uses 8 CPUs after 1 min and 30s, consistently.
At worst, it will use 5 threads while searching for a first solution, and launch all 8 when/if it finds one.
Understood :-) After 3 hours of execution I still see periods when CPU usage is around 11%, which corresponds to at most 4 threads on my CPU. I'll try to test it on master, run it on another OS, and collect some statistics about CPU usage. Maybe some synchronization between the search workers is causing this behaviour.
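The thread-count estimate above is simple arithmetic: overall CPU utilization times the number of logical cores gives the number of fully busy threads. A minimal sketch, assuming the 32 logical cores mentioned later in the thread:

```java
public class BusyThreads {
    public static void main(String[] args) {
        int logicalCores = 32;   // hyperthreaded 16-core CPU seen as 32
        double cpuUsage = 0.11;  // 11% overall utilization
        // 0.11 * 32 = 3.52, i.e. roughly 3-4 fully busy threads,
        // far below the 8 search workers that were requested.
        double busyThreads = cpuUsage * logicalCores;
        System.out.println(busyThreads);
    }
}
```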
How many cores do you have?
Laurent Perron | Operations Research | lperron@google.com | (33) 1 42 68 53 00
16 real cores, shown as 32 with hyperthreading: a Threadripper 1950X.
So usage is 5 / 32 before the first solution and 8/32 when one solution is found. It is consistent.
PS: You should use 16 workers or more.
I agree, this is how it should look. Meanwhile I used the performance profiler from Visual Studio and have more accurate data:
Phase one - presolve: one thread is used; in my case CPU usage is around 3-4%.
Phase two - solve started, no solution found yet: CPU utilization varies between 9-30%.
Phase three - solution found, message displayed:
#1 202.16s best:5460 next:[5018,5459] no_lp num_bool:6511250
Phase four - solution callback called.
Between the message that a solution was found and the execution of the callback there is an interval of at least 40 seconds when CPU utilization is between 3-4%.
I can send you the report file from VS, but it is a little big - 30 MB per 3 minutes.
Most likely a locking mechanism in the callback part. Can you remove the callback and try?
Definitely much better ... no interval where only one core is used. This time I chose to use 16 workers, but only for a short time (between 2:45 and 3:45 on the graph) were 16 cores used, at about 50% utilization. After 3:45 CPU usage is between 10-16%, so the callback is probably not the only reason.
[image: cpu_profile] https://user-images.githubusercontent.com/5640112/65249124-b16b0600-daf3-11e9-890f-5004ab44aaed.png
If you are not in C++, there may be a global lock going back to C++, forcing all threads with solutions to wait. Inside, there is also a mutex, but normally the locking part is very minimal.
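The effect of such a lock can be illustrated with a JDK-only toy, with no or-tools involved: if every worker thread must enter a callback guarded by a single lock (here simulated with a synchronized method that does 50 ms of "work"), 8 workers serialize and the callback phase takes roughly 8 x 50 ms of wall time, while CPU utilization collapses to about one core.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CallbackContention {
    static final AtomicInteger calls = new AtomicInteger();

    // Stand-in for a solution callback behind a global lock:
    // while one worker is inside, the other workers block here.
    static synchronized void onSolution() {
        calls.incrementAndGet();
        try { Thread.sleep(50); } catch (InterruptedException e) { }
    }

    public static void main(String[] args) throws InterruptedException {
        int workers = 8;
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        long start = System.nanoTime();
        for (int i = 0; i < workers; i++) {
            pool.submit(CallbackContention::onSolution);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Despite 8 threads, the 8 callbacks run one after another,
        // so elapsed time is at least ~400 ms.
        System.out.println("calls=" + calls.get() + " elapsedMs=" + elapsedMs);
    }
}
```

This is only an analogy for the behaviour described above; the actual or-tools locking happens at the language-binding boundary.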
In your graph, does the 3:45 mark correspond to finding the optimal solution? Sometimes, some threads are slow to finish.
No, the optimal solution wasn't found and is probably not reachable in a reasonable time.
It could be thermal throttling of the CPU.
I checked it on another problem, and the CPU temperature was 5 degrees higher (67) at constant 50% CPU utilization. For this problem the temperature peaks around 63 degrees. I see memory bandwidth as the more probable reason; 40 GB can be quite a lot for the memory controller. Unfortunately, for AMD I'm not aware of software that can measure it.
Hello!
I have roughly the same issue here, currently have my solvers running on AWS Batch using a c5.metal EC2 instance, which is 96 vCPUs and 192GB RAM.
The proto is too large for GitHub, but this should work.
Anyway,
I've currently set num_search_workers to 192, on the off chance I've misunderstood something about logical threads, but it appears that CPU utilization never breaks 30%.
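On the logical-threads question: workers beyond the number of logical processors the JVM sees generally cannot raise utilization, since the extra workers just time-share the same hardware threads. A quick JDK-only check (the commented setter is the same or-tools parameter API used in the snippet earlier in this thread):

```java
public class CoreCount {
    public static void main(String[] args) {
        // Number of logical processors (vCPUs / hardware threads)
        // visible to the JVM, e.g. 96 on a c5.metal instance.
        int logical = Runtime.getRuntime().availableProcessors();
        System.out.println("logical processors: " + logical);
        // Capping workers at this value is a reasonable starting point:
        // cpSolver.getParameters().setNumSearchWorkers(logical);
    }
}
```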
I doubt it's a temperature problem as mentioned above, given these are vCPUs. CPU usage ends in the graph after a certain amount of time has passed (400 seconds).
Any help is appreciated!
You are right. I could reproduce on my machine.
Good to know it's reproducible, thanks for the hard work! Looking forward to seeing the fix on master.
I have done a first pass and committed it to master. If possible, can you try?
Hi @lperron ,
Sorry about the late reply. I'll let you know sometime this week, we gotta figure out how to deploy master branches using AWS Batch first. Thanks for the fix!
Alright, looks like the results speak for themselves.
Thanks for the fix @lperron !
For the given proto file I use 8 search workers. During execution I found that for long periods the CPU usage doesn't match the number of search workers. With other proto files CPU utilization is around 25-30% (consistent with the number of threads / available cores); with this proto file the range is much wider, between 4 and 30%. The lower CPU utilization seems related to the moment a solution is found: after the solution callback is called, CPU utilization improves. The given example consumes around 40 GB of memory during search in or-tools 7.3.
proto.zip