gregy4 commented 5 years ago

For given proto file I use 8 search workers. During execution I found that cpu usage for long periods doesn't match with number of search workers. Cpu utilization is with other proto files around 25-30% (comply with number of threads / available cores), with this proto file spectrum is much wider between 4 and 30%. It seems that lower cpu utilization has something to do with situation when a solution is found and after call of solution callback cpu utilization is improved. Given example consume during search around 40G of memory in ortools 7.3

ghost commented 5 years ago

Your question sounds similar to https://github.com/google/or-tools/issues/1420, but I'm not 100% sure if it still works like that

lperron commented 5 years ago

Most of the time is spent presolving the model, which is sequential.

Solving is fast and uses all threads.

I0919 12:00:42.190474 185435 sat_runner.cc:168] Reading a CpModelProto. I0919 12:00:43.612012 185435 cp_model_solver.cc:2251] Parameters: num_search_workers: 8 I0919 12:00:43.753345 185435 cp_model_solver.cc:2265] Satisfaction model '':

Variables: 119200

119200 in [0,1]
kLinear2: 1631700

kLinearN: 1192

I0919 12:00:43.877019 185435 cp_model_solver.cc:2308] *** starting model presolve at 0.27s I0919 12:01:02.079699 185435 cp_model_presolve.cc:3873] - 0 affine relations were detected. I0919 12:01:02.079766 185435 cp_model_presolve.cc:3875] - 0 variable equivalence relations were detected. I0919 12:01:02.079798 185435 cp_model_presolve.cc:3883] - rule 'bool_or: implications' was applied 1631700 times. I0919 12:01:02.079804 185435 cp_model_presolve.cc:3883] - rule 'linear: empty' was applied 37 times. I0919 12:01:02.079809 185435 cp_model_presolve.cc:3883] - rule 'linear: negative clause' was applied 1631700 times. I0919 12:01:02.079813 185435 cp_model_presolve.cc:3883] - rule 'linear: positive equal one' was applied 999 times. I0919 12:01:02.079818 185435 cp_model_presolve.cc:3883] - rule 'linear: singleton column' was applied 37 times. I0919 12:01:02.079825 185435 cp_model_presolve.cc:3883] - rule 'presolve: iteration' was applied 2 times. I0919 12:01:02.137112 185435 cp_model_solver.cc:2333] Satisfaction model '':

Variables: 115500
115500 in [0,1]
kAtMostOne: 41699 (#literals: 622800)

kBoolAnd: 3100 (#enforced: 3100) (#literals: 3100)

kBoolOr: 999 (#literals: 99900)

kLinearN: 156

I0919 12:01:02.223026 185435 cp_model_solver.cc:2196] *** starting Search at 18.61s with 8 workers and strategies: [ auto, less encoding, no_lp, max_lp, random, random_1, random_2, random_3 ] I0919 12:01:03.448972 185547 synchronization.cc:110] #1 19.84s no_lp num_bool:115500 I0919 12:01:03.451136 185552 synchronization.cc:110] #2 19.84s less encoding num_bool:115500 I0919 12:01:05.104082 185435 cp_model_solver.cc:2454] CpSolverResponse: status: FEASIBLE objective: NA best_bound: NA booleans: 115500 conflicts: 0 branches: 250032 propagations: 8889668 integer_propagations: 458400 walltime: 21.4315 usertime: 21.4315 deterministic_time: 2.02935 primal_integral: 0

gregy4 commented 5 years ago

Sorry I have to disagree. Of course I know that presolve use only single thread, I talk about next phase - searching. I think the issue is not related to #1420, since additional search workers are used for lns and when no solution exists yet it doesn't make sense to execute them.

Attached proto file solves optimization and not feasibility problem. There's my code : File initialFile = new File("c:/temp/test2.proto"); InputStream targetStream = new FileInputStream(initialFile);

    CpModel model = new CpModel();
    model.getBuilder().mergeFrom(targetStream);

    CpSolver cpSolver = new CpSolver();

    cpSolver.getParameters().setLogSearchProgress(true);
    cpSolver.getParameters().setNumSearchWorkers(8);

    CpSolverSolutionCallback solutionCallBack =  new CpSolverSolutionCallback()
    {
        @Override
        public void onSolutionCallback()
        {
            System.out.println("solution found");
        }
    };

    cpSolver.solveWithSolutionCallback(model, solutionCallBack);

lperron commented 5 years ago

Sorry, too many files from support with the same names

Here is what I get on master:

216428 lperron 20 0 5177696 4.2g 19720 S 792.1 2.2 11:57.87 sat_runner

It uses 8 CPUs after 1 min and 30s, consistently.

At worst, it will use 5 threads when searching for a solution, and launch all 8 when/if it finds them.

gregy4 commented 5 years ago

Understand :-) After 3hours of execution I still see periods when cpu usage is around 11% that corresponds to 4 threads at max for my cpu. I'll try to test it on master, run it on another OS and get some statistics about cpu usage. Maybe some synchronization between search workers is causing this behaviour.

lperron commented 5 years ago

how many cores do you have ?

Laurent Perron | Operations Research | lperron@google.com | (33) 1 42 68 53 00

Le jeu. 19 sept. 2019 à 13:28, gregy4 notifications@github.com a écrit :

Understand :-) After 3hours of execution I still see periods when cpu usage is around 11% that corresponds to 4 threads at max for my cpu. I'll try to test it on master, run it on another OS and get some statistics about cpu usage. Maybe some synchronization between search workers is causing this behaviour.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/google/or-tools/issues/1588?email_source=notifications&email_token=ACUPL3NQUIHQ7G6CD4MWYN3QKNO4FA5CNFSM4IYINESKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7DEQHQ#issuecomment-533088286, or mute the thread https://github.com/notifications/unsubscribe-auth/ACUPL3POFZCALCN5XJIOQLLQKNO4FANCNFSM4IYINESA .

gregy4 commented 5 years ago

16 real cores, with hyperthreading shown as 32 cores. threadripper 1950x

lperron commented 5 years ago

So usage is 5 / 32 before the first solution and 8/32 when one solution is found. It is consistent.

PS: You should use 16 workers or more. Laurent Perron | Operations Research | lperron@google.com | (33) 1 42 68 53 00

Le jeu. 19 sept. 2019 à 13:34, gregy4 notifications@github.com a écrit :

16 real cores, with hyperthreading shown as 32 cores. threadripper 1950x

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/google/or-tools/issues/1588?email_source=notifications&email_token=ACUPL3MX5CQGOGWGTLSDP33QKNPV3A5CNFSM4IYINESKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7DE7YY#issuecomment-533090275, or mute the thread https://github.com/notifications/unsubscribe-auth/ACUPL3MMKYI4FY5HEQQ5YFLQKNPV3ANCNFSM4IYINESA .

gregy4 commented 5 years ago

I agree, this is how it should look like. Meanwhile I used performance profiler from visual studio and have more correct data. Phase one - presolve, one thread is used - in my case cpu usage is around 3-4% Phase two - solve started, no solution is found yet - cpu utilization varies between 9-30% Phase three - solution is found, displayed message

1 202.16s best:5460 next:[5018,5459] no_lp num_bool:6511250

Phase four - Called solution callback

Between information that solution was found and execution of callback is minimally 40 seconds interval when cpu utilization is between 3-4%.

I can send you report file from VC, but it is little bit big - 30mb per 3 minutes.

lperron commented 5 years ago

most likely a locking mechanism in the callback part. Can you remove the callback and try ? Laurent Perron | Operations Research | lperron@google.com | (33) 1 42 68 53 00

Le jeu. 19 sept. 2019 à 14:39, gregy4 notifications@github.com a écrit :

I agree, this is how it should look like. Meanwhile I used performance profiler from visual studio and have more correct data. Phase one - presolve, one thread is used - in my case cpu usage is around 3-4% Phase two - solve started, no solution is found yet - cpu utilization varies between 9-30% Phase three - solution is found, displayed message

1 https://github.com/google/or-tools/pull/1 202.16s best:5460

next:[5018,5459] no_lp num_bool:6511250 Phase four - Called solution callback

Between information that solution was found and execution of callback is minimally 40 seconds interval when cpu utilization is between 3-4%.

I can send you report file from VC, but it is little bit big 30mb per 3 minutes.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/google/or-tools/issues/1588?email_source=notifications&email_token=ACUPL3ND3254BXXJYAJWI4DQKNXHPA5CNFSM4IYINESKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7DKGKI#issuecomment-533111593, or mute the thread https://github.com/notifications/unsubscribe-auth/ACUPL3NAEPCW6WHRJOSO7C3QKNXHPANCNFSM4IYINESA .

gregy4 commented 5 years ago

Definitely much better ... no interval where only one core is used. This time I choose to use 16 cores but only for short time 16cores (about 50%) were used - on graph between 2:45 and 3:45. After 3:45 cpu usage is between 10 - 16% so callback is probably not the only one reason. cpu_profile

lperron commented 5 years ago

If you are not in C++, there may be a global lock going back to C++, forcing all threads with solutions to wait. Inside, there is also a mutex, but normally the locking part is very minimal.

In your graph, does the 3.45 minutes corresponds to finding the optimal solution? Sometimes, some threads are slow to finish. Laurent Perron | Operations Research | lperron@google.com | (33) 1 42 68 53 00

Le jeu. 19 sept. 2019 à 15:39, gregy4 notifications@github.com a écrit :

Definitely much better ... no interval where only one core is used. This time I choose to use 16 cores but only for short time 16cores (about 50%) were used - on graph between 2:45 and 3:45. After 3:45 cpu usage is between 10 - 16% so callback is probably not only the one reason. [image: cpu_profile] https://user-images.githubusercontent.com/5640112/65249124-b16b0600-daf3-11e9-890f-5004ab44aaed.png

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/google/or-tools/issues/1588?email_source=notifications&email_token=ACUPL3LRLX32T5YTGBEENXTQKN6KTA5CNFSM4IYINESKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7DP7QY#issuecomment-533135299, or mute the thread https://github.com/notifications/unsubscribe-auth/ACUPL3PPKQMRPAUXF3FNSHTQKN6KTANCNFSM4IYINESA .

gregy4 commented 5 years ago

No, optimal solution wasn't found and probably is not reachable in reasonable time.

lperron commented 5 years ago

It could be thermal throttling of the CPU.

Le jeu. 19 sept. 2019 à 16:23, gregy4 notifications@github.com a écrit :

No, optimal solution wasn't found and probably is not reachable in reasonable time.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/google/or-tools/issues/1588?email_source=notifications&email_token=ACUPL3JHN3JAH7JK2H2MCPTQKODPXA5CNFSM4IYINESKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7DUU6Y#issuecomment-533154427, or mute the thread https://github.com/notifications/unsubscribe-auth/ACUPL3IMB2HA3SZJRNLSTFDQKODPXANCNFSM4IYINESA .

gregy4 commented 5 years ago

I checked it on another problem, and cpu temperature was 5 degrees (67) above with constant 50% cpu utilization. For this problem temperature is at max around 63 degrees. I see memory bandwith as more probable reason, 40gb can be quite a lot for memory controller. Unfortunatelly for amd I'm not aware about software that can measure it.

timothylowjc commented 3 years ago

Hello!

I have roughly the same issue here, currently have my solvers running on AWS Batch using a c5.metal EC2 instance, which is 96 vCPUs and 192GB RAM.

Proto is a too large for github, but this should work

Anyway,

I've currently set num_search_workers to 192, on the off chance I've misunderstood something about logical threads, but it appears that CPU utilization never breaks 30%.

Doubt its a temperature problem as mentioned, given vCPUs. CPU Usage ends in the graph after a certain amount of time has passed (400 seconds).

Any help is appreciated!

lperron commented 3 years ago

You are right. I could reproduce on my machine.

timothylowjc commented 3 years ago

Good to know its reproducible, thanks for the hard work! Looking forward to seeing the fix on master

lperron commented 3 years ago

I have done a first pass and committed to master. If possible, can you try ?

timothylowjc commented 3 years ago

Hi @lperron ,

Sorry about the late reply. I'll let you know sometime this week, we gotta figure out how to deploy master branches using AWS Batch first. Thanks for the fix!

timothylowjc commented 3 years ago

Alright, looks like the results speak for themselves.

Thanks for the fix @lperron !

google / or-tools

low cpu usage for given proto file #1588

Variables: 119200

kLinear2: 1631700

kLinearN: 1192

Variables: 115500

kAtMostOne: 41699 (#literals: 622800)

kBoolAnd: 3100 (#enforced: 3100) (#literals: 3100)

kBoolOr: 999 (#literals: 99900)

kLinearN: 156

1 202.16s best:5460 next:[5018,5459] no_lp num_bool:6511250

1 https://github.com/google/or-tools/pull/1 202.16s best:5460