google / or-tools

Google's Operations Research tools:
https://developers.google.com/optimization/
Apache License 2.0
11k stars 2.11k forks source link

low cpu usage for given proto file #1588

Closed gregy4 closed 3 years ago

gregy4 commented 5 years ago

For given proto file I use 8 search workers. During execution I found that cpu usage for long periods doesn't match with number of search workers. Cpu utilization is with other proto files around 25-30% (comply with number of threads / available cores), with this proto file spectrum is much wider between 4 and 30%. It seems that lower cpu utilization has something to do with situation when a solution is found and after call of solution callback cpu utilization is improved. Given example consume during search around 40G of memory in ortools 7.3

proto.zip

ghost commented 5 years ago

Your question sounds similar to https://github.com/google/or-tools/issues/1420, but I'm not 100% sure if it still works like that

lperron commented 5 years ago

Most of the time is spent presolving the model, which is sequential.

Solving is fast and uses all threads.

I0919 12:00:42.190474 185435 sat_runner.cc:168] Reading a CpModelProto. I0919 12:00:43.612012 185435 cp_model_solver.cc:2251] Parameters: num_search_workers: 8 I0919 12:00:43.753345 185435 cp_model_solver.cc:2265] Satisfaction model '':

Variables: 119200

gregy4 commented 5 years ago

Sorry I have to disagree. Of course I know that presolve use only single thread, I talk about next phase - searching. I think the issue is not related to #1420, since additional search workers are used for lns and when no solution exists yet it doesn't make sense to execute them.

Attached proto file solves optimization and not feasibility problem. There's my code : File initialFile = new File("c:/temp/test2.proto"); InputStream targetStream = new FileInputStream(initialFile);

    CpModel model = new CpModel();
    model.getBuilder().mergeFrom(targetStream);

    CpSolver cpSolver = new CpSolver();

    cpSolver.getParameters().setLogSearchProgress(true);
    cpSolver.getParameters().setNumSearchWorkers(8);

    CpSolverSolutionCallback solutionCallBack =  new CpSolverSolutionCallback()
    {
        @Override
        public void onSolutionCallback()
        {
            System.out.println("solution found");
        }
    };

    cpSolver.solveWithSolutionCallback(model, solutionCallBack);
lperron commented 5 years ago

Sorry, too many files from support with the same names

Here is what I get on master:

216428 lperron 20 0 5177696 4.2g 19720 S 792.1 2.2 11:57.87 sat_runner

It uses 8 CPUs after 1 min and 30s, consistently.

At worst, it will use 5 threads when searching for a solution, and launch all 8 when/if it finds them.

gregy4 commented 5 years ago

Understand :-) After 3hours of execution I still see periods when cpu usage is around 11% that corresponds to 4 threads at max for my cpu. I'll try to test it on master, run it on another OS and get some statistics about cpu usage. Maybe some synchronization between search workers is causing this behaviour.

lperron commented 5 years ago

how many cores do you have ?

Laurent Perron | Operations Research | lperron@google.com | (33) 1 42 68 53 00

Le jeu. 19 sept. 2019 à 13:28, gregy4 notifications@github.com a écrit :

Understand :-) After 3hours of execution I still see periods when cpu usage is around 11% that corresponds to 4 threads at max for my cpu. I'll try to test it on master, run it on another OS and get some statistics about cpu usage. Maybe some synchronization between search workers is causing this behaviour.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/google/or-tools/issues/1588?email_source=notifications&email_token=ACUPL3NQUIHQ7G6CD4MWYN3QKNO4FA5CNFSM4IYINESKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7DEQHQ#issuecomment-533088286, or mute the thread https://github.com/notifications/unsubscribe-auth/ACUPL3POFZCALCN5XJIOQLLQKNO4FANCNFSM4IYINESA .

gregy4 commented 5 years ago

16 real cores, with hyperthreading shown as 32 cores. threadripper 1950x

lperron commented 5 years ago

So usage is 5 / 32 before the first solution and 8/32 when one solution is found. It is consistent.

PS: You should use 16 workers or more. Laurent Perron | Operations Research | lperron@google.com | (33) 1 42 68 53 00

Le jeu. 19 sept. 2019 à 13:34, gregy4 notifications@github.com a écrit :

16 real cores, with hyperthreading shown as 32 cores. threadripper 1950x

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/google/or-tools/issues/1588?email_source=notifications&email_token=ACUPL3MX5CQGOGWGTLSDP33QKNPV3A5CNFSM4IYINESKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7DE7YY#issuecomment-533090275, or mute the thread https://github.com/notifications/unsubscribe-auth/ACUPL3MMKYI4FY5HEQQ5YFLQKNPV3ANCNFSM4IYINESA .

gregy4 commented 5 years ago

I agree, this is how it should look like. Meanwhile I used performance profiler from visual studio and have more correct data. Phase one - presolve, one thread is used - in my case cpu usage is around 3-4% Phase two - solve started, no solution is found yet - cpu utilization varies between 9-30% Phase three - solution is found, displayed message

1 202.16s best:5460 next:[5018,5459] no_lp num_bool:6511250

Phase four - Called solution callback

Between information that solution was found and execution of callback is minimally 40 seconds interval when cpu utilization is between 3-4%.

I can send you report file from VC, but it is little bit big - 30mb per 3 minutes.

lperron commented 5 years ago

most likely a locking mechanism in the callback part. Can you remove the callback and try ? Laurent Perron | Operations Research | lperron@google.com | (33) 1 42 68 53 00

Le jeu. 19 sept. 2019 à 14:39, gregy4 notifications@github.com a écrit :

I agree, this is how it should look like. Meanwhile I used performance profiler from visual studio and have more correct data. Phase one - presolve, one thread is used - in my case cpu usage is around 3-4% Phase two - solve started, no solution is found yet - cpu utilization varies between 9-30% Phase three - solution is found, displayed message

1 https://github.com/google/or-tools/pull/1 202.16s best:5460

next:[5018,5459] no_lp num_bool:6511250 Phase four - Called solution callback

Between information that solution was found and execution of callback is minimally 40 seconds interval when cpu utilization is between 3-4%.

I can send you report file from VC, but it is little bit big 30mb per 3 minutes.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/google/or-tools/issues/1588?email_source=notifications&email_token=ACUPL3ND3254BXXJYAJWI4DQKNXHPA5CNFSM4IYINESKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7DKGKI#issuecomment-533111593, or mute the thread https://github.com/notifications/unsubscribe-auth/ACUPL3NAEPCW6WHRJOSO7C3QKNXHPANCNFSM4IYINESA .

gregy4 commented 5 years ago

Definitely much better ... no interval where only one core is used. This time I choose to use 16 cores but only for short time 16cores (about 50%) were used - on graph between 2:45 and 3:45. After 3:45 cpu usage is between 10 - 16% so callback is probably not the only one reason. cpu_profile

lperron commented 5 years ago

If you are not in C++, there may be a global lock going back to C++, forcing all threads with solutions to wait. Inside, there is also a mutex, but normally the locking part is very minimal.

In your graph, does the 3.45 minutes corresponds to finding the optimal solution? Sometimes, some threads are slow to finish. Laurent Perron | Operations Research | lperron@google.com | (33) 1 42 68 53 00

Le jeu. 19 sept. 2019 à 15:39, gregy4 notifications@github.com a écrit :

Definitely much better ... no interval where only one core is used. This time I choose to use 16 cores but only for short time 16cores (about 50%) were used - on graph between 2:45 and 3:45. After 3:45 cpu usage is between 10 - 16% so callback is probably not only the one reason. [image: cpu_profile] https://user-images.githubusercontent.com/5640112/65249124-b16b0600-daf3-11e9-890f-5004ab44aaed.png

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/google/or-tools/issues/1588?email_source=notifications&email_token=ACUPL3LRLX32T5YTGBEENXTQKN6KTA5CNFSM4IYINESKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7DP7QY#issuecomment-533135299, or mute the thread https://github.com/notifications/unsubscribe-auth/ACUPL3PPKQMRPAUXF3FNSHTQKN6KTANCNFSM4IYINESA .

gregy4 commented 5 years ago

No, optimal solution wasn't found and probably is not reachable in reasonable time.

lperron commented 5 years ago

It could be thermal throttling of the CPU.

Le jeu. 19 sept. 2019 à 16:23, gregy4 notifications@github.com a écrit :

No, optimal solution wasn't found and probably is not reachable in reasonable time.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/google/or-tools/issues/1588?email_source=notifications&email_token=ACUPL3JHN3JAH7JK2H2MCPTQKODPXA5CNFSM4IYINESKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7DUU6Y#issuecomment-533154427, or mute the thread https://github.com/notifications/unsubscribe-auth/ACUPL3IMB2HA3SZJRNLSTFDQKODPXANCNFSM4IYINESA .

gregy4 commented 5 years ago

I checked it on another problem, and cpu temperature was 5 degrees (67) above with constant 50% cpu utilization. For this problem temperature is at max around 63 degrees. I see memory bandwith as more probable reason, 40gb can be quite a lot for memory controller. Unfortunatelly for amd I'm not aware about software that can measure it.

timothylowjc commented 3 years ago

Hello!

I have roughly the same issue here, currently have my solvers running on AWS Batch using a c5.metal EC2 instance, which is 96 vCPUs and 192GB RAM.

Proto is a too large for github, but this should work

Anyway, image

I've currently set num_search_workers to 192, on the off chance I've misunderstood something about logical threads, but it appears that CPU utilization never breaks 30%.

Doubt its a temperature problem as mentioned, given vCPUs. CPU Usage ends in the graph after a certain amount of time has passed (400 seconds).

Any help is appreciated!

lperron commented 3 years ago

You are right. I could reproduce on my machine.

timothylowjc commented 3 years ago

Good to know its reproducible, thanks for the hard work! Looking forward to seeing the fix on master

lperron commented 3 years ago

I have done a first pass and committed to master. If possible, can you try ?

timothylowjc commented 3 years ago

Hi @lperron ,

Sorry about the late reply. I'll let you know sometime this week, we gotta figure out how to deploy master branches using AWS Batch first. Thanks for the fix!

timothylowjc commented 3 years ago

image

Alright, looks like the results speak for themselves.

Thanks for the fix @lperron !