[Moco] Optimization results vary across Windows systems

tymo77 commented 3 years ago

I have access to several Windows systems with different system architectures. I wanted to increase my throughput by running optimizations across them, but first I had to be sure they would give the same result for the same problem. The only trouble is that the four different systems are consistently inconsistent in their results -- even for relatively simple example problems.

Computers:

Computer 1 ("Lab"): Any one of a number of available computers.

Processor: Intel Xeon E5-1620 @ 3.60GHz
Memory: 16GB
Windows Version: Windows 10 Enterprise 10.0.18363 Build 18363

Computer 2 ("Laptop"):

Processor: Intel Core i7-7700HQ @ 2.80GHz
Memory: 16GB
Windows Windows 10 Home Version: 10.0.19042 Build 19042

Computer 3 ("Workstation"):

Processor: Intel Xeon E3-1245 v3 @ 3.40GHz
Memory: 32GB
Windows Version: Windows 10 Enterprise 10.0.17134 Build 17134

Computer 4 ("Server"): An old Windows Server system with four CPUs and a past-end-of-life OS.

Processor: Four Intel Xeon CPU E7- 4870 @ 2.40GHz
Memory: 112GB
Windows Version: Windows Server 2008 R2 Enterprise 6.1.7601 Service Pack 1 Build 7601

Example 1: exampleSquatToStand.m

Running this example on the different machines gives an array of different answers depending on the machine.

Part 1: Computer 1: Total Iterations: 62. Final Objective Value: 2.2933840e-01 Computer 2: Total Iterations: 71. Final Objective Value: 2.2929431e-01 Computer 3: Total Iterations: 71. Final Objective Value: 2.2929431e-01 Computer 4: Total Iterations: 96. Final Objective Value: 2.2922824e-01

These results from Part 1 of the example file seem to show that the results are similar to 3 significant digits. Computers 2 and 3 are in agreement while 1 and 4 are each distinct from the rest.

Final Answer: Computer 1: Cost without device: 0.657197 Computer 2: Cost without device: 0.663118 Computer 3: Cost without device: 0.663118 Computer 4: Cost without device: 0.668557

The final answer results for "cost without device" (as well as cost with device) follow the same pattern as before. Computers 2 and 3 agree while 1 and 4 are distinct. In order to make sure the cause is not something specific to this problem, we can look at a different example.

Example 2: example2Dwalking.m

Running this example also gives an array of different examples, and it doesn't matter whether the parallel option is set to "0" for off or "1" for default.

This figure shows how the objective varies with each iteration of the first step of the example2Dwalking.m problem. This is the tracking problem step. The workstation diverges wildly from the other machines at first.

Interestingly, we can rule out an effect of the parallel setting and the number of threads used to solve the problem. Turning parallel on or off makes no difference between each machine. The results are identical in each case.

At iteration 60 the objective value for each of the following is... Computer 1: 1880.2398 Computer 2: 1808.7519 Computer 3: 1808.7519 Computer 4: 1923.2458

A few other notes:

Despite the fact that these examples are all from the MATLAB examples, based on other problems I have tried, I know that the same problem reoccurs when compiling the 0.4.0 C++ versions of the examples for Windows and distributing the binaries to the different machines.

Results are always consistent within each machine. To the best of my knowledge, I have never gotten different results from day to day on the same machine, and the number of threads used is never a factor.

All of these machines are using the 0.4.0 release of Moco. On each machine, the Moco 0.4.0 binaries are the only OpenSim binaries in the system/user path environment variable, and the ConfigureMoco script has been run on each to setup MATLAB.

Conclusions

The results of certain Moco optimizations using CasAdi + Ipopt give results that differ between some machines.
The results are unique to some machines, but other results are consistent between dissimilar machines.

I am kind of at a loss for what might cause this behavior. The fact that the differences are so consistent means it has to do with something specific to each system but all of them are running the same version of MoCo. I wouldn't have been so surprised if only Computer 4 was inconsistent as it is the weirdest of the bunch by far. Whatever the case, this behavior is a bit of a challenge for me. For my use case, I have found that these difference sometimes end up much larger as different machines get stuck at different local optima for the same problem with differences in the objective function many orders of magnitude larger than the convergence tolerance.

nickbianco commented 3 years ago

Hi @tymo77, thank you for the thorough analysis and explanation. I'm also unsure why this is happening but we should definitely get to the bottom of it. A few quick questions:

Have you tried solving either of these problems with MocoTropterSolver? It would be helpful to know if the problems behave the same way between both solvers, as that would narrow down the issue to either OpenSim or Ipopt.
Would you be willing to test this analysis on the most recent Moco release that comes with OpenSim 4.2? I don't necessarily expect that the results will be different but it would be good to rule out any issues with old Moco versions.

tymo77 commented 3 years ago

I am going to focus on the first example: exampleSquatToStand.m in this follow-up since it is a bit faster to run.

1. MocoTropterSolver

To test MocoTropter, I simply changed the solver for the prediction study to a Tropter solver -- one line. Here are the solution results with Tropter: Computer 1: 95 Iterations, 2.292402e-01 Computer 2: 63 Iterations, 2.293133e-01 Computer 3: 63 Iterations, 2.293133e-01 Computer 4: 97 Iterations, 2.292426e-01

2. OpenSim 4.2 (Using CasAdiSolver)

I downloaded the source code, built the dependencies, and then built OpenSim. Computer 1: 62 Iterations, 2.2933840e-01 Computer 2: 71 Iterations, 2.2929431e-01 Computer 3: 71 Iterations, 2.2929431e-01 Computer 4: 96 Iterations, 2.2922824e-01

So as you can see, there is no change in behavior between the releases of OpenSim/Moco. There is also no change to the general pattern when using Tropter. The computers each give a unique result except for the pair 2 and 3 which perform identically.

I wonder what would be best to test next?

I also wonder if there are others that could run the same tests and report their results?

nickbianco commented 3 years ago

Thanks for the update and ruling out those possibilities. This might be a tough problem to debug and there could be any number of reasons why these problems behave slightly differently (e.g., different versions of linear algebra packages or other low-level computations). I'm honestly not sure why these differences occur, but I'm also not too surprised either.

Are these differences significantly impacting your current work? Using a Docker container could be a way to ensure consistent behavior across different PCs.

tymo77 commented 3 years ago

The main impact on my work is just that I lose confidence in a given result, and feel like I have to test it on multiple machines, or with several different solver settings or initial guesses. I had hoped to be able to use all of these machines to increase my total simulation throughput, but if they each perform differently, I am deterred from that.

While the differences between final objective values are relatively small in these examples, I have had some more difficult custom problems with final results differing on the order of 10%.

Docker seems like a good place to test some of these differences. I haven't used it before, so I'd have to come up to speed on it first.

I too am suspicious that it has to do with very low level packages. However, I am most worried that there are actually multiple overlapping, simultaneous causes which would make it even harder to isolate.

Overall, I am not sure how high of a priority it is resolving this, but from a replicability standpoint, it could definitely become an issue if a reviewer can't get the same result as you and neither of you can figure out why.

nickbianco commented 3 years ago

The main impact on my work is just that I lose confidence in a given result

That's fair, and I would feel the same way. I work between a few different machines and haven't noticed any noticeable differences, but haven't done exact numerical comparisons like these.

While the differences between final objective values are relatively small in these examples, I have had some more difficult custom problems with final results differing on the order of 10%.

Do you have a sense if convergence/constraint tolerances playing a role here? I could see different machines causing the same problem to fall into different local minimums too.

tymo77 commented 3 years ago

My sense is that a stricter dual feasibility tolerance would help, and that they are mostly converging to the same optima, but I can't know for sure until I test that by letting them run for a long time to the stricter tolerance.

I have started to try and setup Docker, but it will probably take a while for me to come up to speed. That seems like it definitely ought be the best way to test for the cause of the differences in a tightly controlled environment.

opensim-org / opensim-core