google / or-tools

Google's Operations Research tools:
https://developers.google.com/optimization/
Apache License 2.0

Upgrade 9.9.2963 -> 9.10.4067 causes server silent reboot #4224

Open apavelm opened 3 weeks ago

apavelm commented 3 weeks ago

Version: 9.9.2963 -> 9.10.4067. Language: C#, .NET 8, ASP.NET Core (latest stable)

CP-SAT, Windows (on Azure App Service)

After upgrading from 9.9 to 9.10, the ASP.NET Core web application service restarts without any error message or exception. We upgraded the library accidentally, assuming that a minor version upgrade would not affect anything. We were wrong. Downgrading makes it work again.

lperron commented 3 weeks ago

Can you send me the model that triggers the error? Windows silently crashes on floating-point errors, for instance.

apavelm commented 3 weeks ago

The problem is that I'm not certain where it crashes. It works on localhost but crashes on Azure App Service (the previous version worked fine there). But I'm confident it happens here:

CpModel model = new CpModel();
var allBookings = inputData.Items.OrderBy(x => x.Start).ToArray();
var numBookings = allBookings.Length;
var allTasks = new Dictionary<int, IntVar>();
foreach (var booking in allBookings)
{
    allTasks[booking.Id] = booking.IsFixed ?
        model.NewConstant(booking.Spot.Value) :
        model.NewIntVar(0, numSpots - 1, $"{booking.Id}_spot");

    if (booking.Spot.HasValue && !booking.IsFixed)
    {
        model.AddHint(allTasks[booking.Id], booking.Spot.Value);
    }
}

// Visit each unordered pair of bookings once (j starts at i + 1, so i == j never occurs).
for (int i = 0; i < numBookings; i++)
{
    for (int j = i + 1; j < numBookings; j++)
    {
        var booking = allBookings[i];
        var otherBooking = allBookings[j];
        // Free time between the end of the earlier booking and the start of the later one.
        var distance = otherBooking.Start - booking.Start - booking.Duration;

        var bookingLeft = allTasks[booking.Id];
        var bookingRight = allTasks[otherBooking.Id];

        var reqDistance = booking.UseGaps && otherBooking.UseGaps ? gapSize : 0;

        if (distance < reqDistance)
        {
            // The literal is constant false, so its negation is always true and
            // the != constraint is effectively unconditional.
            ILiteral couldBeInChainAtTheSameSpot = model.FalseLiteral();
            model.Add(bookingLeft != bookingRight).OnlyEnforceIf(couldBeInChainAtTheSameSpot.Not());
        }
    }
}

CpSolver solver = new CpSolver
{
    StringParameters = "linearization_level:1 num_workers:4"
};

CpSolverStatus status = solver.Solve(model);

allBookings can contain just one record and it still crashes on 9.10, so the problem is unlikely to be in the model.
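
For readers trying to reason about the model, the pairwise loop in the snippet above can be sketched without OR-Tools at all. The following is only an illustration in Python (a language also used with ortools 9.10 later in this thread), not the reporter's code: `Booking`, `conflicting_pairs`, and the demo values are hypothetical names and data chosen here. It lists which pairs of bookings would receive a `!=` constraint, and shows that a single booking yields none, which is consistent with the remark that the model itself is an unlikely culprit.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Booking:
    # Hypothetical stand-in for the booking records used in the C# snippet.
    id: int
    start: int
    duration: int
    use_gaps: bool = False


def conflicting_pairs(bookings: List[Booking], gap_size: int) -> List[Tuple[int, int]]:
    """Return the (id, id) pairs that would get a `!=` constraint."""
    # Same ordering as OrderBy(x => x.Start); both sorts are stable.
    ordered = sorted(bookings, key=lambda b: b.start)
    pairs = []
    for i, left in enumerate(ordered):
        for right in ordered[i + 1:]:
            # Free time between the end of `left` and the start of `right`.
            distance = right.start - left.start - left.duration
            required = gap_size if (left.use_gaps and right.use_gaps) else 0
            if distance < required:
                pairs.append((left.id, right.id))
    return pairs


if __name__ == "__main__":
    demo = [
        Booking(id=1, start=0, duration=10),
        Booking(id=2, start=5, duration=10),
        Booking(id=3, start=20, duration=5),
    ]
    print(conflicting_pairs(demo, gap_size=2))      # bookings 1 and 2 overlap
    print(conflicting_pairs(demo[:1], gap_size=2))  # one booking: no constraints
```

With a single booking the list is empty, so the solver would receive one variable and no constraints at all.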

lperron commented 3 weeks ago

Can you check that protobuf was correctly updated?

apavelm commented 3 weeks ago

No doubt: 3.26.1. I have the same versions on localhost and in Azure App Service. On localhost there are no problems; in the cloud it crashes silently.

apavelm commented 3 weeks ago

Microsoft Azure Support shared the stack trace:


Your app crashed because of System.ExecutionEngineException and aborted the requests it was processing when the overflow occurred. As a result, your app’s users may have experienced HTTP 502 errors.

This call stack caused the exception:
InlinedCallFrame
InlinedCallFrame
ILStubClass.IL_STUB_PInvoke
Google.OrTools.Sat.SolveWrapper.Solve
Google.OrTools.Sat.CpSolver.Solve
<Next comes the app service's Solve method from the application>

....
lperron commented 3 weeks ago

Still, it works locally. So the issue is a configuration issue.

apavelm commented 3 weeks ago

The configuration is as described above, and it still works with previous versions down to 9.6 (the version we started with). Everything runs on the Windows x64 platform. I'm not sure about the Windows version on the Azure VM (App Service), though; I'm guessing at least Windows Server 2019 or even 2022.

apavelm commented 3 weeks ago

I probably know why it works on localhost: there I always run it under the debugger, and as far as I remember InlinedCallFrame never appears in debug mode.

lperron commented 3 weeks ago

I cannot do anything until you send me something I can reproduce.

apavelm commented 3 weeks ago

Sorry, I don't have anything else. I have already sent everything I have. The fact that it happens only in the cloud makes the task harder. What can be done is to compare the sources of CpModel and CpSolver between the versions mentioned above. Maybe something new or suspicious will show up in the diff, because in the same environment all previous versions from 9.6 through 9.9 work fine.

lperron commented 3 weeks ago

See the other issues I just closed: the cause there was a missing Visual Studio update.

apavelm commented 3 weeks ago

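We are using Azure Pipelines to build the artifact. The Azure agent is named "windows-latest"; according to the documentation (https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/hosted?view=azure-devops&tabs=yaml) it contains windows-2022 and Visual Studio 2022 (version: 17.9.34728.123). On localhost I have 17.8... (maybe this is the reason; I'll try to update). The full list of software installed on the agent is here: https://github.com/actions/runner-images/blob/main/images/windows/Windows2022-Readme.md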

lperron commented 3 weeks ago

Unless you can reproduce on a vanilla windows machine, I cannot do anything.

I do not have access to an azure server.


leoduret commented 3 weeks ago

> We are using Azure Pipelines to build the artifact. Azure Agent named "windows-latest", according to the documentation it contains windows-2022 and visual studio 2022 (version: 17.9.34728.123). On localhost I have 17.8... (maybe this is the reason. I'll try to update) The full list of installed software on the agent is here.

I have the exact same issue with Python ortools==9.10 CP-SAT on an Azure Pipelines runner with windows-latest (it works on ubuntu-latest). I can't reproduce it locally either, with Visual Studio 16. The fix is also to downgrade to 9.9.

lperron commented 3 weeks ago

https://developercommunity.visualstudio.com/t/Problems-with-publishing-after-upgrading/10587098?sort=active&topics=windows+10.0

https://developercommunity.visualstudio.com/t/Rollback-from-1790-to-1787/10600701?q=%5BFixed+In%3A+Visual+Studio+2022+version+17.9.2%5D

apavelm commented 2 weeks ago

I did many tests, please note:

I built (win-x64) in an Azure pipeline and deployed to Azure Functions / Azure App Service: same result, AccessViolationException. Then I built the same version (9.10) using the same toolset and deployed to the Linux version of Azure Functions / Azure App Service: both work.

Apparently, the issue is in the win-x64 platform binaries.

Mizux commented 1 week ago

My 2 cents: I've been using VS 2022 Preview (up to date?) to build the binaries, since a few months ago VS 2022 was crashing during macro/template parsing (IIRC).

If VS 2022 Preview ships with an "advanced" redistributable VS runtime, it may explain why the Azure base image can't load them...

I'll try to see if I can use a regular VS 2022 Community install to build (i.e. remove the preview from my Windows VM).