microsoft / GraphEngine

Microsoft Graph Engine
MIT License

[Error] Error in `dotnet': double free or corruption (out), V2.0.9328 (latest), Ubuntu 16.04 x64, .NET Core 2.1.4 #220

Closed BenDerPan closed 6 years ago

BenDerPan commented 6 years ago

My Test Env:

GraphEngine.Core: V2.0.9328 (latest); I rebuilt it this morning.

OS: Ubuntu 16.04 x64

.NET Core: V2.1.4

My code runs well on Windows; the exception occurs when I run the same code on Linux. I found that it happens when the storage contains data and I query the data with a syn protocol.

My server side output

image

My client side only throws an exception: System.IO.IOException "Network error occurs."

BenDerPan commented 6 years ago

More info: my storage data was copied from Windows, and it loaded successfully. An empty storage does not throw the exception.

BenDerPan commented 6 years ago

The client side gets no response here. Is checking protocol signatures a new feature?

[ INFO    ] *****************************************************
[ INFO    ] ServerCount: 1
[ INFO    ]     192.168.102.160:5304
[ INFO    ] ProxyCount: 0
[ INFO    ] *****************************************************
[ INFO    ] Checking Client-Server protocol signatures...

The server side output contains strange IP addresses, and the HTTP endpoint for LIKQ did not start up...

[ DEBUG   ] Preserved sync (rsp) message GetCellType is registered.
[ DEBUG   ] Preserved sync (rsp) message QueryMemoryWorkingSet is registered.
[ DEBUG   ] Preserved async message Shutdown is registered.
[ DEBUG   ] Sync (rsp) message 0 is registered.
[ DEBUG   ] Sync (rsp) message 1 is registered.
[ DEBUG   ] Sync (rsp) message 2 is registered.
[ INFO    ] Listening endpoint :5304
[ INFO    ] Waiting for client connection ...
[ INFO    ] My IPEndPoint: 127.0.1.1:5304
[ INFO    ] *****************************************************
[ INFO    ] ServerCount: 1
[ INFO    ]     192.168.102.160:5304
[ INFO    ] ProxyCount: 0
[ INFO    ] *****************************************************
[ DEBUG   ] ServerSocket: Incomming connection from 215.58.192.168
[ DEBUG   ] ServerSocket: Incomming connection from 215.60.192.168
[ INFO    ] Checking Server-Server protocol signatures...
[ DEBUG   ] ServerSocket: Incomming connection from 215.68.192.168
[ DEBUG   ] ServerSocket: Incomming connection from 215.70.192.168

yatli commented 6 years ago

uh oh, looks like the networking subsystem crashed on connection. investigating.

yatli commented 6 years ago

@BenDerPan hey could you try the eventloop branch? The Linux networking is improved.

yatli commented 6 years ago

I’ve also noticed the weird addresses reported by the server on client connection: these connections should all be coming from localhost, but appear to be random in the log.
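
One possible reading of those addresses, which is an assumption and is not confirmed anywhere in this thread: the logged values look like a `sockaddr_in` read two bytes too early, so that the two port bytes and the first two address bytes get printed as an IPv4 address. The first two octets in the log (215.58, 215.60, 215.68, 215.70) decode to ports 55098, 55100, 55108 and 55110, which would fit consecutive ephemeral client ports, and the trailing 192.168 matches the LAN prefix. The C# sketch below only illustrates that arithmetic; the class, the helper names and the client address 192.168.102.161 are hypothetical and are not Graph Engine code.

```csharp
using System;
using System.Net;

// Hypothetical illustration only; not part of Graph Engine.
public static class SockaddrOffsetDemo
{
    // Builds the first 8 bytes of a Linux sockaddr_in on a little-endian host:
    // 2 bytes family (host order), 2 bytes port (network order), 4 bytes IPv4 address.
    public static byte[] BuildSockaddrIn(IPAddress address, ushort port)
    {
        byte[] buffer = new byte[8];
        buffer[0] = 2;                    // AF_INET = 2, low byte first
        buffer[1] = 0;
        buffer[2] = (byte)(port >> 8);    // port high byte
        buffer[3] = (byte)(port & 0xFF);  // port low byte
        byte[] addr = address.GetAddressBytes(); // 4 bytes for an IPv4 address
        Array.Copy(addr, 0, buffer, 4, 4);
        return buffer;
    }

    // Formats 4 consecutive bytes starting at the given offset as a dotted quad.
    public static string FormatIPv4At(byte[] buffer, int offset)
    {
        return buffer[offset] + "." + buffer[offset + 1] + "." +
               buffer[offset + 2] + "." + buffer[offset + 3];
    }

    public static void Main()
    {
        // Assumed client address and ephemeral port, chosen to match the first log line.
        byte[] sa = BuildSockaddrIn(IPAddress.Parse("192.168.102.161"), 55098);

        Console.WriteLine(FormatIPv4At(sa, 4)); // address field: 192.168.102.161
        Console.WriteLine(FormatIPv4At(sa, 2)); // two bytes early: 215.58.192.168
    }
}
```

If this reading is right, the connections themselves are fine and only the logging of the peer address is off; that would need to be checked against the native socket code.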

BenDerPan commented 6 years ago

@yatli great, I will try it later, I'm not at my PC now. :)

BenDerPan commented 6 years ago

@yatli the address report is still strange, but there is a new problem. My env: server on Ubuntu 16.04 x64, client on Windows 10, and GraphEngine 2.0.9542.

The client side sometimes hangs with no response when I save something, but sometimes it's OK. I think it is still a network problem.

yatli commented 6 years ago

@BenDerPan thanks for testing! how about the double free corruption?

BenDerPan commented 6 years ago

@yatli there is no exception on the Linux server side, so the double free corruption seems fixed, but I am not sure.

yatli commented 6 years ago

attempting a minimal repro.

yatli commented 6 years ago

minimal repro failed.

Client side Windows 10 x64 eventloop HEAD:

using System;
using Trinity;
using Trinity.Storage;

namespace test_ge
{
    class Program
    {
        static void Main(string[] args)
        {
            TrinityConfig.LoadConfig("trinity.xml");
            TrinityConfig.CurrentRunningMode = RunningMode.Client;
            Global.CloudStorage.LoadCell(0, out var cell, out _);
            Console.WriteLine(cell.Length);
        }
    }
}

Server side Linux, eventloop HEAD:

using System;
using Trinity;
using Trinity.Storage;
using Trinity.Network;

namespace test
{
    class Program
    {
        static void Main(string[] args)
        {
            Global.LocalStorage.SaveCell(0, new byte[128]);
            Global.LocalStorage.SaveStorage();

            TrinityServer server = new TrinityServer();
            server.Start();
            Console.Write("Press any key to stop...");
            Console.ReadKey();
        }
    }
}

The Windows client correctly outputs 128 and exits. @BenDerPan are you using a custom syn protocol?

BenDerPan commented 6 years ago

@yatli yes, I used a custom syn protocol

yatli commented 6 years ago

@BenDerPan repro failed.

Client:

using System;
using Trinity;
using Trinity.Storage;
using test_ge.S;

namespace test_ge
{
    class Program
    {
        static void Main(string[] args)
        {
            TrinityConfig.LoadConfig("trinity.xml");
            TrinityConfig.CurrentRunningMode = RunningMode.Client;
            Global.CloudStorage.LoadCell(0, out var cell, out _);
            Console.WriteLine(cell.Length);

            using(var rsp = Global.CloudStorage[0].P())
            {
                Console.WriteLine(rsp);
            }
        }
    }
}

server (Linux):

using System;
using Trinity;
using Trinity.Storage;
using Trinity.Network;

namespace test
{
    class Server: SBase
    {
        public override void PHandler(PayloadWriter rsp)
        {
            rsp.foo = 123;
            rsp.bar = "bar";
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hello World!");
            Global.LocalStorage.SaveCell(0, new byte[128]);
            Global.LocalStorage.SaveStorage();

            Server server = new Server();
            server.Start();
            Console.Write("Press any key to stop...");
            Console.ReadKey();
        }
    }
}

TSL:

struct Payload
{
    int foo;
    string bar;
}

protocol P
{
    Type: Syn;
    Request: void;
    Response: Payload;
}

server S
{
    protocol P;
}

The client correctly outputs the response.

BenDerPan commented 6 years ago

@yatli I am clearing all the caches and rebuilding to test with my project.

BenDerPan commented 6 years ago

@yatli Still the same: it hangs again with no response, as in the picture:

image

yatli commented 6 years ago

@BenDerPan you mean crash, exception or freeze?

BenDerPan commented 6 years ago

@yatli I mean a freeze: the code stopped at that line, but there is no crash or exception.

yatli commented 6 years ago

Understood. From your screenshot I see that I should try larger payloads.

Attempting a repro.

yatli commented 6 years ago

@BenDerPan do you observe the same symptom when running a Windows server program?

BenDerPan commented 6 years ago

@yatli I am trying now

BenDerPan commented 6 years ago

@yatli it does not freeze, but there is an exception: image

   at Trinity.Storage.RemoteStorage._error_check(TrinityErrorCode err)
   at Trinity.Storage.RemoteStorage._use_synclient(Func`2 func)
   at Trinity.Storage.RemoteStorage.SendMessage(Byte* message, Int32 size, TrinityResponse& response)
   at Trinity.Storage.MessagePassingExtensionMethods.GetCommunicationSchema(IMessagePassingEndpoint storage, String& name, String& signature)
   at Trinity.Storage.MemoryCloud.CheckProtocolSignatures_impl(RemoteStorage storage, RunningMode from, RunningMode to)
   at Trinity.Storage.FixedMemoryCloud.Open(ClusterConfig config, Boolean nonblocking)
   at Trinity.Global.get_CloudStorage()
   at 

yatli commented 6 years ago

two possible cases:

  1. the remote handler did throw an exception
  2. the remote handler wasn't called at all. instead the default handler (which always throws an exception) was called.
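
From the caller's side the two cases above look identical, since both arrive as a failed request. A minimal C# sketch of the distinction, assuming a hypothetical dispatcher: `DemoDispatcher`, `Register`, `Dispatch` and `Classify` are made-up names for illustration and do not correspond to Graph Engine's actual message handling.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a server-side message dispatcher (not Graph Engine code).
public class DemoDispatcher
{
    private readonly Dictionary<ushort, Func<string, string>> handlers =
        new Dictionary<ushort, Func<string, string>>();

    public void Register(ushort messageId, Func<string, string> handler)
    {
        handlers[messageId] = handler;
    }

    public string Dispatch(ushort messageId, string request)
    {
        Func<string, string> handler;
        if (!handlers.TryGetValue(messageId, out handler))
        {
            // Case 2: nothing is registered for this id, so the default path always throws.
            throw new NotImplementedException("No handler registered for message " + messageId);
        }
        // Case 1: the registered handler runs and may throw on its own.
        return handler(request);
    }

    // Returns the exception type name a dispatch produced, or "ok" if it succeeded.
    public static string Classify(DemoDispatcher dispatcher, ushort messageId)
    {
        try
        {
            dispatcher.Dispatch(messageId, "ping");
            return "ok";
        }
        catch (Exception e)
        {
            return e.GetType().Name;
        }
    }

    public static void Main()
    {
        var dispatcher = new DemoDispatcher();
        dispatcher.Register(0, request => { throw new InvalidOperationException("handler failed"); });
        dispatcher.Register(2, request => "pong");

        Console.WriteLine(Classify(dispatcher, 0)); // InvalidOperationException (case 1)
        Console.WriteLine(Classify(dispatcher, 1)); // NotImplementedException (case 2)
        Console.WriteLine(Classify(dispatcher, 2)); // ok
    }
}
```

In a setup like this, logging the exception type on the server side is what tells the two cases apart; the client alone cannot.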

yatli commented 6 years ago

could you come up with a minimal repro? I can then proceed to debug it.

BenDerPan commented 6 years ago

@yatli sorry, I can't. It's part of our big system, and extracting it would be a lot of work :(

BenDerPan commented 6 years ago

@yatli I tested running the server on Windows and built a simple query client on both Ubuntu and Windows, and everything is OK. Now I guess it's a version problem: the server uses the eventloop version, and my failing client uses the master version.

yatli commented 6 years ago

@BenDerPan you mean you rebuild everything and the problem is gone?

BenDerPan commented 6 years ago

@yatli yes, I can't reproduce the problem now, but I don't know why. Maybe it's because I rebooted my PC.

yatli commented 6 years ago

alright. let's close this issue for now (as the original double-free corruption is known to be resolved). you're welcome to open up new issues following up this topic.

thanks again!