signetlabdei / quic-ns-3

QUIC implementation for ns-3
GNU General Public License v2.0

SIGABRT "0RTT Handshake requested with wrong Initial Version" -- even when not enabling 0RTT explicitly. #15

Open tnull opened 3 years ago

tnull commented 3 years ago

Without enabling QUIC_VERSION_DRAFT_10, ns-3 crashes for me with

aborted. cond="!IsVersionSupported (m_vers)", msg="0RTT Handshake requested with wrong Initial Version", +4.713833601s 157 file=../../contrib/quic/model/quic-socket-base.cc, line=2550
libc++abi.dylib: terminating
Process 12686 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00007fff693ef33a libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff693ef33a <+10>: jae    0x7fff693ef344            ; <+20>
    0x7fff693ef33c <+12>: movq   %rax, %rdi
    0x7fff693ef33f <+15>: jmp    0x7fff693e9629            ; cerror_nocancel
    0x7fff693ef344 <+20>: retq
Target 0: (bns) stopped.

This happens even if I do not explicitly enable the 0RTT handshakes. It also seems that this is not caused by an explicit call to Connect, but possibly due to a reconnect of an aborted connection?

Please find attached a full backtrace of the crash.

fedech commented 3 years ago

Hi,

could you attach your script, or a stripped-down version of it? How did you disable QUIC_VERSION_DRAFT_10?

tnull commented 3 years ago

I'm afraid that's not easily possible, since it's part of a larger simulation framework. For the same reason, I cannot pinpoint the error to a specific connect call; it happens at some (seemingly random) point during the simulation. That said, if I observe anything that narrows it down further, I'll try to produce an MWE.

I did not explicitly disable DRAFT 10, but this happens when I do not explicitly enable it via ns3::Config::SetDefault ("ns3::QuicSocketBase::InitialVersion", ns3::UintegerValue(QUIC_VERSION_DRAFT_10));

fedech commented 3 years ago

How are you creating the sockets? It appears that something is missing in your initialization, as the example scripts work without setting any defaults (the default for that attribute should be QUIC_VERSION_NEGOTIATION) if the sockets are created correctly. Are you using the helper?
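To make the abort condition concrete, here is a minimal standalone model of the check reported in the log (`!IsVersionSupported (m_vers)`). It does not use ns-3. The numeric version values, the `kSupported` list and the `CanStartZeroRtt` helper are hypothetical placeholders; the only assumption taken from this thread is that the negotiation placeholder is not itself a supported handshake version, while draft 10 is.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical stand-ins for the constants named in this thread.
// The numeric values are placeholders, not the ones defined by the QUIC module.
constexpr std::uint32_t kVersionNegotiation = 0x00000001; // models QUIC_VERSION_NEGOTIATION
constexpr std::uint32_t kVersionDraft10 = 0x0000000a;     // models QUIC_VERSION_DRAFT_10

// Assumed list of versions a socket accepts for a handshake.
constexpr std::array<std::uint32_t, 1> kSupported = {kVersionDraft10};

// Models IsVersionSupported (m_vers) from the abort condition in the log.
bool IsVersionSupported (std::uint32_t version)
{
  return std::find (kSupported.begin (), kSupported.end (), version) != kSupported.end ();
}

// Returns true if a 0-RTT handshake could start with this initial version.
// The real code aborts the simulation instead of returning false.
bool CanStartZeroRtt (std::uint32_t initialVersion)
{
  return IsVersionSupported (initialVersion);
}
```

Under these assumptions, `CanStartZeroRtt (kVersionNegotiation)` is false and `CanStartZeroRtt (kVersionDraft10)` is true, which would match the observation that the crash disappears once the initial version is set to draft 10.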

tnull commented 3 years ago

The nodes are part of a topology of routers and leaf nodes. As discussed in https://github.com/signetlabdei/quic-ns-3/issues/12, I now install the QUIC stack when needed:

    if (enableQuic) { 
        ns3::QuicHelper stack;
        stack.InstallQuic(routerContainer);
        stack.InstallQuic(leafContainer);
    } else {
        ns3::InternetStackHelper stack;
        stack.Install(routerContainer);
        stack.Install(leafContainer);
    }

Listening sockets are created as

    if (!m_socket)
    {
        ns3::TypeId tid = ns3::TypeId::LookupByName ("ns3::QuicSocketFactory");
        m_socket = ns3::Socket::CreateSocket (GetNode (), tid);

        ns3::InetSocketAddress local = ns3::InetSocketAddress (m_address, PORT);

        m_socket->Bind (local);
        m_socket->Listen ();

        m_socket->SetRecvCallback (MakeCallback (&HandleRead, this));
    }

and sockets for outgoing connections are created as

    ns3::TypeId tid = ns3::TypeId::LookupByName ("ns3::QuicSocketFactory");
    ns3::Ptr<ns3::Socket> socketPtr = ns3::Socket::CreateSocket (GetNode (), tid);

    ns3::InetSocketAddress iAddr = ns3::InetSocketAddress(peerAddr, PORT);
    socketPtr->Connect (iAddr);

fedech commented 3 years ago

Ok, that part should be ok, you're doing it exactly like the quic-server.cc and quic-client.cc applications (in the quic-applications folder), but they don't have any issues in the example...

tnull commented 3 years ago

After enabling logging for the QuicSocketBase I get the following output right before the program crashes.

+8.233468295s 43 QuicSocketBase:ReTxTimeout(): [INFO ] ReTxTimeout Expired at time 8.23347
+8.233468295s 43 QuicSocketBase:ReTxTimeout(): [INFO ] TLP triggered
+8.233468295s 43 QuicSocketBase:BytesInFlight(): [INFO ] Returning calculated bytesInFlight: 0
+8.233468295s 43 QuicSocketBase:ConnectionWindow(): [INFO ] Returning calculated Connection: MaxData 131081 InFlight: 0
+8.233468295s 43 QuicSocketBase:OnSendingAckFrame(): [INFO ] Attach an ACK frame to the packet
+8.233468295s 43 QuicSocketBase:SendDataPacket(): [INFO ] SendDataPacket of size 13
+8.233517547s 44 QuicSocketBase:ReTxTimeout(): [INFO ] ReTxTimeout Expired at time 8.23352
+8.233517547s 44 QuicSocketBase:ReTxTimeout(): [INFO ] TLP triggered
+8.233517547s 44 QuicSocketBase:BytesInFlight(): [INFO ] Returning calculated bytesInFlight: 0
+8.233517547s 44 QuicSocketBase:ConnectionWindow(): [INFO ] Returning calculated Connection: MaxData 131081 InFlight: 0
+8.233517547s 44 QuicSocketBase:OnSendingAckFrame(): [INFO ] Attach an ACK frame to the packet
+8.233517547s 44 QuicSocketBase:SendDataPacket(): [INFO ] SendDataPacket of size 8
+8.233519850s 60 QuicSocketBase:Connect(): [INFO ] CONNECTION AUTHENTICATED Client found the Server 10.17.0.2 port 49159 in authenticated list
aborted. cond="!IsVersionSupported (m_vers)", msg="0RTT Handshake requested with wrong Initial Version", +8.233519850s 60 file=../contrib/quic/model/quic-socket-base.cc, line=2550
libc++abi.dylib: terminating

Notably, this also happens after setting the 0RTT-Handshake attribute to false, i.e., QUIC indeed seems to try to reconnect to a node that was previously disconnected due to a retransmission timeout. To check this, I verified that Connect is only called once for the peer in question on my side.
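One way to cross-check how often `Connect` shows up per node is to count the matching lines in the captured log. The sketch below is a standalone helper, not part of ns-3 or the QUIC module; the function name `CountCallsPerContext` is hypothetical, and it assumes the line layout seen above (timestamp, context id, then a `Component:Function():` tag). It only counts lines that were actually logged, so a call that emits no log line is not seen.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Counts, per context id (second log field), the lines emitted by a given
// "Component:Function():" tag (third log field).
std::map<std::string, int> CountCallsPerContext (std::istream &log, const std::string &tag)
{
  std::map<std::string, int> counts;
  std::string line;
  while (std::getline (log, line))
    {
      std::istringstream fields (line);
      std::string timestamp, context, source;
      // Lines with fewer than three fields, or with a different tag, are skipped.
      if (fields >> timestamp >> context >> source && source == tag)
        {
          ++counts[context];
        }
    }
  return counts;
}
```

Passing a `std::ifstream` opened on the saved log together with the tag `"QuicSocketBase:Connect():"` yields a map from context id to the number of matching lines.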