Open ccaffy opened 1 year ago
This may be another case in which libcurl is to blame. The same command works fine on my machine, where I have two servers set up on different ports for TPC transfers:
$ curl -v --capath /etc/grid-security/certificates -L -X COPY -H "Source: $DST" -H "SciTag: 144" -H 'X-Number-Of-Streams: 3' "$SRC"
* Trying [2001:1458:202:228::100:82]:8081...
* Connected to gentoo.cern.ch (2001:1458:202:228::100:82) port 8081
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/grid-security/certificates
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server did not agree on a protocol. Uses default.
* Server certificate:
* subject: DC=ch; DC=cern; OU=computers; CN=gentoo.cern.ch
* start date: Jun 6 14:47:51 2023 GMT
* expire date: Jul 10 14:47:51 2024 GMT
* subjectAltName: host "gentoo.cern.ch" matched cert's "gentoo.cern.ch"
* issuer: DC=ch; DC=cern; CN=CERN Grid Certification Authority
* SSL certificate verify ok.
* using HTTP/1.x
> COPY //file.raw HTTP/1.1
> Host: gentoo.cern.ch:8081
> User-Agent: curl/8.4.0
> Accept: */*
> Source: https://gentoo.cern.ch:8082//file.raw
> SciTag: 144
> X-Number-Of-Streams: 3
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
< HTTP/1.1 201 Created
< Connection: Keep-Alive
< Server: XrootD/v5.6.3-88-g9d2627fda
< Content-Type: text/plain
< Transfer-Encoding: chunked
<
Perf Marker
Timestamp: 1701859536
Stripe Index: 0
Stripe Bytes Transferred: 805306368
Total Stripe Count: 1
RemoteConnections: tcp:[2001:1458:202:228::100:82]:8082
End
* Connection #0 to host gentoo.cern.ch left intact
success: Created
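For anyone who wants to compare runs, here is a minimal Python sketch for pulling the numbers out of the performance markers in the response body. It assumes only the `Perf Marker` ... `End` layout visible in the output above. The function name `parse_perf_markers` is made up for illustration and is not part of XRootD or curl.

```python
def parse_perf_markers(body):
    """Collect each 'Perf Marker' ... 'End' block as a dict of strings."""
    markers = []
    current = None
    for raw in body.splitlines():
        line = raw.strip()
        if line == "Perf Marker":
            current = {}                      # start of a new marker
        elif line == "End" and current is not None:
            markers.append(current)           # marker is complete
            current = None
        elif current is not None and ":" in line:
            # Split on the first colon only: values such as
            # RemoteConnections contain colons themselves.
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return markers


# The marker from the successful transfer above.
SAMPLE_BODY = """Perf Marker
Timestamp: 1701859536
Stripe Index: 0
Stripe Bytes Transferred: 805306368
Total Stripe Count: 1
RemoteConnections: tcp:[2001:1458:202:228::100:82]:8082
End
success: Created
"""

if __name__ == "__main__":
    for marker in parse_perf_markers(SAMPLE_BODY):
        print(marker["Stripe Index"], marker["Stripe Bytes Transferred"])
```

Values are kept as strings so nothing is lost if a server emits a field this sketch does not know about.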
I've enabled more logging, and here's what I see:
240517 17:44:01 28179 TPC_CanStartTransfer: Unable to start transfers as no buffers are available. Available buffers: 0, Active curl handles: 0, Transfers in progress: 0
240517 17:44:01 28179 TPC_Stream::DumpBuffers: Beginning dump of stream buffers.
240517 17:44:01 28179 TPC_Stream::DumpBuffers: Stream offset: 0
240517 17:44:01 28179 TPC_Stream::DumpBuffers: Buffer 0: Offset=117440512, Size=16777216, Capacity=16777216
240517 17:44:01 28179 TPC_Stream::DumpBuffers: Buffer 1: Offset=16777216, Size=16777216, Capacity=16777216
240517 17:44:01 28179 TPC_Stream::DumpBuffers: Buffer 2: Offset=67108864, Size=16777216, Capacity=16777216
240517 17:44:01 28179 TPC_Stream::DumpBuffers: Buffer 3: Offset=100663296, Size=16777216, Capacity=16777216
240517 17:44:01 28179 TPC_Stream::DumpBuffers: Buffer 4: Offset=134217728, Size=16777216, Capacity=16777216
240517 17:44:01 28179 TPC_Stream::DumpBuffers: Buffer 5: Offset=150994944, Size=16777216, Capacity=16777216
240517 17:44:01 28179 TPC_Stream::DumpBuffers: Buffer 6: Offset=33554432, Size=16777216, Capacity=16777216
240517 17:44:01 28179 TPC_Stream::DumpBuffers: Buffer 7: Offset=50331648, Size=16777216, Capacity=16777216
240517 17:44:01 28179 TPC_Stream::DumpBuffers: Buffer 8: Offset=83886080, Size=16777216, Capacity=16777216
240517 17:44:01 28179 TPC_Stream::DumpBuffers: Buffer 9: Offset=0, Size=16777216, Capacity=16777216
240517 17:44:01 28179 TPC_Stream::DumpBuffers: Finish dump of stream buffers.
240517 17:44:01 28179 TPC_StartTransfers: Unable to start transfers.
240517 17:44:01 28179 TPC_PullRequest: event=MULTISTREAM_IDLE, local=/tmp/BIGFILE_5GB, remote=https://xrootd-ccaffy-dev01.cern.ch:2001/tmp/BIGFILE_5GB_COPY, user=(anonymous), streams=10, bytes_transferred=167772160; No handles are able to run. Streams=10, concurrency=10
240517 17:44:01 28179 TPC_PullRequest: event=MULTISTREAM_FAIL, local=/tmp/BIGFILE_5GB, remote=https://xrootd-ccaffy-dev01.cern.ch:2001/tmp/BIGFILE_5GB_COPY, user=(anonymous), streams=10, bytes_transferred=167772160, tpc_status=0; failure: Internal logic error led to early abort; current offset is 167772160 while full size is 467664896
All ten in-memory buffers are full, and none of them was flushed to disk to free it for incoming data, so no stream can make progress. I don't know why the flush did not happen. The error occurs intermittently when running a TPC transfer, so there may be a bug in the way we manage the buffers internally.
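To make the dump easier to read, here is a minimal Python sketch that parses `TPC_Stream::DumpBuffers` lines in the format shown above and reports how far the buffered data extends contiguously from the stream offset. It is not XRootD code, and the names `parse_dump` and `contiguous_end` are made up for illustration. On this dump, the ten buffers cover bytes 0 to 167772160 with no gap, and one of them starts exactly at the stream offset, so the stall does not look like a missing chunk at the head of the stream. That is only an observation about the log, not a diagnosis.

```python
import re

BUFFER_RE = re.compile(r"Buffer (\d+): Offset=(\d+), Size=(\d+), Capacity=(\d+)")
OFFSET_RE = re.compile(r"Stream offset: (\d+)")


def parse_dump(log_text):
    """Return (stream_offset, [(offset, size), ...]) from a buffer dump."""
    match = OFFSET_RE.search(log_text)
    stream_offset = int(match.group(1)) if match else None
    buffers = [(int(off), int(size))
               for _idx, off, size, _cap in BUFFER_RE.findall(log_text)]
    return stream_offset, buffers


def contiguous_end(stream_offset, buffers):
    """Walk buffers in offset order, starting at stream_offset.

    Returns the offset reached when no buffer continues the run. The
    result equals stream_offset if no buffer starts at the stream offset.
    """
    pos = stream_offset
    for off, size in sorted(buffers):
        if off < pos:
            continue        # lies before the current position, ignore
        if off > pos:
            break           # gap: the run of contiguous data ends here
        pos += size
    return pos


# Buffer offsets copied from the dump above, in buffer-index order.
OFFSETS = [117440512, 16777216, 67108864, 100663296, 134217728,
           150994944, 33554432, 50331648, 83886080, 0]
SAMPLE = "TPC_Stream::DumpBuffers: Stream offset: 0\n" + "\n".join(
    f"TPC_Stream::DumpBuffers: Buffer {i}: Offset={off}, "
    "Size=16777216, Capacity=16777216"
    for i, off in enumerate(OFFSETS)
)

if __name__ == "__main__":
    stream_offset, buffers = parse_dump(SAMPLE)
    print(contiguous_end(stream_offset, buffers))  # -> 167772160
```

The value 167772160 matches the `bytes_transferred` and "current offset" figures in the `MULTISTREAM_FAIL` line.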
While implementing packet marking for HTTP TPC transfers, I noticed a problem with multistream HTTP TPC PULL transfers.
The request:
On the client-side:
The logs on the pulling machine: