Open blajoie opened 7 years ago
You can try strace -f -tT samtools view ...
to show the time taken for each system call. That may help to show where any delays are happening.
I'm not aware of any timeouts in the code, but it has many layers involved so I may have missed something.
I just did this myself and see ~2sec in select timeouts.
$ strace -o tr -ftT samtools view -H "https://s3-us-west-1.amazonaws.com/ilmn.htslib-test/test.bam" > /dev/null
$ fgrep 'select(4, [3]' tr
29431 15:21:48 select(4, [3], [], [], {0, 1000}) = 0 (Timeout) <0.001089>
29431 15:21:48 select(4, [3], [], [], {149, 991000}) = 1 (in [3], left {149, 843643}) <0.147418>
29431 15:21:48 select(4, [3], [], [], {149, 840000}) = 1 (in [3], left {149, 693125}) <0.146930>
29431 15:21:48 select(4, [3], [], [], {149, 686000}) = 1 (in [3], left {149, 538901}) <0.147188>
29431 15:21:48 select(4, [3], [], [], {149, 536000}) = 1 (in [3], left {149, 378097}) <0.157968>
29431 15:21:48 select(4, [3], [], [], {149, 377000}) = 1 (in [3], left {149, 376996}) <0.000107>
29431 15:21:48 select(4, [3], [], [], {149, 376000}) = 1 (in [3], left {149, 231022}) <0.145041>
29431 15:21:48 select(4, [3], [], [], {149, 230000}) = 1 (in [3], left {149, 229996}) <0.000121>
29431 15:21:48 select(4, [3], [], [], {149, 229000}) = 1 (in [3], left {149, 228996}) <0.000024>
29431 15:21:48 select(4, [3], [], [], {149, 228000}) = 1 (in [3], left {149, 83445}) <0.144622>
29431 15:21:49 select(4, [3], [], [], {0, 1000}) = 0 (Timeout) <0.001076>
29431 15:21:49 select(4, [3], [], [], {149, 998000}) = 1 (in [3], left {149, 850148}) <0.147911>
29431 15:21:49 select(4, [3], [], [], {149, 848000}) = 1 (in [3], left {149, 700669}) <0.147389>
29431 15:21:49 select(4, [3], [], [], {149, 695000}) = 1 (in [3], left {149, 546471}) <0.148588>
29431 15:21:49 select(4, [3], [], [], {149, 544000}) = 1 (in [3], left {149, 372074}) <0.171989>
29431 15:21:49 select(4, [3], [], [], {149, 370000}) = 1 (in [3], left {149, 369996}) <0.000110>
29431 15:21:49 select(4, [3], [], [], {0, 1000}) = 0 (Timeout) <0.001190>
29431 15:21:49 select(4, [3], [], [], {10, 0}) = 1 (in [3], left {9, 847421}) <0.152651>
29431 15:21:50 select(4, [3], [], [], {10, 0}) = 1 (in [3], left {9, 999996}) <0.000099>
29431 15:21:50 select(4, [3], [], [], {10, 0}) = 1 (in [3], left {9, 854190}) <0.145862>
29431 15:21:50 select(4, [3], [], [], {10, 0}) = 1 (in [3], left {9, 999997}) <0.000072>
29431 15:21:50 select(4, [3], [], [], {10, 0}) = 1 (in [3], left {9, 999998}) <0.000011>
29431 15:21:50 select(4, [3], [], [], {10, 0}) = 1 (in [3], left {9, 853681}) <0.146383>
This isn't actually using the s3 interface, but https instead - hence libcurl. There are timeouts mentioned in the htslib curl interface, eg in wait_perform
, but they're all powers of 10 milliseconds and not the ~145ms we see here. I expect this is just some round-trip time to AWS and the aggregate delay comes through many round-trips. I'm not familier with the https protocols to understand why so many round-trips are used.
I see htslib is doing sendto "CONNECT" twice, presumably once for file format detection and once to download, but that doesn't explain the large number of subsequent select calls. Maybe it doesn't have a large enough buffer size. @jmarshall Do you remember what's going on here?
Hi,
We've noticed that samtools/htslib seems to have a small time penalty when making an initial connection to any s3 resource (local s3 or amazonaws s3).
e.g. 1-2 second delay samtools view "https://s3-us-west-1.amazonaws.com/ilmn.htslib-test/test.bam samtools view "https://s3-us-west-1.amazonaws.com/ilmn.htslib-test/test.bam" chr3:49000000-50000000
However a direct curl seems to begin streaming in < 1 second curl -XGET "https://s3-us-west-1.amazonaws.com/ilmn.htslib-test/test.bam" | samtools view
This time delay does not appear to be due to the initial download of the BAI, as it still happens after the index is cached, nor does it appear to be the result of the s3 auth (above is a public object)
Are there any sort of wait/timeout(s) happening during the initial call? Or something with libcurl?
Access times for local BAMs are close to instant (as expected). We (@nhjohnson) expect a small network related time delay, but is it possible something else is causing the lag?
Thanks !