StephanHolgerD closed this issue 1 year ago.
As the User-Agent header suggests, pysam is simply using the wrapped htslib code to implement S3 access. That htslib code (in hfile_libcurl.c) still implements seeks using CURLOPT_RESUME_FROM_LARGE rather than CURLOPT_RANGE, which would be better placed to include an ending offset.
Please report this issue to htslib directly.
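To illustrate the difference: over HTTP, libcurl's CURLOPT_RESUME_FROM_LARGE results in an open-ended `Range: bytes=START-` header, whereas CURLOPT_RANGE takes a `START-END` string and yields a bounded `Range: bytes=START-END` header. The Python sketch below only builds the two header values with the standard library so they can be compared. The helper names and the URL are hypothetical, and nothing is sent over the network.

```python
from typing import Optional
import urllib.request


def range_header(start: int, end: Optional[int] = None) -> str:
    """Build an HTTP Range header value for a byte range.

    end=None mimics a resume-style, open-ended request ("bytes=START-").
    A given end produces a bounded, inclusive range ("bytes=START-END").
    """
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid byte range")
    return f"bytes={start}-" if end is None else f"bytes={start}-{end}"


def build_request(url: str, start: int, end: Optional[int] = None) -> urllib.request.Request:
    # Only constructs the request object; no network traffic happens here.
    return urllib.request.Request(url, headers={"Range": range_header(start, end)})


if __name__ == "__main__":
    url = "https://example-bucket.s3.amazonaws.com/sample.bam"  # hypothetical URL
    open_ended = build_request(url, 1_048_576)
    bounded = build_request(url, 1_048_576, 1_114_111)
    print(open_ended.get_header("Range"))  # bytes=1048576-
    print(bounded.get_header("Range"))     # bytes=1048576-1114111
```

Only the bounded form tells the server where the read will stop; with the open-ended form the server can only learn that by the client closing the connection.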
OK, but does it make sense, then, that pysam.view creates clean range requests?
I thought pysam used the same library as samtools (htslib).
Pysam (both fetch and view) uses the same library as samtools. This is why you should report this issue to the library where the problem is, namely htslib.
OK, I was only wondering because pysam.view produces a clean range request.
Please report this problem to htslib.
Hi, I want to report potentially problematic behaviour when using pysam.fetch on AWS S3 bucket infrastructure. Running the following pseudo code on a BAM file in an S3 bucket creates requests without a defined end of range.
Code
import pysam

with pysam.AlignmentFile(bamfile_S3, filepath_index=baifile_S3) as f:
    for r in f.fetch(chrom, start, end):
        pass  # process each read overlapping chrom:start-end
Request
This kind of open-ended request results in high egress costs, because AWS logs the whole file after the start byte as delivered, even if you stop reading the data at the end of your fetch coordinates.
Compare this with the requests IGV makes on the same S3 data (low egress costs; only the exact byte range is logged):
Request
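The cost difference comes down to simple arithmetic. Following the behaviour described in this report, an open-ended request starting at byte `start` is logged as delivering everything up to the end of the object, whereas a bounded request covers only `end - start + 1` bytes (HTTP byte ranges are inclusive). The sketch below models just that accounting; the function name and the file size, offset and region size in the usage example are hypothetical, and AWS's actual billing rules may differ in detail.

```python
from typing import Optional


def logged_bytes(file_size: int, start: int, end: Optional[int] = None) -> int:
    """Bytes attributed to one ranged GET, per the behaviour reported in this issue.

    end=None: open-ended request, counted through the last byte of the object.
    Otherwise: inclusive bounded range, clamped to the object size.
    """
    if not 0 <= start < file_size:
        raise ValueError("start must lie inside the object")
    if end is not None and end < start:
        raise ValueError("end must not precede start")
    last = file_size - 1 if end is None else min(end, file_size - 1)
    return last - start + 1


if __name__ == "__main__":
    GiB, KiB = 1024 ** 3, 1024
    size = 10 * GiB    # hypothetical 10 GiB BAM
    start = 5 * GiB    # hypothetical offset of the first block needed for the region
    needed = 64 * KiB  # hypothetical number of bytes the fetch actually reads
    print(logged_bytes(size, start) / GiB)                     # 5.0
    print(logged_bytes(size, start, start + needed - 1) / KiB)  # 64.0
```

In this made-up example a 64 KiB region query would be logged as 5 GiB of egress when the request carries no end offset.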