Open ramyala opened 7 years ago
What logging level have you set? If hts_verbose is greater than 8, line 722 stops CURLOPT_FAILONERROR from being set. This stops curl from reporting the HTTP result code at the point where libcurl_seek() expects to find out about it.
We should probably make the overlap between enhanced logging and not setting CURLOPT_FAILONERROR a bit bigger, as it's quite easy to be caught out by this at the moment.
@daviesrob I made sure that wasn't the case by setting the log level to 8.
Another issue: for workloads where fetch() is called across compact BED file ranges, an onerous number of new connections is established, one on each seek. It might be better to keep reading from the existing connection instead of seeking when the old and new offsets differ by less than, say, 1 MB. This is especially a problem for services like TCGA, which rate-limit or have trouble handling a large number of range queries at once. If you have better ideas around this, I'm happy to take them into consideration as I work around this issue.
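To make the small-seek idea concrete, here is a minimal Python sketch of the heuristic: short forward seeks are served by reading and discarding bytes on the existing stream, and anything else opens a new one. This is not htslib code. `ForwardSkippingReader`, `open_at`, and `SMALL_SEEK_LIMIT` are hypothetical names made up for illustration, and the 1 MiB threshold is only the figure floated above.

```python
import io

SMALL_SEEK_LIMIT = 1 << 20  # 1 MiB; assumed threshold, taken from the suggestion above


class ForwardSkippingReader:
    """Hypothetical wrapper: serve short forward seeks by reading and discarding."""

    def __init__(self, open_at):
        # open_at(offset) is an assumed callback that returns a fresh stream
        # positioned at `offset` (standing in for a new HTTP range request).
        self._open_at = open_at
        self._stream = open_at(0)
        self._pos = 0
        self.reconnects = 0  # how many times a new stream had to be opened

    def seek(self, offset):
        gap = offset - self._pos
        if 0 <= gap < SMALL_SEEK_LIMIT:
            # Short forward jump: drain the gap from the current stream.
            while gap > 0:
                chunk = self._stream.read(min(gap, 65536))
                if not chunk:
                    raise EOFError("stream ended before reaching offset")
                self._pos += len(chunk)
                gap -= len(chunk)
        else:
            # Backward or long jump: open a new stream at the target offset.
            self._stream = self._open_at(offset)
            self._pos = offset
            self.reconnects += 1
        return self._pos

    def read(self, n):
        data = self._stream.read(n)
        self._pos += len(data)
        return data


# Usage with an in-memory stand-in for the remote file.
data = bytes(range(256)) * 16384  # 4 MiB of sample bytes
reader = ForwardSkippingReader(lambda off: io.BytesIO(data[off:]))
reader.seek(1000)             # small gap: no new stream
first = reader.read(4)
reader.seek(3 * 1024 * 1024)  # large gap: new stream
```

The trade-off is bandwidth against connection count: skipped bytes are still downloaded, so the threshold would need tuning for the server and index layout in question.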
If you leave the log level at the default (3, which is HTS_LOG_WARNING) does it work properly? And if not, can you get the same thing to happen when using htslib's test/test_view program (which allows you to specify a region)? Getting the python part of this out of the way would make for easier debugging.
Making libcurl_seek not try to do small seeks has crossed my mind, but I haven't got around to implementing it yet. Unfortunately even with this I suspect it's not going to behave very well. HTTP is designed for streaming, not random access, so it will never be a good fit.
The following Python code calls into htslib's seek() and hts_iter_next() for reads. When libcurl_seek fails, the code still passes and produces erroneous output. I've noticed this happen when connections fail with SSL issues or 503 Service Unavailable errors. Ideally, libcurl_seek should return a failure code instead of success, which would allow pysam to raise an exception (so the user can either re-establish the connection or handle the issue gracefully).
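Separately from the reporter's snippet, the requested behaviour (a failed seek surfacing as an error rather than silent success) could look roughly like the sketch below. It is an illustration only and does not reflect how htslib or pysam are implemented. `SeekError` and `check_range_response` are hypothetical names. The one fact relied on is that a server honouring an HTTP Range request answers with status 206.

```python
class SeekError(IOError):
    """Hypothetical exception raised when a remote seek cannot be confirmed."""


def check_range_response(status, requested_offset):
    """Return the offset on success; raise SeekError otherwise.

    `status` is the HTTP status code of the range request issued for the seek.
    """
    if status == 206:
        # Partial Content: the server honoured the Range header.
        return requested_offset
    if status == 200 and requested_offset == 0:
        # Full body from the start is equivalent to seeking to offset 0.
        return 0
    if status == 200:
        # The server ignored the Range header, so the stream starts at 0.
        raise SeekError("server ignored Range header; stream is not at the requested offset")
    # 4xx/5xx (for example 503) or anything unexpected: report the failure.
    raise SeekError(f"seek failed with HTTP status {status}")


# Usage: the caller decides whether to retry, reconnect, or abort.
try:
    check_range_response(503, 4096)
    outcome = "ok"
except SeekError as exc:
    outcome = str(exc)
```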
Error 2: