On Wed, Oct 10, 2018 at 12:15:05PM +0000, Keiran Raine wrote:
I'm finding that using REF_PATH isn't working as expected.
I think the problem is it is not honouring 301 redirect codes. Perhaps this can be enabled in the code:
https://curl.haxx.se/libcurl/c/CURLOPT_FOLLOWLOCATION.html
$ REF_PATH='URL=http:://https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ebi.ac.uk_ena_cram_md5_-25s&d=DwICaQ&c=D7ByGjS34AllFgecYw0iC6Zq7qlm8uclZFI0SqQnqBo&r=wodoR_G062E4YLZ-xu5t6g&m=-Ub6gf1h1Ts4cYDiDHrHoDQfWphQW7lS728AtZKJAz8&s=-y-F2smd1D4fhfUqRkPar_Y__e8ZH7DvKtL6ZnSQhRI&e=' REF_CACHE=$PWD/wibble/to_split/hts-ref-cache/%2s/%2s/%s bamtofastq gz=1 exclude=SECONDARY,SUPPLEMENTARY tryoq=1 outputperreadgroup=1 outputperreadgroupprefix=colo-829 outputperreadgroupsuffixF=_1.fq.gz outputperreadgroupsuffixF2=_2.fq.gz outputperreadgroupsuffixO=_o1.fq.gz outputperreadgroupsuffixO2=_o2.fq.gz outputperreadgroupsuffixS=_s.fq.gz inputformat=cram filename=wibble/to_split/colo-829.cram outputdir=wibble/
I'm confused why you have http:://https:// in here? (And blergh! ^&%!-off proofpoint I want to see the proper URL again).
Io_lib should now accept a single colon, as in URL=http://www.ebi.ac.uk/(etc). Although colon is the path separator, the code specifically checks for a preceding http or ftp. Hopefully similarly with port numbers. However, looking at the code I see I forgot to add https! Sigh. (It's in the htslib copy.)
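To illustrate the colon handling described above, here is a minimal Python sketch. It is not the io_lib or htslib code; the function names and the `schemes` parameter are hypothetical, and port numbers are deliberately not handled. It only assumes what the thread states: colon separates search-path elements, `::` is an escaped colon, and a colon directly after a recognised scheme is kept.

```python
def _is_scheme(element, schemes):
    """True if the element so far is just a scheme name, optionally
    prefixed by '|' and/or 'URL=' (hypothetical helper)."""
    name = element.lstrip("|")
    if name.startswith("URL="):
        name = name[4:]
    return name in schemes


def split_search_path(path, schemes=("http", "https", "ftp")):
    """Split a REF_PATH-style string on ':' (illustrative only)."""
    elements = []
    current = ""
    i = 0
    while i < len(path):
        if path.startswith("::", i):
            # '::' is an escaped colon: emit one ':' and do not split.
            current += ":"
            i += 2
        elif path[i] == ":":
            if _is_scheme(current, schemes) and path.startswith("://", i):
                # Colon that terminates a known scheme: keep it.
                current += ":"
            else:
                # Ordinary separator: close the current element.
                elements.append(current)
                current = ""
            i += 1
        else:
            current += path[i]
            i += 1
    elements.append(current)
    return [e for e in elements if e]
```

With `schemes=("http", "ftp")`, i.e. without https in the exception list, a single-colon https URL is cut in two at the colon, leaving a first element of just `URL=https`. That is at least consistent with a "Couldn't resolve host 'https'" style of failure, although the real code path may differ.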
Note you can use '|' before a search name to avoid the repeated lookups with different file extensions. Eg:
REF_PATH='|URL=https:://www.ebi.ac.uk/ena/cram/md5/%s'
Try explicitly using https instead of http to avoid the redirect.
... for bz2, sz, Z, bz2 (oddly a second time) ...
Good spot. I think it's something to do with the magic number detection order too, but it's been years since I did that code. This is actually meant for finding compressed sequence chromatograms. :-)
-- James Bonfield (jkb@sanger.ac.uk) The Sanger Institute, Hinxton, Cambs, CB10 1SA
Thanks James, that makes total sense. I'd have spotted that if I hadn't copied the complete URL from the forwarding 'Location' in the error when testing with curl.
WRT :: in URLs, I was following a subsection of the biobambam2 readme which indicates double colon. I'd tried without and with. Now I'm using https I can see that I have to use ::, otherwise I get:
CURL ERROR: Couldn't resolve host 'https'
Sadly using https I now get:
$ REF_PATH='URL=https:://www.ebi.ac.uk/ena/cram/md5/%s' ...
CURL ERROR: server certificate verification failed. CAfile: none CRLfile: none
...
Running a direct curl, I see these getting set:
$ curl -vi https://www.ebi.ac.uk/ena/cram/md5/1b22b98cdeb4a9304cb5d48026a85128 > /dev/null
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
Is this something you would expect?
(I hate proof point too, especially when it fails to reconstruct the URL)
There were a number of problems to fix.
Double colon was needed for https, but not http and ftp. I simply hadn't updated the exceptions.
It didn't honour redirect (301). It now does.
https requires a user-agent, or at least our proxy does.
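As a rough analogue of the last two fixes, here is a Python sketch of fetching a reference by MD5 with an explicit User-Agent. It is not the io_lib change itself. The function names and the user-agent string are hypothetical; the URL template is the one used in this thread. Python's default `urllib` opener follows 301 redirects on its own, which plays the role that CURLOPT_FOLLOWLOCATION plays in libcurl.

```python
import urllib.request

# Hypothetical identifier; some servers or proxies reject requests
# that carry no User-Agent header at all.
USER_AGENT = "example-ref-fetcher/0.1"


def build_ref_request(template, md5, user_agent=USER_AGENT):
    """Substitute the MD5 into a '%s' URL template and attach a User-Agent."""
    url = template.replace("%s", md5)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_reference(template, md5, timeout=30):
    """Download the reference sequence bytes (needs network access).

    The default opener includes a redirect handler, so an http URL
    answering with 301 to https is followed automatically.
    """
    request = build_ref_request(template, md5)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

For example, `fetch_reference("https://www.ebi.ac.uk/ena/cram/md5/%s", "1b22b98cdeb4a9304cb5d48026a85128")` would request the same sequence as the curl test earlier in the thread.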
This seems to work in io_lib itself now. I tested it (with and without proxy settings) via:
http_proxy= https_proxy= REF_PATH='|https://www.ebi.ac.uk/ena/cram/md5/%s' REF_CACHE=/ scramble -H ~/scratch/data/9827_2#49.1m.cram
Hi,
I'm indirectly using io_lib via biobambam2 so this is primarily an attempt to isolate where the problem may lie. My understanding is that CRAM conversion is handled by io_lib.
I'm finding that using REF_PATH isn't working as expected.
When I run bamtofastq (as we need to split by readgroup too) the job fails:
If I first run samtools view, thus pre-populating the REF_CACHE, then bamtofastq successfully completes:
Any thoughts?
biobambam2 2.0.86, not sure which io_lib is compiled into the bundle.
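For anyone checking whether the cache was actually pre-populated, here is a small Python sketch of how a REF_CACHE pattern such as `%2s/%2s/%s` maps an MD5 to a file path. It assumes the usual convention that `%Ns` consumes the next N characters of the MD5 and a bare `%s` consumes whatever remains. The function name is hypothetical and this is not the library code.

```python
def expand_cache_path(pattern, md5):
    """Expand '%Ns' / '%s' placeholders in a REF_CACHE-style pattern."""
    out = []
    rest = md5  # portion of the MD5 not yet consumed
    i = 0
    while i < len(pattern):
        if pattern[i] == "%":
            # Scan an optional decimal width after the '%'.
            j = i + 1
            while j < len(pattern) and pattern[j].isdigit():
                j += 1
            if j < len(pattern) and pattern[j] == "s":
                width = pattern[i + 1:j]
                n = int(width) if width else len(rest)
                out.append(rest[:n])
                rest = rest[n:]
                i = j + 1
                continue
        # Any other character is copied through unchanged.
        out.append(pattern[i])
        i += 1
    return "".join(out)
```

Under that assumption, the MD5 `1b22b98cdeb4a9304cb5d48026a85128` with the pattern `hts-ref-cache/%2s/%2s/%s` would land at `hts-ref-cache/1b/22/b98cdeb4a9304cb5d48026a85128`.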