Closed tarrade closed 5 years ago
@rsepassi @Conchylicultor @cyfra Is this issue fixed? If not, can you assign it to me?
It should be fixed by @captain-pool's contribution #488
Thanks @captain-pool
"To Configure the Proxy Settings, The User needs to set the Proxies for HTTP, HTTPS and FTP in the Environment Variables TFDS_HTTP_PROXY, TFDS_HTTPS_PROXY, TFDS_FTP_PROXY respectively."
Do you also have an option to pass a CA certificate for SSL?
Right now it is crashing with:
requests.exceptions.SSLError: HTTPSConnectionPool(host='zenodo.org', port=443): Max retries exceeded with url: /record/53169/files/Kather_texture_2016_image_tiles_5000.zip (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",),))
This is typical of SSL interception: you need to set SSL verify to false (if possible) or simply pass the CA certificate. Did you implement a REQUESTS_CA_BUNDLE environment variable as well? Which library is used in your implementation? requests?
Hey @tarrade, the downloader uses both requests and urllib. And sorry, I totally missed the feature request for a CA file; I only made it flexible for proxies. I will add support for CA certificates ASAP.
@Conchylicultor should I skip certificate verification by passing CERT_NONE from ssl, or should I add an option for supplying a certificate file?
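For readers weighing the two options in the question above, here is a minimal sketch using only the standard library `ssl` module. The helper name `make_ssl_context` and its parameters are hypothetical and are not taken from the TFDS code base; the sketch only illustrates the difference between disabling verification and verifying against a supplied CA file.

```python
import ssl


def make_ssl_context(ca_bundle=None, verify=True):
    """Build an SSL context for HTTPS downloads (hypothetical helper, not TFDS code)."""
    if not verify:
        # Option 1: skip certificate verification entirely (insecure).
        context = ssl.create_default_context()
        context.check_hostname = False  # must be disabled before setting CERT_NONE
        context.verify_mode = ssl.CERT_NONE
        return context
    # Option 2: verify against a user-supplied CA file; with ca_bundle=None
    # the system default certificates are loaded instead.
    return ssl.create_default_context(cafile=ca_bundle)
```

Passing the corporate CA file keeps verification on, which is generally the safer choice behind an SSL-intercepting proxy; disabling verification is a last resort.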
Hi @captain-pool, no problem. I know it is mostly companies that use a proxy and CA certificates, and we suffer from that every day. I will be happy to test it when you have it ready. Just tell me which nightly build it lands in. Thanks
Can you reopen the issue?
On my side I cannot reopen this ticket. By the way, I forgot to thank you for implementing this request. You will help a lot of people using company laptops.
@tarrade I think #663 should fix this. Give it a check.
cc: @Conchylicultor
Hi @captain-pool, it seems the build failed, right? https://source.cloud.google.com/results/invocations/e044b82b-65e9-4b34-9d5f-abd96aaba0a8/targets/tensorflow_datasets%2Fgh_testing%2Fpresubmit/log
I tested with 1.0.2.dev201906110105 but it is still failing with "bad handshake".
If the fix is already in 1.0.2.dev201906110105, then I will check that I have all the CA certificates in my file.
@tarrade it is failing because I'm using an SSL Context, which is supported from Python 2.7.9 onward; however, Kokoro is using Python <= 2.7.8, which doesn't allow that. Let me find an alternative; I will fix it soon. @Conchylicultor @rsepassi @vbardiovskyg @cyfra is it possible to upgrade Kokoro's Python 2 configuration to Python 2.7.9?
@rsepassi is the expert here, but from what I see it might not be that easy :-( As we'd have to move from the "common" kokoro cluster/image to custom one (and pay the cost of managing it).
I see other places in our code, where we had to do workarounds in the past, to accommodate the fact that linux machines on kokoro use 2.7.8.
Would it make sense to have this feature "disabled" if running on old python version ?
Done :) Disabling for python version <= 2.7.8 seemed like the only valid way out. The Builds are passing. @tarrade after @cyfra verifies and merges, it should be ready :)
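The version gate described above could look roughly like the following. This is a sketch under assumptions, not the merged code: the constant and function names are hypothetical, and the only fact relied on is the one stated in this thread, namely that the SSL context is available from Python 2.7.9.

```python
import sys

# Per this thread, the SSL context is supported from Python 2.7.9 onward.
MIN_VERSION_FOR_SSL_CONTEXT = (2, 7, 9)


def ssl_context_supported(version_info=None):
    """Return True when the interpreter is new enough for an SSL context.

    Hypothetical helper; version_info defaults to the running interpreter.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor, micro).
    return tuple(version_info[:3]) >= MIN_VERSION_FOR_SSL_CONTEXT
```

A caller would build the custom context only when this returns True and fall back to the default download path otherwise, so older interpreters keep working without the CA bundle feature.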
@captain-pool good idea to disable it for Python versions <= 2.7.8. I am quite new to this: how can I see which build includes this fix? Is it already in tfds-nightly==1.0.2.dev201906120105, or should I wait for tomorrow's build?
You need to wait till it merges to master branch and you can get it in the nightly build the next day.
Or if it is too urgent. You can clone it from my fork. Then cd into the local repository and git checkout issue_275. Finally:
Any one of these will do the job :)
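To compare a nightly build against the day a fix was merged, one option is to read the timestamp out of the version string. The sketch below assumes that the `.dev` suffix of versions such as `1.0.2.dev201906120105` is a `YYYYMMDDHHMM` build timestamp; that is inferred from the version strings quoted in this thread, not from documentation, and the function name is hypothetical.

```python
import re
from datetime import datetime


def nightly_build_time(version):
    """Parse the build timestamp from a version like '1.0.2.dev201906120105'.

    Assumes the '.dev' suffix encodes YYYYMMDDHHMM (an assumption, see above).
    """
    match = re.search(r"\.dev(\d{12})$", version)
    if match is None:
        raise ValueError("not a nightly version: %r" % version)
    return datetime.strptime(match.group(1), "%Y%m%d%H%M")
```

Under that assumption, a build whose timestamp is later than the merge of the fix into master should contain it.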
I tested the latest build, 1.0.2.dev201906180105, and I confirm that it is working with a proxy and a CA certificate.
Here is my test setup:
export TFDS_HTTPS_PROXY="http://user:password@ip:port/"
export TFDS_CA_BUNDLE=path/ca_certs
It is working for the following dataset:
dataset = tfds.load(name="colorectal_histology_large", split=tfds.Split.TEST)
dataset = tfds.load(name="colorectal_histology", split=tfds.Split.TRAIN)
I get some crashes when the dataset is hosted on AWS:
tfds.load(name="fashion_mnist", split=tfds.Split.TRAIN)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='fashion-mnist.s3-website.eu-central-1.amazonaws.com', port=80): Max retries exceeded with url: /train-images-idx3-ubyte.gz (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f22bc27df98>: Failed to establish a new connection: [Errno 110] Connection timed out',))
I don't know what the issue with AWS is. I can download the file manually. I need to retry later; I am at a conference with a not-so-great network.
Overall it is working. The question is on which side the AWS issue lies.
Of course, I need to set both:
export TFDS_HTTPS_PROXY="http://user:password@ip:port/"
export TFDS_HTTP_PROXY="http://user:password@ip:port/"
and then everything is working fine.
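The fashion_mnist failure above went through a plain HTTP connection (port 80), which is consistent with only the HTTPS proxy having been set at the time. A small pre-flight check can catch that. The sketch below is illustrative only: the helper names are hypothetical, and only the environment variable names come from this thread.

```python
import os

# Environment variable names quoted earlier in this thread, keyed by URL scheme.
PROXY_ENV_VARS = {
    "http": "TFDS_HTTP_PROXY",
    "https": "TFDS_HTTPS_PROXY",
    "ftp": "TFDS_FTP_PROXY",
}


def proxies_from_env(environ=None):
    """Return a {scheme: proxy_url} dict for the variables that are set."""
    if environ is None:
        environ = os.environ
    return {
        scheme: environ[name]
        for scheme, name in PROXY_ENV_VARS.items()
        if environ.get(name)
    }


def missing_schemes(proxies, schemes=("http", "https")):
    """List the schemes whose requests would not go through a proxy."""
    return [scheme for scheme in schemes if scheme not in proxies]
```

The returned dict has the same scheme-to-URL shape as the `proxies` mapping accepted by requests, and a non-empty result from `missing_schemes` signals that some dataset URLs may bypass the proxy.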
All is working perfectly. Thanks @captain-pool . Closing
Use this
import tensorflow_datasets as tfds

# Disable SSL certificate verification for downloads (insecure, but avoids the requests SSLError).
dl_config = tfds.download.DownloadConfig(verify_ssl=False)
examples, metadata = tfds.load('cnn_dailymail', with_info=True,
                               as_supervised=True,
                               download_and_prepare_kwargs={'download_config': dl_config})
Is your feature request related to a problem? Please describe. Right now, behind a proxy, it is not working:
I don't think this is supported for now (I didn't see it in the documentation): https://www.tensorflow.org/datasets/api_docs/python/tfds/load
This will impact quite a lot of people working in companies and universities.
Describe the solution you'd like I am not an expert, but using requests seems to be the standard way. Below is an example from a Google GCP tool: ignore GOOGLE_APPLICATION_CREDENTIALS, which is specific to GCP. The user needs to set up one or two env variables and everything is done in the background (I guess this is using requests)
http://docs.python-requests.org/en/master/user/advanced/#ssl-cert-verification