Closed EricFournier3 closed 1 month ago
I'm not very familiar with proxy servers and how they might interact with curl here. Have you tried setting the following flags?
CURLOPT_SSL_VERIFYPEER=0
CURLOPT_PROXY_SSL_VERIFYPEER=0
CURLOPT_SSL_VERIFYHOST=0
CURLOPT_PROXY_SSL_VERIFYHOST=0
nextclade get ...
I found them here and through StackOverflow: https://www.php.net/manual/en/function.curl-setopt.php
@ivan-aksamentov may know better things to try.
In any case, you can download the dataset any way you like, for example with wget or curl, without using the convenience command nextclade dataset get. The implementation could change at some point in the future, however, so it's best to get nextclade dataset get working eventually. For now, to be able to use datasets, proceed as follows.
All information is contained in this json: https://data.clades.nextstrain.org/index.json
The latest dataset has the field latest: true. Its zip bundle can be downloaded with:
wget https://data.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-02-07T12:00:00Z/zip-bundle/nextclade_dataset_sars-cov-2_MN908947_2022-02-07T12:00:00Z.zip
Unzip that into a folder and you're ready to go. You may still run into proxy issues, but at least the problem is simplified to "download a zip file" and you don't need to worry about how Nextclade invokes curl.
@stefandiederich had a similar problem being behind a proxy before. Ivan gave some tips back then, but I'm not sure how it got resolved, see here: https://github.com/nextstrain/nextclade/discussions/552 https://github.com/nextstrain/nextclade/issues/532
Maybe he can comment if he sees this. Watching the https://github.com/nextstrain/nextclade_data repo for new releases (maybe twice per month) and manually updating is the worst case. You could script something together that automatically uses the latest dataset though.
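A minimal sketch of such a script, in Python, assuming only what is stated above: the index is JSON and the latest dataset is marked with latest: true. The nesting and the key that holds the zip URL are not specified in this thread, so the code searches for them generically. The sample index and its key names (datasets, versions, tag, bundle) are hypothetical.

```python
import json


def iter_latest(node):
    """Yield every JSON object under `node` that carries `"latest": true`."""
    if isinstance(node, dict):
        if node.get("latest") is True:
            yield node
        for value in node.values():
            yield from iter_latest(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_latest(item)


def zip_urls(node):
    """Collect every string value under `node` that ends in `.zip`."""
    if isinstance(node, str):
        return [node] if node.endswith(".zip") else []
    if isinstance(node, dict):
        node = list(node.values())
    if isinstance(node, list):
        return [url for item in node for url in zip_urls(item)]
    return []


# Hypothetical index fragment, for illustration only. The real index.json
# may be nested differently and use other key names.
SAMPLE_INDEX = json.loads("""
{
  "datasets": [
    {"name": "sars-cov-2", "versions": [
      {"tag": "2022-01-18T12:00:00Z", "latest": false,
       "bundle": "https://example.invalid/old.zip"},
      {"tag": "2022-02-07T12:00:00Z", "latest": true,
       "bundle": "https://example.invalid/new.zip"}
    ]}
  ]
}
""")

latest_urls = [url for entry in iter_latest(SAMPLE_INDEX) for url in zip_urls(entry)]
print(latest_urls)
```

Against the real index you would first fetch and parse https://data.clades.nextstrain.org/index.json, then download the resulting URL with wget or curl as described above.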
Hi Eric @EricFournier3,
Sorry, Nextclade does not have the proxy networking functionality and it seems to be out of scope of the project.
You should bring that up with your IT team and tell them that their technical implementation prevents you from doing your job. It's their responsibility to ensure you can do your work effectively. Otherwise, why do they exist?
Also note that these restrictions might be done on purpose. I.e. they don't want you to download any files from the internet. In that case, you are about to violate some rules, and it's no longer a technical problem - you should seek support of your superiors instead.
If that does not work, you have a few options (from simple and sane to complicated and ridiculous, depending on how bad your situation is):
Unfortunately, we don't have bandwidth to support any of these use-cases officially, but feel free to reach out with questions.
If you ever find a solution or a workaround, please post here so that other users could also benefit.
If you or your colleagues have some time, you could also take a look at how proxy support might be implemented in nextclade. Nextclade is currently using libcurl for networking, and I know that curl has a --proxy flag, so pushing through HTTP proxies might be doable. I don't have enough knowledge in this area, nor do I have information on what kinds of proxy configurations you or other users might have. As I don't have any of these proxies handy, I also would not be able to test whether it works even if I implemented it. Contributions in this area are welcome.
We are planning some mid-term changes in Nextclade which may or may not help to resolve this - the network implementation will be different and it might have a better support for proxies (or not).
I'll close this as wontfix, but you're of course still welcome to comment @EricFournier3
Thank you. I will try one of your suggestions
It looks like a lot of people have this problem. So while we may not want to fix it in the current Nextclade version 1, maybe this is something to bear in mind when moving to Nextclade v2? @rneher @ivan-aksamentov If we rustify the CLI, this may be easier, since we just use a rust crate for http requests instead of C++ interfacing to CURL etc. I'll therefore reopen, so this doesn't get forgotten.
@jacaravas has written a (workaround) shell script to replace nextclade dataset get when the latter doesn't work: https://github.com/jacaravas/update-nextclade-dataset
I don't think the issue here is lack of proxy support. My understanding is that libcurl respects standard proxy environment variables unless they've been explicitly disabled/overridden by libcurl init options (which I assume Nextclade wouldn't be doing).
Instead, I think nextclade's libcurl either…
nextclade
?), or…
Situation 1 seems most likely to me given the info so far. To fix this and retain the use of the proxy, @EricFournier3 should be able to set the CURL_CA_BUNDLE environment variable to a CA bundle file that includes the proxy's cert's CA. Curl's CA store handling is described a bit more on these two pages:
This situation is fairly common. As an example of this happening elsewhere, @jacaravas ran into a similar issue in nextstrain/ncov with the Python requests library and was able to resolve it by setting the library's corresponding env var (REQUESTS_CA_BUNDLE).
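A minimal sketch of how such a bundle could be assembled and handed to a child process, assuming the system bundle and the proxy's CA are both available as PEM files. The file names and placeholder contents are hypothetical, and whether Nextclade's libcurl honours CURL_CA_BUNDLE is the assumption made above, not something verified here.

```python
import os
import tempfile
from pathlib import Path


def build_ca_bundle(sources, dest):
    """Concatenate PEM files into one bundle, each part ending with a newline."""
    parts = []
    for src in sources:
        text = Path(src).read_text(encoding="utf-8").strip()
        if "-----BEGIN CERTIFICATE-----" not in text:
            raise ValueError(f"{src} does not look like a PEM certificate")
        parts.append(text + "\n")
    dest = Path(dest)
    dest.write_text("".join(parts), encoding="utf-8")
    return dest


def env_with_ca_bundle(bundle, base_env=None):
    """Return a copy of the environment with CURL_CA_BUNDLE set to `bundle`."""
    env = dict(os.environ if base_env is None else base_env)
    env["CURL_CA_BUNDLE"] = str(bundle)
    return env


with tempfile.TemporaryDirectory() as tmp:
    system_pem = Path(tmp, "system.pem")      # stands in for the system CA bundle
    proxy_pem = Path(tmp, "proxy-ca.pem")     # stands in for the proxy's CA cert
    # Placeholder contents, not real certificates.
    system_pem.write_text("-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n")
    proxy_pem.write_text("-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----")
    bundle = build_ca_bundle([system_pem, proxy_pem], Path(tmp, "bundle.pem"))
    bundle_text = bundle.read_text()
    env = env_with_ca_bundle(bundle, base_env={})

print(env)
```

With real files, the returned environment could be passed along when launching the command from the original report, for example subprocess.run(["nextclade", "dataset", "get", "--name", "sars-cov-2", "--output-dir", "/tmp"], env=env).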
Hi, It looks like this problem is coming from the openssl crate: https://docs.rs/openssl/latest/openssl/
The vendored copy will not be configured to automatically find the system’s root certificates, but the openssl-probe crate can be used to do that instead.
This could explain why nextclade dataset fails on a properly configured system where curl and wget work flawlessly in a shell, with https requests going through a transparent proxy (for web filtering), i.e. with the proxy's cert's CA in the system CA store, or with environment variables and a CA bundle file.
I'll let you do your excellent development work from here, I'm back to my questionable existence as a sysadmin in an IT team. ;-)
Thanks for your idea @pilem! This bug report is related to v1, which isn't on master anymore. We used C++ back then, so no Rust at all. Impossible to know when reading this issue, of course 🙃
Indeed @corneliusroemer! I will open a new bug report then, since we see the same behaviour with the current master, tag 2.0.0-alpha3.
Hi @pilem thanks for trying Nextclade v2! How do you invoke the command?
Is there an easy way for us to emulate your environment to be able to reproduce the error? (i.e. a dockerfile, a provisioning script for a VM etc.).
My understanding is that Nextclade via reqwest should use rustls instead of openssl, with the current configuration:
https://github.com/nextstrain/nextclade/blob/c4ce28edd438efa7ecec2c613e30ccbe3f2dfe7f/packages_rs/nextclade-cli/Cargo.toml#L36
Perhaps flipping it to openssl could work?
Alternatively, I think reqwest also has a way to pass a blob of CA certs during initialization. If so, how could I obtain it on a typical proxied system? Perhaps I could add a flag and/or read an env var?
Tangentially related: can you also help us to test Nextclade v2 with SOCKS proxies?
Hi @ivan-aksamentov,
I did my tests with your instructions for local dev with rustup and cargo, then:
$ cargo run --bin=nextclade -- dataset get --name sars-cov-2 --output-dir /tmp
[2022-06-02 15:14:36.966][nextclade][WARN ] /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.20.6/src/conn.rs:1285: Sending fatal alert BadCertificate
Error:
0: When downloading dataset index
1: error sending request for url (https://data.master.clades.nextstrain.org/index_v2.json): error trying to connect: invalid peer certificate contents: invalid peer certificate: UnknownIssuer
2: error trying to connect: invalid peer certificate contents: invalid peer certificate: UnknownIssuer
3: invalid peer certificate contents: invalid peer certificate: UnknownIssuer
Location:
packages_rs/nextclade-cli/src/io/http_client.rs:91
For info: it works fine outside our corporate network, and curl works normally on our servers with our own root CA and certs (used by the proxy) configured on the system.
It might be difficult to create a similar setup: you'll need a transparent web filtering device, or software, with a custom trusted CA, and port forwarding on the default route. It's all feasible, but it would be a lot easier to look at system calls with strace. Simply grep for etc to see which system configuration files the program accesses:
$ strace ./target/debug/nextclade dataset get --name sars-cov-2 --output-dir /tmp 2>&1 | grep etc
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
As you can see, no calls to ssl/tls/cert/pki configs are made.
With curl, as an example, on the same server:
$ strace curl -o /dev/null https://data.master.clades.nextstrain.org/index_v2.json 2>&1 | grep etc | egrep "pki|ssl|cert"
open("/etc/pki/tls/legacy-settings", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/etc/pki/nssdb", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/etc/pki/nssdb/pkcs11.txt", O_RDONLY) = 4
... (many more)
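The same filtering can be done in a few lines of Python once the strace output has been saved. This is a sketch that only recognises the open, openat, access and stat calls of the kind shown above; the sample lines are copied from the two traces, and the function name is made up.

```python
import re

# Matches e.g. open("/etc/x", ...), stat("/etc/x", ...) or
# openat(AT_FDCWD, "/etc/x", ...) and captures the quoted path.
PATH_RE = re.compile(r'^(?:open|openat|access|stat)\((?:[A-Z_]+, )?"([^"]+)"')
# Same keywords as the egrep above, plus "tls".
TRUST_RE = re.compile(r"pki|ssl|tls|cert")


def trust_store_paths(strace_lines):
    """Return the distinct /etc paths that look like TLS trust configuration."""
    seen = []
    for line in strace_lines:
        match = PATH_RE.match(line.strip())
        if not match:
            continue
        path = match.group(1)
        if path.startswith("/etc/") and TRUST_RE.search(path) and path not in seen:
            seen.append(path)
    return seen


NEXTCLADE_TRACE = [
    'access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)',
    'open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
    'access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)',
]
CURL_TRACE = [
    'open("/etc/pki/tls/legacy-settings", O_RDONLY) = -1 ENOENT (No such file or directory)',
    'stat("/etc/pki/nssdb", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0',
    'open("/etc/pki/nssdb/pkcs11.txt", O_RDONLY) = 4',
]

print(trust_store_paths(NEXTCLADE_TRACE))
print(trust_store_paths(CURL_TRACE))
```

An empty result for a program that makes HTTPS requests suggests it never consults the system trust store, which is the symptom described here.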
As I understand it, the problem comes from the vendoring features of cargo. When vendored, rustls or native-tls are replaced by a statically linked build of openssl, as documented here: https://docs.rs/native-tls/latest/native_tls/. And without the openssl-probe functions, only the embedded root CA certs are used, not the system's ones.
I can give a hand to test SOCKS proxy. This is easy to setup, only a container and environment variables are required, like https://hub.docker.com/r/serjs/go-socks5-proxy/.
@pilem Have you tried the new --proxy* command line flags? I think that reqwest does not look for proxies by default and needs to be told explicitly, so I added the flags and pass this information on to reqwest. (And in order to improve on that feature, is there a best practice for automated detection of HTTP proxies, e.g. through env vars? Perhaps the ones that curl uses?)
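On detecting proxies from environment variables, the sketch below follows curl's convention as I understand it from its documentation: the lowercase <scheme>_proxy variable first, the uppercase form accepted for everything except http, all_proxy as a fallback, and no_proxy as an exclusion list. It is an illustration in Python, not what reqwest or Nextclade does; the function names and example hosts are made up.

```python
import os


def proxy_from_env(scheme, env=None):
    """Return the proxy URL configured for `scheme`, or None."""
    env = os.environ if env is None else env
    scheme = scheme.lower()
    names = [f"{scheme}_proxy"]
    if scheme != "http":
        # curl deliberately ignores the uppercase HTTP_PROXY.
        names.append(f"{scheme.upper()}_PROXY")
    names += ["all_proxy", "ALL_PROXY"]
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


def bypasses_proxy(host, env=None):
    """Return True if `host` is covered by the no_proxy exclusion list."""
    env = os.environ if env is None else env
    raw = env.get("no_proxy") or env.get("NO_PROXY") or ""
    host = host.lower()
    for entry in (item.strip().lower() for item in raw.split(",")):
        if not entry:
            continue
        if entry == "*":
            return True
        suffix = entry.lstrip(".")
        if host == suffix or host.endswith("." + suffix):
            return True
    return False


# Hypothetical environment, for illustration only.
example_env = {
    "https_proxy": "http://proxy.example:3128",
    "HTTP_PROXY": "http://ignored.example:8080",
    "no_proxy": "localhost,.internal.example",
}
print(proxy_from_env("https", example_env))
print(proxy_from_env("http", example_env))
print(bypasses_proxy("data.internal.example", example_env))
```

Passing the environment as a plain dict keeps the lookup easy to test; omitting the argument reads the real process environment.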
Can you tell me more about your config? In particular:
I am really no expert on either, but I believe that the implementation of client software is quite different for these two cases.
If Nextclade still does not work with the proxy flags, then how can I detect the path to your custom cert file? If you have an env var with the path, or can provide it through a flag, then I might try to add a call to reqwest::ClientBuilder::add_root_certificate(). Do you think it may work?
Update: I now see that I can use the https://github.com/alexcrichton/openssl-probe crate to find out the cert paths.
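A rough Python analogue of that kind of lookup, as a sketch: honour SSL_CERT_FILE (the variable OpenSSL itself reads) if it points at an existing file, otherwise probe a few well-known bundle locations. The candidate list is illustrative and incomplete, the function name is hypothetical, and openssl-probe's actual search list may differ.

```python
import os
import tempfile
from pathlib import Path

# Common CA bundle locations on Unix-like systems (illustrative, incomplete).
CANDIDATE_BUNDLES = (
    "/etc/ssl/certs/ca-certificates.crt",  # Debian, Ubuntu
    "/etc/pki/tls/certs/ca-bundle.crt",    # Fedora, RHEL, CentOS
    "/etc/ssl/ca-bundle.pem",              # openSUSE
    "/etc/ssl/cert.pem",                   # Alpine, macOS
)


def find_ca_bundle(env=None, candidates=CANDIDATE_BUNDLES):
    """Return the first usable CA bundle path, or None if none is found."""
    env = os.environ if env is None else env
    override = env.get("SSL_CERT_FILE")
    if override and Path(override).is_file():
        return Path(override)
    for candidate in candidates:
        if Path(candidate).is_file():
            return Path(candidate)
    return None


# Self-contained demonstration with a placeholder file instead of real certs.
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp, "ca-bundle.crt")
    missing = str(Path(tmp, "missing.crt"))
    fake.write_text("placeholder\n")
    from_env = find_ca_bundle(env={"SSL_CERT_FILE": str(fake)}, candidates=())
    from_probe = find_ca_bundle(env={}, candidates=(missing, str(fake)))
    nothing = find_ca_bundle(env={}, candidates=(missing,))
    found = (from_env == fake, from_probe == fake, nothing is None)

print(found)
print(find_ca_bundle())  # result depends on the machine this runs on
```

The path found this way is what one would then read and hand to something like reqwest::ClientBuilder::add_root_certificate(), as suggested above.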
Update2: And perhaps also the https://github.com/inejge/env_proxy to detect proxies.
When vendored, rustls or native-tls are replaced by a statically linked build of openssl
I don't believe we use native-tls vendoring. That would require turning on this feature: https://github.com/seanmonstar/reqwest/blob/28840afd46fe3b81b7c77dde4537ad702826c7f7/Cargo.toml#L38
In fact, if I understand correctly, we don't use native-tls at all and it should always be rustls, which has no vendoring and does not use the openssl library.
I will be away for a few days, but if someone wants to take a look during that time, then the relevant code is:
HttpClient::new() configures the underlying implementation (reqwest), including proxy params, and I guess the certs can be added there as well. https://github.com/nextstrain/nextclade/blob/4b7ac6f6ae9b653106c974874c9e4b7b6e93495e/packages_rs/nextclade-cli/src/io/http_client.rs#L32-L63
struct ProxyConfig. The new fields will become CLI args for the dataset get and dataset list commands automagically. https://github.com/nextstrain/nextclade/blob/4b7ac6f6ae9b653106c974874c9e4b7b6e93495e/packages_rs/nextclade-cli/src/io/http_client.rs#L8-L25
reqwest crate config. Here we disable default features, including native-tls, and then only enable rustls-tls. https://github.com/nextstrain/nextclade/blob/c4ce28edd438efa7ecec2c613e30ccbe3f2dfe7f/packages_rs/nextclade-cli/Cargo.toml#L36
Our setup is a transparent MITM web filter configured on the network equipment, so there is no need for the --proxy flag in our case. http/https requests use the default route with the default ports.
The ssl/tls chain is broken by this filtering, as https requests are re-issued by the transparent proxy with its private certificate for internal communication.
We use the standard system path, but you've already found that.
I've just found this proxy tool, mitmproxy, that does https with a private cert with basically no config. It could be useful for testing as it is easy to reproduce the same error as above with it.
Installation and launch (use a separate xterm or a tmux/screen session, as mitmproxy opens a text console):
$ wget https://snapshots.mitmproxy.org/8.1.0/mitmproxy-8.1.0-linux.tar.gz
$ tar xf mitmproxy-8.1.0-linux.tar.gz
$ ./mitmproxy
Import the mitmproxy generated CA cert in the host CA trusted store. Assuming Debian based distribution:
$ sudo cp ~/.mitmproxy/mitmproxy-ca-cert.cer /usr/local/share/ca-certificates/mitmproxy-ca-cert.crt
$ sudo update-ca-certificates
Test with curl and look at the mitmproxy console for confirmation:
$ https_proxy=https://localhost:8080 curl https://pilem.info
ok
Test with nextclade:
$ cd nextclade
$ https_proxy=https://localhost:8080 ./target/debug/nextclade dataset get --name sars-cov-2 --output-dir /tmp
[2022-06-03 09:41:17.621][nextclade][WARN ] /home/pierre/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.20.6/src/conn.rs:1285: Sending fatal alert BadCertificate
Error:
0: When downloading dataset index
1: error sending request for url (https://data.master.clades.nextstrain.org/index_v2.json): error trying to connect: invalid peer certificate contents: invalid peer certificate: UnknownIssuer
2: error trying to connect: invalid peer certificate contents: invalid peer certificate: UnknownIssuer
3: invalid peer certificate contents: invalid peer certificate: UnknownIssuer
Location:
packages_rs/nextclade-cli/src/io/http_client.rs:91
Look at mitmproxy event log («E» on the text console):
warn: 127.0.0.1:33724: Client TLS handshake failed. The client does not trust the proxy's certificate for localhost (sslv3 alert bad certificate)
Using reqwest's default features fixes this issue @ivan-aksamentov, @corneliusroemer. I don't know if it breaks something else.
diff --git a/packages_rs/nextclade-cli/Cargo.toml b/packages_rs/nextclade-cli/Cargo.toml
index 0f933e6c..659fefb5 100644
--- a/packages_rs/nextclade-cli/Cargo.toml
+++ b/packages_rs/nextclade-cli/Cargo.toml
@@ -33,7 +33,7 @@ owo-colors = "3.3.0"
pretty_assertions = "1.2.1"
rayon = "1.5.2"
regex = "1.5.5"
-reqwest = { version = "0.11.10", default-features = false, features = ["blocking", "deflate", "gzip", "brotli", "socks", "rustls-tls"]}
+reqwest = { version = "0.11.10", default-features = true, features = ["blocking", "deflate", "gzip", "brotli", "socks", "rustls-tls"]}
semver = "1.0.9"
serde = { version = "1.0.136", features = ["derive"] }
url = { version = "2.2.2", features = ["serde"] }
This request works with mitmproxy and a self signed certificate:
https_proxy=https://localhost:8080 ./target/debug/nextclade dataset get --name sars-cov-2 --output-dir /tmp
mitmproxy logs:
127.0.0.1:48190: GET https://data.master.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-06-03T12:00:00Z/files/genemap.gff
<< 200 OK 417b
127.0.0.1:48200: GET https://data.master.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-06-03T12:00:00Z/files/qc.json
<< 200 OK 760b
127.0.0.1:48222: GET https://data.master.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-06-03T12:00:00Z/files/virus_properties.json
<< 200 OK 4.3k
127.0.0.1:48240: GET https://data.master.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-06-03T12:00:00Z/files/reference.fasta
<< 200 OK 9.3k
127.0.0.1:48252: GET https://data.master.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-06-03T12:00:00Z/files/sequences.fasta
<< 200 OK 741k
127.0.0.1:48190: GET https://data.master.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-06-03T12:00:00Z/files/tree.json
<< 200 OK 303k
@pilem Thanks for checking! Does it also work with your production hardware setup you mentioned previously?
Enabling default features in reqwest means that on Linux it will use openssl (even if the rustls feature is also enabled). So it requires openssl to be installed at build time and perhaps even at runtime.
There are a few effects:
So I cannot decide how to proceed.
We could:
I'll think about it and I think I am mostly up for solution (1). However it was never my intent to write and maintain this functionality for a bioinformatics tool. It's way out of scope.
It works well with our production hardware @ivan-aksamentov. I understand the cross-compile trouble, but that's one of the things VMs and containers are for. And I wouldn't worry about the two other points. There may be better alternatives, but openssl is not going away soon. And it is actually the responsibility of rustls/reqwest/cargo/rust to ship working feature builds, and they do with the default features.
@pilem Check out https://github.com/nextstrain/nextclade/pull/1527. If your private CA certificate is in the system's trust store, it should Just Work™. Or you can configure it as an additional trusted CA certificate by setting NEXTCLADE_EXTRA_CA_CERTS=/path/to/certs.pem when running nextclade.
nextclade version 1.10.2 on centos-release-7-4.1708.el7.centos.x86_64
Hi,
when I execute the following command,
nextclade dataset get --name sars-cov-2 --output-dir /home/foueri01@inspq.qc.ca/temp/20220209/TESTNEXTLADE
I get the following error:
[ERROR] Nextclade: When fetching a file "https://data.clades.nextstrain.org/index.json": CURLE_PEER_FAILED_VERIFICATION: SSL peer certificate or SSH remote key was not OK
I got the same error when I appended insecure to my ~/.curlrc.
Actually, our process runs behind a proxy with full SSL inspection. The unrecognized certificate is on our side. Is there a way, as with curl, to activate a switch to ignore the certificate?
Can you please help me with this?
Best, Eric