reown-com / a2

An Asynchronous Apple Push Notification (apns2) Client for Rust
MIT License
150 stars 52 forks source link

Connection errors #6

Closed dbrgn closed 6 years ago

dbrgn commented 6 years ago

I'm having connection errors when trying to connect to APNS:

TRACE 2018-04-03T08:46:05Z: apns2::signer: Signer::with_signature found signature for """""" APNS valid for 2700s
TRACE 2018-04-03T08:46:05Z: apns2::alpn: AlpnConnector::call ("https://api.push.apple.com/3/device/******")
TRACE 2018-04-03T08:46:05Z: apns2::client: Client::call requesting ("https://api.push.apple.com/3/device/******")
TRACE 2018-04-03T08:46:06Z: apns2::alpn: AlpnConnector::call got TCP, trying TLS
TRACE 2018-04-03T08:46:06Z: apns2::client: Request error: an operation was canceled internally before starting
 WARN 2018-04-03T08:46:06Z: push_relay::server: Error: Push message could not be processed: Push was unsuccessful: Error connecting to APNs

Any idea what the reason for this could be?

pimeys commented 6 years ago

So you're using the token connector? You have a valid private key in p8 format, team id and key id set?

Are you using Mac or Linux? If Mac, OSX doesn't support ALPN by default. I'm using Linux and the latest OpenSSL, which has support for ALPN.

pimeys commented 6 years ago

But hmm, if it's a token connector, the connection should be formed even with a faulty token. And for ALPN, we use rustls which should hold a rust implementation for ALPN so OpenSSL should not be a problem here.

dbrgn commented 6 years ago

Yes, this is the token connector. I'm also on Linux, with OpenSSL 1.1.0g.

If I change the private key (in p8 format) by adding another letter, then the thread throws an error before even attempting to connect ("SignerError: An OpenSSL error stack"), so at least the format of the key should be correct.

Could it be that I made a mistake in the team id and/or key id? Might that lead to a connection error?

pimeys commented 6 years ago

No, with wrong team/key ids you get an error from Apple, like so:

./target/debug/examples/token_client -p examples/private.p8 -t ASDF -k ASDF -d ASDF -m Foo
Error: ResponseError(Response { error: Some(ErrorBody { reason: InvalidProviderToken, timestamp: None }), apns_id: Some("997881E1-ACF0-ABB2-D9C2-2BF175564C01"), code: 403 })
pimeys commented 6 years ago

I'm also in the same OpenSSL version as you are.

pimeys commented 6 years ago

I just pushed a bit more logging. Could you pull, compile examples and run the token_client with your p8 key, team/key id and a device token. Set RUST_LOG=apns=trace and tell me the output.

pimeys commented 6 years ago

You can also set RUST_LOG=rustls,hyper,apns=trace to get tracing from hyper and rustls.

dbrgn commented 6 years ago

Hm, strange.

The example:

 DEBUG rustls::client::hs > Server DNS name is DNSName("api.development.push.apple.com")
 TRACE rustls::client::hs > Not sending CertificateVerify, no key
 DEBUG rustls::client::hs > Session not saved
 TRACE hyper::proto::h2   > send body chunk: 64B
 TRACE hyper::proto::h2   > send body eos
 TRACE apns2::client      > Client::call got response status 403 Forbidden from ("https://api.development.push.apple.com/3/device/******")
Error: ResponseError(Response { error: Some(ErrorBody { reason: InvalidProviderToken, timestamp: None }), apns_id: Some("******"), code: 403 })

My code:

DEBUG 2018-04-03T10:00:54Z: rustls::client::hs: Server DNS name is DNSName("api.development.push.apple.com")
TRACE 2018-04-03T10:00:54Z: rustls::client::hs: Not sending CertificateVerify, no key
DEBUG 2018-04-03T10:00:54Z: rustls::client::hs: Session not saved
TRACE 2018-04-03T10:00:54Z: hyper::proto::h2::client: client tx dropped
TRACE 2018-04-03T10:00:54Z: apns2::client: Request error: an operation was canceled internally before starting

The hyper version is the same (0.11.18).

dbrgn commented 6 years ago

The error seems to originate from here: https://github.com/pimeys/apns2/blob/3527054dab5d3966d81fd0b547a880209f34a8ec/src/client.rs#L213-L218

Note that I'm calling the apns code from within a Hyper http2 async server handler. Maybe something goes wrong there?

Here's the full log:

DEBUG 2018-04-03T10:12:27Z: rustls::client::hs: Server DNS name is DNSName("api.development.push.apple.com")
TRACE 2018-04-03T10:12:27Z: rustls::client::hs: Not sending CertificateVerify, no key
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::conn: flushed State { reading: KeepAlive, writing: Init, keep_alive: Busy, error: None, read_task: Some(Task) }
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::dispatch: Dispatcher::poll
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::conn: Conn::read_keep_alive
DEBUG 2018-04-03T10:12:27Z: rustls::client::hs: Session not saved
TRACE 2018-04-03T10:12:27Z: hyper::client::pool: Pool::put ("https://api.development.push.apple.com", Http2)
TRACE 2018-04-03T10:12:27Z: hyper::client::pool: Pool::put removing canceled parked ("https://api.development.push.apple.com", Http2)
DEBUG 2018-04-03T10:12:27Z: hyper::client::pool: pooling idle connection for ("https://api.development.push.apple.com", Http2)
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::conn: flushed State { reading: KeepAlive, writing: Init, keep_alive: Busy, error: None, read_task: Some(Task) }
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::dispatch: Dispatcher::poll
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::conn: Conn::read_keep_alive
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::conn: flushed State { reading: KeepAlive, writing: Init, keep_alive: Busy, error: None, read_task: Some(Task) }
TRACE 2018-04-03T10:12:27Z: hyper::proto::h2::client: client tx dropped
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::dispatch: Dispatcher::poll
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::conn: Conn::read_keep_alive
TRACE 2018-04-03T10:12:27Z: apns2::client: Request error: an operation was canceled internally before starting
 WARN 2018-04-03T10:12:27Z: push_relay::server: Error: Push message could not be processed: Push was unsuccessful: Error connecting to APNs
TRACE 2018-04-03T10:12:27Z: hyper::proto: should_keep_alive(version=Http11, header=None) = true
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::role: ServerTransaction::encode has_body=true, method=Some(Post)
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::encode: sized write, len = 19
DEBUG 2018-04-03T10:12:27Z: hyper::proto::h1::io: flushed 155 bytes
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::conn: maybe_notify; read_from_io blocked
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::conn: flushed State { reading: Init, writing: Init, keep_alive: Idle, error: None, read_task: Some(Task) }
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::dispatch: Dispatcher::poll
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::conn: Conn::read_head
DEBUG 2018-04-03T10:12:27Z: hyper::proto::h1::io: read 0 bytes
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::io: parse eof
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::conn: State::close_read()
DEBUG 2018-04-03T10:12:27Z: hyper::proto::h1::conn: read eof
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::conn: Conn::read_keep_alive
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::conn: parking current task
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::conn: maybe_notify; notifying task
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::conn: flushed State { reading: Closed, writing: Init, keep_alive: Disabled, error: None, read_task: Some(Task) }
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::conn: shut down IO
TRACE 2018-04-03T10:12:27Z: hyper::proto::h1::dispatch: Dispatch::poll done

This line looks interesting:

TRACE 2018-04-03T10:12:27Z: hyper::proto::h2::client: client tx dropped

pimeys commented 6 years ago

It has something to do with the ALPN for sure. If the example works, are you sure your own code reads the private key correctly? Are you reading from database, from file or from &str?

Are you having the team id and key id in the right order?

Do you have the latest commit id from apns2 and the latest commit id from hyper's h2 branch? This doesn't follow the stable hyper, so it's important to stay in the latest commit.

pimeys commented 6 years ago

The h1 stuff in the logs is from here:

https://github.com/pimeys/apns2/blob/master/src/alpn.rs#L106

let connecting = self.http
    .call(uri)
    .and_then(move |tcp| {
        trace!("AlpnConnector::call got TCP, trying TLS");
        tls.connect_async(host.as_ref(), tcp)
            .map_err(|e| {
                trace!("AlpnConnector::call got error forming a TLS connection.");
                io::Error::new(io::ErrorKind::Other, e)
            })
    })
    .map(|tls| MaybeHttpsStream::Https(tls))
    .map_err(|e| {
        trace!("AlpnConnector::call got error reading a TLS stream (#{}).", e);
        io::Error::new(io::ErrorKind::Other, e)
    });

HttpsConnecting(Box::new(connecting))

So it uses Hyper's h1 http connector for async resolving and getting a TCP handler, which it then uses to do the TLS ALPN handshake, returning a stream that goes to the h2 crate. Nothing abnormal in your example posted to the hyper issue.

dbrgn commented 6 years ago

are you sure your own code reads the private key correctly? Are you reading from database, from file or from &str?

I'm reading the same file as used in the example into a Vec and passing a reference to that to apns2. So I believe that should be correct.

Are you having the team id and key id in the right order?

I double checked and believe so.

So it uses Hyper's h1 http connector for async resolving and getting a TCP handler, which it then uses to do the TLS ALPN handshake, returning a stream that goes to the h2 crate.

I see...

Do you have the latest commit id from apns2 and the latest commit id from hyper's h2 branch?

I think so, yes. Is there a way to see the actually used commit id?

pimeys commented 6 years ago

I think so, yes. Is there a way to see the actually used commit id?

Check your Cargo.lock. Do cargo update.

I'm reading the same file as used in the example into a Vec and passing a reference to that to apns2. So I believe that should be correct.

Yeah, &[u8] implements Read so it should work, even though in the example I'm giving it a File. Good luck with bug hunting, take my example as a working reference and dig into the code. I'm also interested is there something I could do better here so the library is easier to implement. This sounds like a super simple mistake, that you find after having a lunch. I've had some of those when working with Apple's services.

pimeys commented 6 years ago

I tried to chage the example to read the private key into a Vec and pass a slice to the client. It still works:

let mut private_file = File::open(key_file).unwrap();
...
let mut pem_key: Vec<u8> = Vec::new();
private_file.read_to_end(&mut pem_key).unwrap();

let client = Client::token(
    pem_key.as_slice(),
    key_id.as_ref(),
    team_id.as_ref(),
    &handle,
    endpoint,
).unwrap();

But here you shouldn't really read the file into a vector, because the client does that for you. Just pass a File handler that implements Read.

dbrgn commented 6 years ago

The thing is that I want to be able to read the key a second time, and after the first time an opened file will return nothing, right?

I'm also interested is there something I could do better here so the library is easier to implement.

One thing that could be a source of bugs is that both key id and team id are strings. Maybe a wrapper newtype could be used to make it impossible to confuse the two.

pimeys commented 6 years ago

The thing is that I want to be able to read the key a second time, and after the first time an opened file will return nothing, right?

Yeah, true. I was thinking of our use case, where we supervise the connections. Every connection reactor core gets their own thread, and the supervisor will read the connection data from a database every time it changes or a connection is kicked out. But we do a quite special multi-app setup where you need a separate connection for every app.

One thing that could be a source of bugs is that both key id and team id are strings. Maybe a wrapper newtype could be used to make it impossible to confuse the two.

I was thinking something like:

pub enum ConnectionOptions<'a>{
    Token {
        pkcs8_pem: Read,
        key_id: &'a str,
        team_id: &'a str,
        endpoint: Endpoint,
    },
    Certificate {
        certificate: Read,
        password: &'a str,
    }
}

But here we can't use a generic Read, or we have to box it, or mark the trait to the Client so it must know which type is given as a Read, which is a trade off I might not want to pay. It even gets worse if the ids or password use Into. Having a type for the team id or key id will also require us to decide are we going to wrap a &str or a String, and if we want to allow an easy access with both, we must do a builder that has methods with generic types, generating the correct instance.

Named arguments would help here, but let's see if the RFC ever goes through.

dbrgn commented 6 years ago

But here we can't use a generic Read, or we have to box it

Would a trait object (&Read) not work?

Having a type for the team id or key id will also require us to decide are we going to wrap a &str or a String

That's exactly what Cow is for (although you'd have to propagate the lifetime if non-static, which isn't very nice of course...).

But we're drifting off :) I'll try to debug the connection issues soon.

dbrgn commented 6 years ago

I found the culprit. In hindsight it could have been obvious.

Let's say I defined this function:

pub fn send_push(client: Client, push_token: String)
        -> Box<Future<Item = (), Error = String>> {
    let options = NotificationOptions {
        apns_id: None,
        apns_expiration: Some(30),
        apns_priority: Priority::High,
        apns_topic: Some("wc".to_string()),
        apns_collapse_id: None,
    };

    let payload = SilentNotificationBuilder::new().build(push_token, options);

    Box::new(
        client
            .send(payload)
            .map(|response| println!("Success details: {:?}", response))
            .map_err(|error| format!("Push was unsuccessful: {}", error))
    )
}

This fails, because the client is moved into the function and then dropped.

I assume the Client::send method somehow moves a rx channel into the future, but doesn't do anything about the client. Dropping the client causes the channel to close, thus the log message "hyper::proto::h2::client: client tx dropped".

If I change the signature above so that it borrows the Client, then everything seems to work.

I think this is a very subtle and non-obvious error that we should try to prevent in our API. Any ideas, @pimeys?

pimeys commented 6 years ago

That's a quite interesting way of using the client. Could you possible hold it in a struct, that implements the send_push? Or try to send a reference of that client to the function, if you can cope with the lifetimes. How we use this is that we have an instance of Notifier, then just use it from the notify:

    pub fn notify(&self, event: &PushNotification) -> Timeout<FutureResponse> {
        self.client.send_with_timeout(self.gen_payload(event), Duration::from_secs(3))
    }

If you do it like this, you take the ownership and to get it back, you should have it in the return value. Also try to remember to keep the client alive as long as possible, otherwise you'll always get a new connection and if you open them fast enough, Apple will threat you as Denial of Service.

dbrgn commented 6 years ago

Could you possible hold it in a struct

That's exactly how I implemented it now (which works and is obviously better than creating a new client every time). But before that I just wanted a quick'n'dirty standalone implementation that does not change the "owning code". Unfortunately it wasn't quick :wink:

The thing is that other people will most probably run into the same issue. Could there be a way of enforcing that the lifetime of the client should be longer than the lifetime of the returned future, without it being too much hassle? If not, could we detect this and log a message on warning level if the client is dropped before the future has completed?

pimeys commented 6 years ago

No, the issue is different here.

You claim the lifetime of the client, but return a box of a future. This is definitely something the compiler should nag about, I didn't even know this is possible to compile. You should ask more about it from the authors of futures-rs or tokio.

You see the returned future is not evaluated here, only after you explicitly send it to a reactor core. To not have a client at this point should not be possible at all. So you return a boxed function, send the function to the core and it tries to evaluate it when the client is already gone. These are issues rustc should definitely be able to catch, which makes this very interesting.

dbrgn commented 6 years ago

You see the returned future not evaluated here, only after you explicitly send it to a reactor core. To not have a client at this point should not be possible at all.

It's possible because the future does not contain a reference to the client itself, but a receiving end of a channel or stream or something like that. The sender of such a type can be dropped even if the receiving end is still alive. I think this is because futures require a static lifetime, if I'm not mistaken?

Maybe @carllerche or @seanmonstar know more about this? (Relevant comments: https://github.com/pimeys/apns2/issues/6#issuecomment-378287736 and later)

pimeys commented 6 years ago

I'll close this for now. The problem had nothing to do with this library, more with Hyper/Tokio/Futures.