summa-tx / coins

Rust implementations of BIP32/39 and Ledger device comms
Other
90 stars 31 forks source link

fix: greatly simplify rusb usage #129

Closed prestwich closed 7 months ago

prestwich commented 8 months ago

leaving this open so that @naps62 and @xJonathanLEI can check it before merge 👍

naps62 commented 8 months ago

Allright. this seems to fix both issues for me, although it still has one less serious wrinkle.

nevermind. there's still a panic if I try to instantiate a ledger twice while the device is locked (1st call fails with Err(UnexpectedNullResponse). Second call panics at hid.rs:143

Other than that, there's still a less serious wrinkle. the code below, without the sleep, fails 80% of the time on the second call (Err, not panic). adding the sleep makes it work more consistently. updated my reproduction repo to showcase this

#[tokio::main]
async fn main() {
    // first call, works
    let _ = dbg!(call_ledger().await);

    // tokio::time::sleep(tokio::time::Duration::from_millis(10)).await;

    // second call, if done without sleep first, fails with `hid_error is not implemented yet`
    let _ = dbg!(call_ledger().await);
}

pub(crate) async fn call_ledger() -> Result<Ledger, LedgerError> {
    let path: String = "m/44'/60'/0'/0/0".into();
    Ledger::new(HDPath::Other(path), 1).await
}

including the sleep makes this work consistently. without the sleep, the 2nd call will fail ~80% of the time:

[src/main.rs:11] call_ledger().await = Ok(
    LedgerEthereum {
        transport: Mutex {
            is_locked: false,
            has_waiters: false,
        },
        derivation: Other(
            "m/44'/60'/0'/0/0",
        ),
        chain_id: 1,
        address: ...........,
    },
)
[src/main.rs:12] call_ledger().await = Err(
    LedgerError(
        NativeTransportError(
            CantOpen(
                HidApiError {
                    message: "hid_error is not implemented yet",
                },
            ),
        ),
    ),
)
xJonathanLEI commented 8 months ago

Will check whether this works on Android tomorrow and revert.

prestwich commented 8 months ago

nevermind. there's still a panic if I try to instantiate a ledger twice while the device is locked (1st call fails with Err(UnexpectedNullResponse). Second call panics at hid.rs:143

Are you sure the line number is correct? that line is currently first_ledger(api).map(Self::from_device) which should not be capable of panicking

Other than that, there's still a less serious wrinkle. the code below, without the sleep, fails 80% of the time on the second call (Err, not panic). adding the sleep makes it work more consistently.

This sounds like it might be a race condition between instnatiating the new task and releasing the resource in the old task? I'll play around with it

naps62 commented 8 months ago

not sure what I was on about with the panic thing. seems gone now, can't reproduce even pointing to the same rev. I'll assume it was still the sleep deprivation

2nd issue (two successive calls without waiting) still exists, although I can work with it already, if you feel you need to merge this and research that one later

thanks very much for the support btw :pray:

xJonathanLEI commented 8 months ago

Just tested 9f86de4edf71b4c631f870ed95548362e196ead7 on Android and it works perfectly. So far no issue found.

xJonathanLEI commented 8 months ago

Actually, no. I made a mistake and tested the wrong code. It actually no longer compiles on Android. I will look into it.

xJonathanLEI commented 8 months ago

Just sent #130 to fix the compilation error. It's trivial.

xJonathanLEI commented 8 months ago

However, even with the compilation fix, it still doesn't work properly on Android. I'm testing this crate through ethers-signers so it might be hard to immediately see the cause though.

Specifically, dumping Ethereum addresses from paths works fine. However, when requesting a transaction signature, I'm getting this error:

Response too short. Expected at least 2 bytes. Got []

Again I'm not even sure whether ethers-signers or coins-ledger throws this error. Will need to look deeper into things.

(Just to make sure, I also tested against the current main, which works perfectly. So the issue does come from this PR.)

naps62 commented 8 months ago

Response too short. Expected at least 2 bytes. Got []

I also get these randomly from time to time (even before this PR). have you tried multiple times, to make sure it's specifically from this commit, and not a fluke run?

xJonathanLEI commented 8 months ago

Yeah I tried multiple times. In fact, I've never encountered this error even once before this PR.

prestwich commented 8 months ago

is there an easy way for me to replicate your test environment without buying an android device @xJonathanLEI? Would be nice to check things locally 😅

naps62 commented 8 months ago

I have an android, and am willing to do some debugging to narrow it down If you have a way to replicate I can give it a try

xJonathanLEI commented 8 months ago

@prestwich Usually it's possible to check compilation etc via NDK, but since this is hardware stuff it might be hard to do without the actual hardware 😅.

@naps62 Are you able to reproduce the same bug? I can make a temp repo/gist for this if that's helpful. To make it easier I will use coins-ledger directly for the repro.

naps62 commented 8 months ago

Are you able to reproduce the same bug? I can make a temp repo/gist for this if that's helpful. To make it easier I will use coins-ledger directly for the repro.

I saw this bug a few times, but seems mostly at random, and never frequently enough to debug. if it's as you say and it happens consistently on android, I'd be happy to try your repo

I guess there's a lot of variance depending on hardware in all this (so far I tested only on Linux)

prestwich commented 8 months ago

Response too short. Expected at least 2 bytes. Got []

So for clarity, this indicates that the read from the hid device was successful, but no data was available. It actually indicates that the device returned a succesful APDU answer with 0 bytes. If there was not a succesful answer, or if reading of that answer had failed, we would expect a NativeTransportError

So it is strange to me that we're getting Ok(vec![]) when we're expecting a non-zero amount of data

xJonathanLEI commented 8 months ago

Working on the reproducer and noticed that like dumping addresses, signing messages also works perfectly.

xJonathanLEI commented 8 months ago

@naps62 The reproducer is now available: xJonathanLEI/coins-android-bug-repro.

~@prestwich Turns out the error occurs when calling transport.exchange(), returning a ResponseTooShort([]) error. So it's not the case that the device returns an empty successful message as we thought before.~

_(Edit: I re-read your comment. Yes it's indeed the case that the native transport is successfully returning an empty answer. The error comes from from_answer() where the response gets parsed.)_

Again note that this issue only occurs when signing a transaction but not a message. I originally coded the reproducer to sign a "hellow world" message but it ended up succeeding both on main and on this PR lol.

prestwich commented 7 months ago

Alright, so i can repro without android. Will be diving into this today 🎉

prestwich commented 7 months ago

okay I tracked this down to an off-by-one in the write buffer that was sending the device incorrect data. I am still unsure why this was significant only for certain instructions, but it seems to be fixed in the latest commit. Would you mind verifying on android for me? :D

xJonathanLEI commented 7 months ago

@prestwich I updated rev to the current HEAD (9891bd9c09321da0682e3d922a4679e91bab75f1) and it works - the transaction to zero address can now be correctly signed. Haven't tested signing personal messages or dumping addresses but I guess those should work fine too.

xJonathanLEI commented 7 months ago

Tested dumping addresses and signing a "hello world" message - both work perfectly. All good on my side now. Thanks!

prestwich commented 7 months ago

thanks y'all! I've just published at 0.9.2 and yanked 0.9.1 and 0.9.0 🎉

naps62 commented 7 months ago

FYI, still transient errors with two consecutive invocations. but something I can live with, and probably debug myself later :+1: thanks for the release

prestwich commented 7 months ago

I'll address the race condition in 0.9.3 👍