solana-labs / solana

Web-Scale Blockchain for fast, secure, scalable, decentralized apps and marketplaces.
https://solanalabs.com
Apache License 2.0

solana-test-validator unable to confirm transactions when a specific version is compiled from source. #28191

Closed godmodegalactus closed 1 year ago

godmodegalactus commented 1 year ago

Problem

I have a weird issue: when I compile certain tagged versions of Solana from source, solana-test-validator and solana-validator do not confirm any transactions. I have tested with airdrops and transfer instructions. I do not have the issue if I download the binaries using solana-install init or use binaries compiled from the master branch. I also do not have it for tags before v1.11.0, so it may be related to QUIC.

It fails with the following versions: v1.14.3, v1.11.8, v1.11.9. Environment: Ubuntu 22.04 and Ubuntu 20.04.

Steps:

git checkout v1.14.3
cargo build
./target/debug/solana-test-validator

Then, from another terminal, run solana airdrop 10

CriesofCarrots commented 1 year ago

I was not able to repro with solana-test-validator on v1.14.3:

$ git checkout v1.14.3
$ cd validator && cargo run --bin solana-test-validator
Ledger location: test-ledger
Log: test-ledger/validator.log
...
$ solana -V
solana-cli 1.11.3 (src:1db136a8; feat:3270869161)

$ solana airdrop 10 -ul
Requesting airdrop of 10 SOL

Signature: hR9mehzv9rrAyPVvwKCze8JVJy7D2GHj9UfT1ArvmfytnaGzHT3EUVnGRRyi4n3SoQkuJer3uG3RKtLbgDG3MpD

500000010 SOL

Are you sure that you're targeting your local test-validator? What does solana config get say? What version is your cli?

godmodegalactus commented 1 year ago

I have tested with CLI version v1.14.3, and now with v1.11.3. I still have the same issue.

solana config get gives:

Config File: /home/galactus/.config/solana/cli/config.yml
RPC URL: http://localhost:8899 
WebSocket URL: ws://localhost:8900/ (computed)
Keypair Path: /home/galactus/.config/solana/id.json 
Commitment: confirmed 

Something related to config, I guess. I tried to reproduce it on Windows using WSL2 with Ubuntu 22.04 and Ubuntu 20.04, and I have the same issue. I am currently using Pop!_OS on Linux and have the same issue there.

CriesofCarrots commented 1 year ago

You did not answer my question about cli version: solana -V

godmodegalactus commented 1 year ago

I think I answered that above: I have tested with CLI versions 1.14.3 and 1.11.3.

solana-cli 1.11.3 (src:1db136a8; feat:3270869161)

CriesofCarrots commented 1 year ago

Sorry, my mistake! I misread and thought those were additional solana-test-validator versions you had tried. I still can't reproduce this on macOS or Linux. I have followed your steps to reproduce exactly and tried other combinations as well. My best advice at this point is to double-check all your assumptions about which test-validator binary you're running, which CLI binary you're calling, and which RPC endpoint your CLI is targeting.
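One way to check those three things from a shell is sketched below. It relies only on commands already used in this thread (solana -V, solana-test-validator -V, solana config get) plus the standard command -v lookup. The function names rpc_url_from_config and check_setup are hypothetical helpers, not part of the Solana CLI, and the sketch assumes the binaries are on PATH (for a source build, substitute the ./target/debug/ paths).

```shell
#!/bin/sh
# Sketch: confirm which binaries and which RPC endpoint are actually in use.
# Both function names are hypothetical helpers, not Solana tooling.

# Extract the RPC URL from the text printed by `solana config get` (read on stdin).
rpc_url_from_config() {
  sed -n 's/^RPC URL:[[:space:]]*\([^[:space:]]*\).*/\1/p'
}

# Print the resolved path and reported version of each binary, then the RPC URL.
# The ( ) body runs in a subshell, so the loop variable does not leak.
check_setup() (
  for bin in solana solana-test-validator; do
    # `command -v` shows which file on PATH the shell would run.
    printf '%s -> %s\n' "$bin" "$(command -v "$bin" || echo 'not on PATH')"
    "$bin" -V 2>/dev/null || true
  done
  printf 'RPC URL: %s\n' "$(solana config get | rpc_url_from_config)"
)

# Usage (needs the Solana binaries on PATH):
#   check_setup
```

If the printed paths or versions differ from what you expected to be running, that mismatch is the first thing to rule out.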

There was a bug in TestValidator preventing quic support in some of the early v1.11 versions, but it was fixed in https://github.com/solana-labs/solana/pull/27046 which was released in v1.11.6. So there are some combinations of cli-TestValidator that wouldn't work due to this bug, but test validators v1.11.6+ (like v1.14.3) should work with all clients.
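To make the v1.11.6 threshold easy to check in a script, here is a minimal sketch of a version comparison. The function name version_ge is hypothetical, and it assumes plain MAJOR.MINOR.PATCH version strings with an optional leading "v", as used in this thread.

```shell
#!/bin/sh
# Sketch: succeed when version $1 is greater than or equal to version $2.
# Assumes dotted numeric versions such as v1.14.3; missing fields count as 0.
version_ge() {
  printf '%s %s\n' "${1#v}" "${2#v}" | awk '{
    split($1, a, "."); split($2, b, ".")
    # Compare major, minor, patch in order; the first difference decides.
    for (i = 1; i <= 3; i++) {
      if (a[i] + 0 > b[i] + 0) exit 0
      if (a[i] + 0 < b[i] + 0) exit 1
    }
    exit 0
  }'
}

# Usage:
#   version_ge v1.14.3 v1.11.6 && echo "test validator includes the fix"
```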

mschneider commented 1 year ago

Hi @CriesofCarrots, I noticed that you are not using the CLI compiled from v1.14.3, but v1.11.3. I tried to reproduce this issue on an M1 Mac and got the same behaviour as @godmodegalactus.

I started both programs from the v1.14.3 checkout using cargo:

$ cargo run --bin solana-test-validator
    Finished dev [unoptimized + debuginfo] target(s) in 0.40s
     Running `target/debug/solana-test-validator`
Ledger location: test-ledger
Log: test-ledger/validator.log
⠒ Initializing...
Identity: 4eJc8GmnMJoSZBTN29Dt9TZoxwMHp8G29ehCMBj2o4pG
Genesis Hash: 39bvvKwWxCw16B6EoAyNbbqkUWTmxCTRRZnznL1zbNUG
Version: 1.14.3
Shred Version: 63095
Gossip Address: 127.0.0.1:1024
TPU Address: 127.0.0.1:1027
JSON RPC URL: http://127.0.0.1:8899
⠒ 00:03:14 | Processed Slot: 392 | Confirmed Slot: 392 | Finalized Slot: 360 | Full Snapshot Slot: 300 | Incremental Snapshot Slot: - | Transactions: 391 | ◎
$ cargo run --bin solana -- -V
    Finished dev [unoptimized + debuginfo] target(s) in 0.27s
     Running `target/debug/solana -V`
solana-cli 1.14.3 (src:devbuild; feat:940802714)
$ cargo run --bin solana -- balance -ul
    Finished dev [unoptimized + debuginfo] target(s) in 0.28s
     Running `target/debug/solana balance -ul`
500000000 SOL
$ cargo run --bin solana -- airdrop -ul 1
    Finished dev [unoptimized + debuginfo] target(s) in 0.27s
     Running `target/debug/solana airdrop -ul 1`
Requesting airdrop of 1 SOL
Error: error sending request for url (http://localhost:8899/): error trying to connect: tcp connect error: Connection refused (os error 61)

mschneider commented 1 year ago

I also tried to check whether I can spot the TPU port in netstat, but it does not seem to be open. I do see other ports in that range open, though:

udp4       0      0  *.1029                 *.*
udp4       0      0  *.1026                 *.*
udp4       0      0  *.1025                 *.*
udp4       0      0  *.1024                 *.*

CriesofCarrots commented 1 year ago

Hmm, well, I had already tried v1.14.3 cli as well:

$ cd solana
$ git branch
* (HEAD detached at v1.14.3)

$ cargo build
$ ./target/debug/solana -V
solana-cli 1.14.3 (src:devbuild; feat:940802714)
$ ./target/debug/solana-test-validator -V
solana-test-validator 1.14.3 (src:devbuild; feat:940802714)

$ ./target/debug/solana-test-validator --reset

Notice! No wallet available. `solana airdrop` localnet SOL after creating one

Ledger location: test-ledger
Log: test-ledger/validator.log
⠠ Initializing...
Identity: AWUzLzGW9Bxd3BvoYgu4ZZu787eaKLTk7ibmGAiRb22f
Genesis Hash: 8m6nV1hjkjxKodmqLQyJcKVDxKaVxTAweGyk4cVLCCgA
Version: 1.14.3
Shred Version: 1397
Gossip Address: 127.0.0.1:1024
TPU Address: 127.0.0.1:1027
JSON RPC URL: http://127.0.0.1:8899
...
$ ./target/debug/solana airdrop 10 -ul -k ~/identity.json
Requesting airdrop of 10 SOL

Signature: 5WJJk5998m9QXivyJviS1qAp51tVBBTKUkjgvf5QDNdsHXyssgNcFqGF2KDXscdaXyWMB2NRKGGLU9vaJR1DwwHg

10 SOL

Using prebuilt bins = same success.

mschneider commented 1 year ago

Yeah, all three of us followed the exact same steps and still got different results. I can also confirm that everything works if I switch to master, just not on v1.14.3.

godmodegalactus commented 1 year ago

@CriesofCarrots Maybe it is related to some config. Could you try to reproduce the issue on a fresh install, such as a VM or another machine?

godmodegalactus commented 1 year ago

After checking the validator logs, I found the following:

[2022-10-02T20:24:00.318208177Z INFO  solana_core::validator] ContactInfo { id: CnuWn3GhoQC8wsfVKUmETzX6nrvZaGnAZ5ycard35HAw, gossip: 127.0.0.1:1024, tvu: 127.0.0.1:1025, tvu_forwards: 127.0.0.1:1026, repair: 127.0.0.1:1031, tpu: 127.0.0.1:1027, tpu_forwards: 127.0.0.1:1028, tpu_vote: 127.0.0.1:1029, rpc: 127.0.0.1:8899, rpc_pubsub: 127.0.0.1:8900, serve_repair: 127.0.0.1:1032, wallclock: 1664742240314, shred_version: 61528 }

So the TPU address is 127.0.0.1:1027.

But the TPU client tries to connect to a different port:

[2022-10-02T20:24:01.438554026Z INFO solana_tpu_client::nonblocking::quic_client] Made connection to 127.0.0.1:1033 id 140151359788848 try_count 0

And I get the following error: Cannot make connection to 127.0.0.1:1033, error timed out

CriesofCarrots commented 1 year ago

The node's quic port is configured as tpu+6, hence 1033, so the client is using the correct port. Is that port being blocked on your machine for some reason?
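The arithmetic above, plus a quick look at whether anything is bound to the resulting port, can be sketched as follows. The function names and the QUIC_PORT_OFFSET variable are hypothetical. The port check assumes netstat -an is available and prints the local address in the fourth column (".PORT" on macOS, ":PORT" on Linux). A bound socket does not prove that a firewall lets traffic through, so treat this as a first check only.

```shell
#!/bin/sh
# Sketch: derive the QUIC port from the TPU port using the offset stated above.
QUIC_PORT_OFFSET=6

# Print the QUIC port for a given TPU port.
quic_port() {
  echo $(( $1 + QUIC_PORT_OFFSET ))
}

# Succeed if any UDP socket is bound to the given local port.
# Assumes `netstat -an` output with the local address in column 4.
udp_port_bound() {
  netstat -an 2>/dev/null | awk -v p="$1" '
    $1 ~ /^udp/ && $4 ~ ("[.:]" p "$") { found = 1 }
    END { exit !found }'
}

# Usage, with the TPU port from the validator banner (127.0.0.1:1027):
#   port=$(quic_port 1027)   # 1033
#   udp_port_bound "$port" && echo "UDP $port is bound" || echo "UDP $port is not bound"
```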

godmodegalactus commented 1 year ago

> The node's quic port is configured as tpu+6, hence 1033, so the client is using the correct port. Is that port being blocked on your machine for some reason?

OK, good to know. I will debug more to understand what's going on.

godmodegalactus commented 1 year ago

Hey @CriesofCarrots @mschneider ,

After a lot of debugging and testing, I found the cause: the Rust compiler. quinn 0.8.3 is no longer compatible with the latest stable and nightly toolchains. When I installed rustc 1.60.0 and compiled Solana with it, everything worked as expected.

To investigate, I compiled the quinn repo, and the basic example did not work. With quinn 0.8.4 the examples worked, but cargo test did not pass: all the tests had timeout errors. The main branch works, but it is not compatible with Solana.

I don't know exactly when compatibility breaks, but it is somewhere between 1.60.0 and the latest stable. @CriesofCarrots: I don't know what to do in this case, but I guess we have to tell everyone to compile with a specific version of Rust so that they do not run into the same problem as me.

Maybe just updating the README to specify a version that works will do.
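For anyone who wants to try the same workaround, here is a minimal sketch of building with a pinned toolchain through rustup. The function names are hypothetical, 1.60.0 is simply the version reported to work in this thread, and cargo +<toolchain> assumes that cargo on PATH is the rustup-managed proxy.

```shell
#!/bin/sh
# Sketch: build with a pinned Rust toolchain through rustup.
# Function names are hypothetical helpers.

# Pull the bare version number out of `rustc --version` output (read on stdin),
# e.g. "rustc 1.60.0 (...)" becomes "1.60.0".
rustc_version_number() {
  awk '{ print $2 }'
}

# Install the requested toolchain if needed, then build with it.
build_with_toolchain() {
  rustup toolchain install "$1" &&
    cargo "+$1" build
}

# Usage, from the top level of the checkout:
#   rustc --version | rustc_version_number
#   build_with_toolchain 1.60.0
```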

CriesofCarrots commented 1 year ago

Aha, mystery solved. Thanks for digging!

We do actually define the supported Rust versions (stable and nightly) for each branch of the repo in this file: https://github.com/solana-labs/solana/blob/master/ci/rust-version.sh

You can use the supported version by default by running the ./cargo script in the repo's top level in place of your usual cargo: https://github.com/solana-labs/solana/blob/master/cargo

So: ./cargo build, ./cargo test, etc.
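As a small convenience, a build script could prefer the repo's ./cargo wrapper whenever it is present. The sketch below only checks for an executable file named cargo at the given directory. The helper name pick_cargo is hypothetical and not something the repo provides.

```shell
#!/bin/sh
# Sketch: prefer the repo's ./cargo wrapper when it exists; otherwise fall back
# to the cargo on PATH and warn that the toolchain is not pinned.
# The ( ) body runs in a subshell, so `root` does not leak.
pick_cargo() (
  root="${1:-.}"
  if [ -x "$root/cargo" ]; then
    echo "$root/cargo"
  else
    echo "warning: $root/cargo not found; using unpinned cargo from PATH" >&2
    echo cargo
  fi
)

# Usage, from the top level of the checkout:
#   "$(pick_cargo .)" build
#   "$(pick_cargo .)" test
```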

I'm going to close this issue, but feel free to re-open it or open a new one if you still see issues using the supported Rust versions. And PRs are welcome if you have thoughts about how to improve the rust-version documentation.

godmodegalactus commented 1 year ago

OK, I understand: it is better to use ./cargo build and ./cargo test.

Henry-E commented 1 year ago

The Rust version has been replaced with an environment variable, RUST_STABLE_VERSION, which isn't defined anywhere in the repo:

stable_version="$RUST_STABLE_VERSION"