stacks-network / stacks-core

The Stacks blockchain implementation
https://docs.stacks.co
GNU General Public License v3.0

Node is doing discovery on private subnets #3429

Closed · pseudozach closed this 1 year ago

pseudozach commented 1 year ago

Hi, I'm running a node for lnswap and the infrastructure provider is complaining that the node is doing discovery on private subnets. Is this normal? Can it be avoided? Is there a config switch I can use to stop this traffic?

P.S. I've tried using a firewall, unsuccessfully; I'd rather solve the root cause.

##########################################################################

Netscan detected from host xxx.23

##########################################################################

time                      protocol  src_ip  src_port      dest_ip    dest_port
Sat Dec  3 09:24:40 2022  TCP       xxx.23  36814     =>  10.0.1.12  20444
Sat Dec  3 09:24:47 2022  TCP       xxx.23  36814     =>  10.0.1.12  20444
Sat Dec  3 09:28:12 2022  TCP       xxx.23  43576     =>  10.0.3.91  20444
Sat Dec  3 09:28:12 2022  TCP       xxx.23  50020     =>  10.0.6.7   20444
Sat Dec  3 09:24:40 2022  TCP       xxx.23  51800     =>  10.0.6.11  20444
Sat Dec  3 09:24:47 2022  TCP       xxx.23  51800     =>  10.0.6.11  20444

jcnelson commented 1 year ago

> I'm running a node for lnswap and the infrastructure provider is complaining that the node is doing discovery on private subnets. Is this normal? Can it be avoided? Is there a config switch I can use to stop this traffic?

Can you provide some debug logs and your config file (with any private keys omitted)?

wileyj commented 1 year ago

> Hi, I'm running a node for lnswap and the infrastructure provider is complaining that the node is doing discovery on private subnets. Is this normal? Can it be avoided? Is there a config switch I can use to stop this traffic?
>
> P.S. I've tried using a firewall, unsuccessfully; I'd rather solve the root cause.


Can you share details on the firewall setup that you tried? I don't think we should change the blockchain code to only discover nodes over the public internet (in some cases, it's advantageous to also look over private networks).

pseudozach commented 1 year ago

I'm using https://github.com/stacks-network/stacks-blockchain-docker with the default config.

This is my .env file:

# VERBOSE=true
###############################
## Stacks Blockchain API
##
NODE_ENV=production
GIT_TAG=master
PG_HOST=postgres
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=xxx
PG_DATABASE=postgres
STACKS_CHAIN_ID=2147483648
V2_POX_MIN_AMOUNT_USTX=90000000260
STACKS_CORE_EVENT_PORT=3700
STACKS_CORE_EVENT_HOST=0.0.0.0
STACKS_BLOCKCHAIN_API_PORT=3999
STACKS_BLOCKCHAIN_API_HOST=0.0.0.0
STACKS_BLOCKCHAIN_API_DB=pg
STACKS_CORE_RPC_HOST=stacks-blockchain
STACKS_CORE_RPC_PORT=20443
STACKS_EXPORT_EVENTS_FILE=/tmp/event-replay/stacks-node-events.tsv
# STACKS_API_ENABLE_FT_METADATA=1
# STACKS_API_ENABLE_NFT_METADATA=1
# STACKS_API_TOKEN_METADATA_ERROR_MODE=warning
# STACKS_ADDRESS_CACHE_SIZE=10000
#BNS_IMPORT_DIR=/bns-data

###############################
## Stacks Blockchain
##
RUST_BACKTRACE=full
STACKS_LOG_DEBUG=0
# STACKS_LOG_JSON=1
# STACKS_EVENT_OBSERVER=stacks-blockchain-api:3700
##
## How long to wait for stacks-blockchain event loop to stop (default is 20 minutes)
##    The event-observer run loop can take quite a long time to stop processing
##    ** if you kill this thread before it's done, it can cause a corrupt chainstate **
STACKS_SHUTDOWN_TIMEOUT=1200

###############################
## Nginx proxy
## 
NGINX_PROXY_PORT=80

###############################
## Docker image versions
## 
STACKS_BLOCKCHAIN_VERSION=2.05.0.5.0
STACKS_BLOCKCHAIN_API_VERSION=3.0.4
# version of the postgres image to use (if there is existing data, set this to version 13)
# if starting a new sync from genesis, can use any version > 13
POSTGRES_VERSION=14

Here's my ufw config:

root@lnswap2 scripts ₿ ufw status numbered
Status: active

     To                         Action      From
     --                         ------      ----
[ 1] 22/tcp                     ALLOW IN    Anywhere                  
[ 2] 3888                       ALLOW IN    Anywhere                  
[ 3] 8888                       ALLOW IN    Anywhere                  
[ 4] 8332                       ALLOW IN    Anywhere                  
[ 5] 8333                       ALLOW IN    Anywhere                  
[ 6] 9735                       ALLOW IN    Anywhere                  
[ 7] 20443                      ALLOW IN    Anywhere                  
[ 8] 20444                      ALLOW IN    Anywhere                  
[ 9] 9007                       ALLOW IN    Anywhere                  
[10] 9008                       ALLOW IN    Anywhere                  
[11] 80                         ALLOW IN    Anywhere                  
[12] 3000                       ALLOW IN    Anywhere                  
[13] 9200                       ALLOW IN    Anywhere                  
[14] 9300                       ALLOW IN    Anywhere                  
[15] 5601                       ALLOW IN    Anywhere                  
[16] 172.25.0.0/16              DENY OUT    Anywhere                   (out)
[17] 192.168.0.0/16             DENY OUT    Anywhere                   (out)
[18] 10.0.0.0/8                 DENY OUT    Anywhere                   (out)
[19] 35.199.6.155               DENY OUT    Anywhere                   (out) 
[20] 22/tcp (v6)                ALLOW IN    Anywhere (v6)             
[21] 3888 (v6)                  ALLOW IN    Anywhere (v6)             
[22] 8888 (v6)                  ALLOW IN    Anywhere (v6)             
[23] 8332 (v6)                  ALLOW IN    Anywhere (v6)             
[24] 8333 (v6)                  ALLOW IN    Anywhere (v6)             
[25] 9735 (v6)                  ALLOW IN    Anywhere (v6)             
[26] 20443 (v6)                 ALLOW IN    Anywhere (v6)             
[27] 20444 (v6)                 ALLOW IN    Anywhere (v6)             
[28] 9007 (v6)                  ALLOW IN    Anywhere (v6)             
[29] 9008 (v6)                  ALLOW IN    Anywhere (v6)             
[30] 80 (v6)                    ALLOW IN    Anywhere (v6)             
[31] 3000 (v6)                  ALLOW IN    Anywhere (v6)             
[32] 9200 (v6)                  ALLOW IN    Anywhere (v6)             
[33] 9300 (v6)                  ALLOW IN    Anywhere (v6)             
[34] 5601 (v6)                  ALLOW IN    Anywhere (v6)   

I added rule [19] 35.199.6.155 DENY OUT Anywhere (out) for testing, and I've collected a pcap on the egress interface. I still see TCP traffic leaving the NIC with this IP as the destination and port 20444.

When I export the logs, I don't see any of this P2P traffic, but I'm not sure whether these node-discovery packets would be logged at all. Is there any way to enable that?

wileyj commented 1 year ago

Hmm, the only thing that might tell us something is to modify the command-line args for the blockchain so it logs at DEBUG. In your .env, STACKS_LOG_DEBUG=1 should do it, followed by a restart.

The ordering of your rules and the rules themselves look correct, but I'm not entirely familiar with how ufw works, so I'll have to read the docs. I may try to reproduce this myself as well.

One other thing to note: a new version was released, so you'll want to update STACKS_BLOCKCHAIN_VERSION=2.05.0.5.0 to STACKS_BLOCKCHAIN_VERSION=2.05.0.6.0.
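
Something like this should do it (a sketch, assuming the manage.sh stop/start flow from the stacks-blockchain-docker repo):

# in .env: enable debug logging and bump the node version
STACKS_LOG_DEBUG=1
STACKS_BLOCKCHAIN_VERSION=2.05.0.6.0

# then restart the stack so the changes take effect
./manage.sh -n mainnet -a stop
./manage.sh -n mainnet -a start -f proxy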

wileyj commented 1 year ago

I have a feeling this is somehow related to how Docker (used by the repo you're running) modifies iptables: https://github.com/chaifeng/ufw-docker

I'll see what happens when I attempt to reproduce it, but the reasoning in this article makes sense at a cursory glance.
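
If Docker's iptables handling is the cause, the usual fix from that article is to add rules to the DOCKER-USER chain, which Docker evaluates for forwarded container traffic before its own rules. A sketch of what that could look like here (untested against this setup):

# drop container traffic to private subnets on the p2p port (20444)
sudo iptables -I DOCKER-USER -d 10.0.0.0/8 -p tcp --dport 20444 -j DROP
sudo iptables -I DOCKER-USER -d 172.16.0.0/12 -p tcp --dport 20444 -j DROP
sudo iptables -I DOCKER-USER -d 192.168.0.0/16 -p tcp --dport 20444 -j DROP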

pseudozach commented 1 year ago

> I have a feeling this is somehow related to how Docker (used by the repo you're running) modifies iptables: https://github.com/chaifeng/ufw-docker
>
> I'll see what happens when I attempt to reproduce it, but the reasoning in this article makes sense at a cursory glance.

This is definitely the right path; I've been googling for months but didn't find any of these resources. The config changes I need would be a bit different, though, since I want the node and API to remain reachable externally while private-subnet traffic must not leave the VM.

Let me know what you come up with; I'll also try some things.

wileyj commented 1 year ago

> > I have a feeling this is somehow related to how Docker (used by the repo you're running) modifies iptables: https://github.com/chaifeng/ufw-docker
>
> This is definitely the right path; I've been googling for months but didn't find any of these resources. The config changes I need would be a bit different, though, since I want the node and API to remain reachable externally while private-subnet traffic must not leave the VM.
>
> Let me know what you come up with; I'll also try some things.

For the VM itself, do you have any "external" firewalls in place? For example, on AWS this would be a security group in the EC2 service.

pseudozach commented 1 year ago

No, it's a Debian VM and I'm not aware of any other external firewalls. What I'm told is that once traffic leaves my VM it hits their router, which is where they run this netscan and detect the erroneous P2P traffic.

wileyj commented 1 year ago

Been testing out some options tonight; I think I've found something easy that'll work:

Test setup

Steps

  1. Test connection from testing host->external host over 20443 and 20444:
    # from testing host to external host on 20443
    $ nc -vz -w 10 10.0.102.244 20443
    external [10.0.102.244] 20443 (?) open
    # from testing host to external host on 20444
    $ nc -vz -w 10 10.0.102.244 20444
    external [10.0.102.244] 20444 (?) open
    # curl external host /v2/info on 20443
    $ curl -sL 10.0.102.244:20443/v2/info | jq .stacks_tip_height
    86289
  2. Stop all containers, i.e. ./manage.sh -n mainnet -a stop
  3. Edit the docker opts:
    sudo bash -c 'echo DOCKER_OPTS=\"--iptables=false\" >> /etc/default/docker'
  4. Restart docker:
    systemctl restart docker
  5. Restart the containers, i.e. ./manage.sh -n mainnet -a start -f proxy
  6. Deny outbound connections on 20444 and enable ufw:
    sudo ufw deny out 20444/tcp
    sudo ufw enable
  7. Retest the connection from step 1:
    
    $ sudo ufw status
    Status: active

    To                         Action      From
    --                         ------      ----
    20444/tcp                  DENY OUT    Anywhere
    20444/tcp (v6)             DENY OUT    Anywhere (v6)

    # from testing host to external host on 20443
    $ nc -vz -w 10 10.0.102.244 20443
    external [10.0.102.244] 20443 (?) open

    # from testing host to external host on 20444
    $ nc -vz -w 10 10.0.102.244 20444
    external [10.0.102.244] 20444 (?) : Connection timed out

    # from testing host to itself on 20444
    $ nc -vz -w 10 localhost 20444
    localhost [127.0.0.1] 20444 (?) open

    # curl external host /v2/info on 20443
    $ curl -sL 10.0.102.244:20443/v2/info | jq .stacks_tip_height
    86289


Based on the setup you've shared, you may need to tweak your firewall rules a bit so the ports you need stay open, but I'm hopeful this will resolve the issue with your provider.

wileyj commented 1 year ago

Ahh, nuts. Disregard the above: I'm still able to hit my external host from the container itself. My mistake was testing the connection from the VM only. I'll keep poking at this.

pseudozach commented 1 year ago

Thanks to your input, I think I'm able to accomplish what I need. I let a packet capture run on the VM:

tcpdump -i enp35s0 -n dst port 20444 | grep 35.199.6.155

(35.199.6.155 is just a sample node IP with active traffic.)

When I add this rule, traffic no longer leaves the VM:

iptables -I FORWARD -d 35.199.6.155 -p tcp --dport 20444 -j DROP

So I applied the following firewall rules and will monitor for a few days. Hopefully the issue won't come back, and I'll close this issue.

iptables -I FORWARD -d 10.0.0.0/8 -p tcp --dport 20444 -j DROP
iptables -I FORWARD -d 172.25.0.0/16 -p tcp --dport 20444 -j DROP
iptables -I FORWARD -d 192.168.0.0/16 -p tcp --dport 20444 -j DROP

Thanks again!

wileyj commented 1 year ago

The one thing to consider here is that these rules won't persist across reboots. There are ways to accomplish that, but in my testing the easy way isn't possible because it reorders the saved rules (i.e. the DROP rule ends up lower in the chain, so it's overridden).

add iptables rules

sudo iptables -I FORWARD -p tcp --dport 20444 -j DROP  # block outgoing 20444 from container
sudo iptables -A OUTPUT -p tcp --dport 20444 -j DROP   # block outgoing 20444 from VM

delete rules

sudo iptables -D FORWARD -p tcp --dport 20444 -j DROP # remove the container rule
sudo iptables -D OUTPUT -p tcp --dport 20444 -j DROP  # remove the VM rule

persist rules across reboots

sudo apt install iptables-persistent         # install tool to keep iptables rules persistent across reboots
sudo iptables-save -f /etc/iptables/rules.v4 # save current rules
sudo systemctl enable netfilter-persistent   # enable reloading rules on reboot

manual reload

sudo netfilter-persistent reload

Normally netfilter-persistent should reload all the rules, but in my testing the order was incorrect, so I had to add a startup script to my VM to load the rules manually on reboot.

With the DROP rules enabled (as in your previous reply), this will drop all outgoing traffic on port 20444 from both the containers and the VM.
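
For reference, the startup script can be a small systemd oneshot unit that re-applies the rules after Docker starts (a sketch; the unit name and path are hypothetical):

# /etc/systemd/system/drop-20444.service (hypothetical unit)
[Unit]
Description=Re-apply outbound 20444 DROP rules
After=docker.service

[Service]
Type=oneshot
ExecStart=/usr/sbin/iptables -I FORWARD -p tcp --dport 20444 -j DROP
ExecStart=/usr/sbin/iptables -A OUTPUT -p tcp --dport 20444 -j DROP

[Install]
WantedBy=multi-user.target

# enable it with: sudo systemctl enable drop-20444.service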

wileyj commented 1 year ago

Closing this; it seems like we have a solution. Reopen if you need more help on this, though.