helium / miner

Miner for the helium blockchain
Apache License 2.0

Failed to dial peer and drops packet #472

Closed Rylan12 closed 3 years ago

Rylan12 commented 4 years ago

Description

I'm using a DIY gateway with a miner hosted on an AWS EC2 instance to send data to the Helium network. Some packets are being dropped between the miner and the Helium console (I'd estimate roughly 1 in every 5 to 10 packets). I have tracked several dropped packets, and they appear to move successfully from the sensor, through the packet forwarder, and to the miner, where they are dropped with the error messages failed to dial "/p2p/112qB3YaH5bZkCnKA5uRH7tBtGNv2Y5B4smv1jsmvGUzgKT71QpE" and failed to dial 1: failed dropping 2 packets.

I've included the miner logs for one dropped packet below. The payload is 3 bytes (not including the Helium header) and is sent every 2 minutes (for testing purposes). The issue also happens with longer intervals between packets (I've seen it at 5-minute intervals as well).

2020-08-04 18:33:37.198 [info] <0.1321.0>@miner_lora:handle_udp_packet:380 PUSH_DATA [{<<"rxpk">>,[[{<<"tmst">>,676026522},{<<"time">>,<<"2020-08-04T18:33:36.169971Z">>},{<<"tmms">>,1280601235170},{<<"chan">>,8},{<<"rfch">>,0},{<<"freq">>,904.6},{<<"stat">>,1},{<<"modu">>,<<"LORA">>},{<<"datr">>,<<"SF8BW500">>},{<<"codr">>,<<"4/5">>},{<<"lsnr">>,10.0},{<<"rssi">>,-52},{<<"size">>,16},{<<"data">>,<<"QAAAAEiAGgABQnoKS4pRkg==">>}]]}] from 12273815315514654977 on 59565
2020-08-04 18:33:37.198 [notice] <0.1321.0>@miner_lora:handle_packets:524 Routing {devaddr,1207959552}
2020-08-04 18:33:37.199 [info] <0.1316.0>@blockchain_state_channels_client:handle_packet:297 handle_packet {packet_pb,0,lorawan,<<64,0,0,0,72,128,26,0,1,66,122,10,75,138,81,146>>,676026522,-52,904.6,<<"SF8BW500">>,10.0,{routing_information_pb,{devaddr,1207959552}}} to [{routing_v1,1,<<1,125,102,101,5,171,157,180,249,144,239,214,128,120,131,69,171,50,195,32,115,252,39,208,164,32,152,211,182,230,216,194,136>>,[<<0,241,20,68,146,24,117,226,239,116,53,81,58,29,31,27,15,164,158,50,66,149,106,36,56,57,18,236,93,79,25,64,119>>],[<<193,92,2,137,236,45,10,145,10,0,0,0,0,0,0,0,0,0,112,34,183,127,0,0,0,0,0,0,0,0,0,0,72,236,132,14,0,112,0,0,0,0,0,0,0,0,0,0,1,0,0,0,48,101,0,0,0,0,0,0,0,0,0,0,49,0,0,0,54,0,0,0,58,0,0,0>>],[<<0,0,0,127,255,0>>],0}]
2020-08-04 18:33:37.226 [error] <0.19758.9>@blockchain_state_channels_client:dial:518 failed to dial "/p2p/112qB3YaH5bZkCnKA5uRH7tBtGNv2Y5B4smv1jsmvGUzgKT71QpE":{exit,{normal,{gen_server,call,[<0.12516.9>,open]}}}
2020-08-04 18:33:37.227 [error] <0.1316.0>@blockchain_state_channels_client:handle_info:180 failed to dial 1: failed dropping 2 packets
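
To cross-check that the packet is intact when it reaches the miner, the base64 `data` field from the PUSH_DATA line can be decoded and split into its LoRaWAN header fields. The following is a minimal Python sketch, assuming the standard LoRaWAN 1.0.x uplink layout (MHDR, little-endian DevAddr, FCtrl, little-endian FCnt, FOpts, FPort, FRMPayload, 4-byte MIC). The function name `parse_lorawan_uplink` is made up for illustration and is not part of the miner; the sketch neither decrypts the payload nor verifies the MIC.

```python
import base64
import struct


def parse_lorawan_uplink(b64_data: str) -> dict:
    """Split a LoRaWAN 1.0.x data frame into header fields (illustrative only)."""
    raw = base64.b64decode(b64_data)
    mhdr = raw[0]
    # DevAddr (4 bytes) and FCnt (2 bytes) are little-endian on the wire.
    devaddr, fctrl, fcnt = struct.unpack_from("<IBH", raw, 1)
    fopts_len = fctrl & 0x0F       # low nibble of FCtrl is the FOpts length
    offset = 8 + fopts_len         # MHDR(1) + DevAddr(4) + FCtrl(1) + FCnt(2) + FOpts
    body = raw[offset:-4]          # FPort + FRMPayload, before the 4-byte MIC
    return {
        "mtype": mhdr >> 5,        # 2 means unconfirmed data up
        "devaddr": devaddr,
        "fcnt": fcnt,
        "fport": body[0] if body else None,
        "frm_payload": body[1:],   # still encrypted
        "mic": raw[-4:],
        "size": len(raw),
    }


frame = parse_lorawan_uplink("QAAAAEiAGgABQnoKS4pRkg==")
print(frame["devaddr"], frame["fcnt"], frame["fport"], len(frame["frm_payload"]))
# prints: 1207959552 26 1 3
```

The decoded DevAddr matches the `{devaddr,1207959552}` in the Routing line, the total length matches `size` 16, and the encrypted payload is the expected 3 bytes, so the packet itself looks well-formed at the point where the dial fails.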

It appears that the miner is trying to connect to a p2p address but the connection is failing. Instead of trying again with a different peer (as I would expect), the miner "gives up" and drops the packet.
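
To illustrate the behavior I expected, here is a rough Python sketch of falling back to the next peer when a dial fails. This is not how `blockchain_state_channels_client` is implemented (the miner is written in Erlang); `dial_first_available`, `fake_dial`, and the peer addresses are hypothetical names used only for this example.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def dial_first_available(
    addresses: Iterable[str],
    dial: Callable[[str], T],
    attempts_per_address: int = 2,
) -> Optional[T]:
    """Try each address in order; return the first successful stream, else None."""
    for address in addresses:
        for _ in range(attempts_per_address):
            try:
                return dial(address)
            except ConnectionError:
                continue  # retry this address, then move on to the next one
    return None  # only now would the caller have to drop (or queue) the packets


def fake_dial(address: str) -> str:
    """Stand-in for a real p2p dial: any address ending in 'bad' always fails."""
    if address.endswith("bad"):
        raise ConnectionError(f"failed to dial {address}")
    return f"stream:{address}"


stream = dial_first_available(["/p2p/peer-bad", "/p2p/peer-ok"], fake_dial)
print(stream)
# prints: stream:/p2p/peer-ok
```

With a fallback like this, packets would be dropped only after every candidate address has failed, rather than after the first failed dial.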

Info

vihu commented 3 years ago

Thanks for the report @Rylan12

We are reworking a portion of the state channels and packet handling protocol in blockchain-core#602, which should in theory make sending packets more robust. I should mention that dialing p2p streams over the internet is inherently a little finicky: a dial can fail for multiple reasons that are beyond our control. However, the work being done in blockchain-core to address some of these failures should make dialing more consistent :crossed_fingers:

Please take a look at that PR and check whether things make sense. We'd appreciate any help!

evanmcc commented 3 years ago

Going to close this as stale. It's something that we're working on, but there isn't a single actionable thing to do here.