Closed JasonCoombs closed 5 years ago
Thinking it would be useful to be able to activate a mode in which ONLY blacklisted peers are allowed, for testing and debugging purposes. Does the Bitcoin community operate any known-malicious peers to help detect bugs in new implementations and for regression testing?
My current list, some of these peers might have caused segmentation faults:
# IP address to disallow as a peer, multiple entries allowed.
blacklist = 62.182.124.202
blacklist = 51.15.113.51
blacklist = 213.207.92.204
blacklist = 107.167.186.147
blacklist = 108.44.193.95
blacklist = 81.169.162.150
blacklist = 142.93.239.160
blacklist = 5.102.147.62
blacklist = 178.128.85.239
blacklist = 144.76.71.57
blacklist = 167.99.219.177
blacklist = 95.216.23.174
# segmentation fault during block sync
blacklist = 52.243.61.218
blacklist = 195.201.110.218
blacklist = 13.124.0.147
Do we have a way to determine the software and version number of particular peers?
Like a method for nmap to use to discover software version with a pattern signature.
https://nmap.org/book/man-version-detection.html https://nmap.org/book/vscan.html
Shall we add support in libbitcoin for remote version detection?
Using https://nmap.org/nsedoc/scripts/bitcoin-info.html with https://nmap.org/nsedoc/lib/bitcoin.html
nmap already supports bitcoin user agent remote discovery, but it isn't working for Testnet nodes.
$ nmap -n -p 8333 --script bitcoin-info 75.40.139.202
Starting Nmap 7.70 ( https://nmap.org ) at 2018-11-05 13:51 HST
Nmap scan report for 75.40.139.202
Host is up (0.12s latency).
PORT STATE SERVICE
8333/tcp open bitcoin
| bitcoin-info:
| Timestamp: 2018-11-05T23:51:05
| Network: main
| Version: 0.7.0
| Node Id: 72c8d9b0b72850b8
| Lastblock: 548938
|_ User Agent: /Satoshi:0.17.99/
Nmap done: 1 IP address (1 host up) scanned in 0.77 seconds
If there is a fault it’s not with the peer, it’s with libbitcoin.
it will help to know which software the peer is running, so the error condition might be forced when debugging.
I see from the console output that the node in question is modified. I assume this pertains to logging only, but it may still be problematic. I synced and validated all of mainnet and have encountered no stability problems.
Shall we add support in libbitcoin for remote version detection?
This is native to the Bitcoin p2p protocol, and already logged.
What makes you think this is related to peer behavior?
There's clearly something different about the way certain peers communicate that causes a crash immediately after connecting to them. Blacklist the peer and the problem goes away, until a similar peer is encountered.
That doesn't imply correlation with the peer. It could be entirely unrelated and the same might result, since the peer may or may not even be selected again. Maybe if you are seeing it repeatedly happen after the same peer connects, but otherwise I wouldn't make any such assumption.
A better way to test this theory is to simply connect only to the presumed problem peer(s). You can do this using manual connections (though that does introduce a small distinction in the code path, though the same is true of adding peers to the blacklist).
I tried using nmap service discovery with the Nmap Scripting Engine on my local libbitcoin-node but it isn't reporting any details. compare to https://github.com/libbitcoin/libbitcoin-node/issues/392#issuecomment-436049297
$ nmap -n -p 8333 --script bitcoin-info 127.0.0.1
Starting Nmap 7.70 ( https://nmap.org ) at 2018-11-05 13:57 HST
Nmap scan report for 127.0.0.1
Host is up (0.00020s latency).
PORT STATE SERVICE
8333/tcp open bitcoin
Nmap done: 1 IP address (1 host up) scanned in 10.31 seconds
That doesn't imply correlation with the peer.
I started adding peers to my blacklist when I did see a correlation, it was the same IP each time it crashed. Without the blacklist there's no way my Testnet node could have sync'ed current.
A better way to test this theory is to simply connect only to the problem peer(s). You can do this using manual connections
can you point me to the documentation for initiating a manual connection? thanks.
That’s a feature not a bug.
That’s a feature not a bug.
you mean the inability of nmap to remote-discover that we are libbitcoin
It could simply be reorganization from weaker chains. This was failing until I fixed it a couple days ago, so make sure you have latest.
Shall we add support in libbitcoin for remote version detection?
This is native to the Bitcoin p2p protocol, and already logged.
I meant user agent, not protocol version. do we not want to be discoverable by nmap for some reason?
User agent is part of the P2P version message.
Why on earth would we want to add code to advertise information to port scanners.
because security by obscurity is not security
You missed the point. This is just dead code, which has a cost, and, as all code, does translate into increased attack surface.
if the user agent is already supposed to be transmitted as part of the P2P version message, do you have a clue why nmap isn't able to process ours?
Sounds like a fun research project.
oh, interesting, it worked when nmap was the first and only inbound or outbound connection:
PORT STATE SERVICE REASON
8333/tcp open bitcoin syn-ack
| bitcoin-info:
| Timestamp: 2018-11-06T00:12:39
| Network: main
| Version: 0.7.0
| Node Id: e8fad242281281e2
| Lastblock: 42561
|_ User Agent: /libbitcoin:4.0.0/
Final times for host: srtt: 149 rttvar: 3772 to: 100000
Then it's likely the failure to respond quickly that gets it terminated by the batch session.
A port scanner is not a peer, and we have no interest in allowing it to hold the port while there is a more responsive peer.
Have you determined that you have current code in all repos?
A port scanner is not a peer, and we have no interest in allowing it to hold the port while there is a more responsive peer.
there's no way for us to know that nmap is the peer at the point that the peer first establishes a new inbound connection, and even if we did want to prevent nmap from scanning our nodes in my view we should not do so because it's not well-behaved network application design. people need to be able to use network admin tools to admin networks. plus, security through obscurity is not even something that Microsoft believes is valid any longer, so being hidden on the network is self-deception.
Have you determined that you have current code in all repos?
I've been carefully reproducing any problems that I encounter in my development branch using vanilla copies of everything current in master from all of the repos. the build problem was caused by changes I've made in my development branch, where I do NOT have the latest code from all repos.
When in doubt my second step is always to reproduce with the vanilla libbitcoin code. First step is to bother you again expecting you to fix it for me. ;-)
Isolating (and even fixing) problems on your own when you may have that option is much better than generating conversation and work for others like myself who try to remain responsive.
I realize that my style is cluttering up your GitHub. But there's nobody else here. Maybe being a little LESS responsive will encourage more people to participate. Every other open source project in the entire world, every one of them, usually has work in progress and open issues dating back months or years. Open issues show other people how they might be helpful. You do want other people to help, even if it seems like a waste of your time right now.
Discussions are on slack. There are few issues here because we work hard to manage them. A large number of open issues makes it impossible to evaluate the state of a project. If the issues are meaningful, or the discussion helpful, then they are of course welcome. But this isn't a discussion board. Please use the slack channel for casual discussion about passing thoughts and reserve this space for concise useful information that should be archived.
open issues dating back months or years.
Not here. I try very hard to limit open issues to one page in the browser, in each repo. Similarly pull requests are not allowed to linger for long periods of time.
Please reopen if this is happening with unmodified code. I am not seeing it with current code.
Thanks, the problem was here:
and here:
https://github.com/libbitcoin/libbitcoin-blockchain/blob/master/src/populate/populate_header.cpp#L48
where my debugging of a null pointer previously needed to be removed now that (hopefully) there are no longer validate header and populate header events being triggered with either missing metadata or null branch pointer. continuing to test this.
only certain peers appear to send headers when blocks are requested, so blacklisting had the effect of avoiding causing either validate header or populate header from being invoked. I managed to sync the entire Testnet with v4 without once encountering unsolicited headers apparently because most peers give us headers only when we ask for them.
Discussions are on slack.
I don't agree with that policy. Slack is not searchable the way GitHub is. Don't discuss things here if you don't want to, but other people might appreciate more technical detail here where it is more accessible, especially developers new to the project who need to see step by step analysis of how the pieces all fit together. My participation here will diminish soon, so there's no need to tell me to leave.
After additional testing and analysis of my easily-reproducible segmentation faults using my custom debugging output on my development branch, as well as these Issues: https://github.com/libbitcoin/libbitcoin-node/issues/396 and https://github.com/libbitcoin/libbitcoin-node/issues/397 it is my opinion that there is in fact an unresolved segmentation fault problem.
I believe the segmentation fault is occurring only when an ASIO callback function is triggered in an unexpected order sequence, BEFORE the network request or database/blockchain read has occurred through which the metadata state of a newly-received header is supposed to be populated.
Somehow, missing metadata for the header is resulting in a null pointer during one of these calls:
or
https://github.com/libbitcoin/libbitcoin-blockchain/blob/master/src/populate/populate_header.cpp#L48
either on those very lines (attempting to dereference a null pointer) or subsequent to them when the handler is invoked (calling a null pointer, if for some reason the handler function pointer is invalid).
this problem is most likely a result of incorrect use of ASIO, where we improperly presume that ASIO callback functions will be dispatched in a FIFO queue that doesn't exist. the ASIO documentation says that we cannot rely on ASIO to dispatch functions in any particular order, and we are therefore supposed to use a mechanism known as a "strand" which is described here:
https://www.boost.org/doc/libs/1_68_0/doc/html/boost_asio/overview/core/strands.html http://www.crazygaze.com/blog/2016/03/17/how-strands-work-and-why-you-should-use-them/ https://stackoverflow.com/questions/39097644/how-strands-guarantee-correct-execution-of-pending-events-in-boost-asio
the resubscriber utility attempts to use a strand:
but it seems likely that this is not being used in every place that we have result handlers queued. do we have multiple strands, one per I/O threadpool, and if so then how are we currently enforcing order?
I am having trouble reproducing the segmentation fault without my extensive debug logging calls inserted in the processing, and I believe that my debug logging is changing the timing in such a way that the race condition that exists somewhere more easily reveals itself by dispatching handlers in an order that leaves something uninitialized and attempting to invoke or access a null pointer is the result.
While there certainly may be problems, the fact that, based on your comments above you really don’t understand the design, and that it only reproduces in your customized version, I’m inclined to assume it’s your code vs. goin off to chase phantoms.
I mentioned this here https://github.com/libbitcoin/libbitcoin-node/issues/383#issuecomment-433738651
The segmentation faults appear to result from some errant or unexpected behavior of particular peers.