miguelfreitas / twister-core

twister core / daemon
MIT License
1.42k stars 251 forks source link

Not all posts appear on Postboard #82

Open BlockTester opened 10 years ago

BlockTester commented 10 years ago

This is a clone thread of https://github.com/miguelfreitas/twister-html/issues/38 as it appears to belong in twister-core.

I follow more than 50 users and it's clear that 95% of them never make it to my postboard, even when they have new posts. All posts are displayed correctly when I bring up the profile for each user, but don't appear in the Postboard.

I suspect it's the same issue in following.html: "Downloaded n/n posts" does not match what you see when you bring up the modal for the same user.

This is a problem in core. I suspect there is still some corner case that libtorrent doesn't synchronize properly. Sometimes just restarting the daemon may fix, but when it doesn't we will need to debug why torrent is failing to obtain new pieces. Unfortunately debuging a single torrent (so the others which work don't disturb you) is very tedious.

toyg commented 10 years ago

+:100:

BlockTester commented 10 years ago

I'm getting nearly no posts at all now, it's really disheartening :/

BlockTester commented 10 years ago

I have no idea if it is at all related but I have a ton of these messages:

01:00:18.369: 286ef0: AUTO MANAGER PAUSING TORRENT
01:00:18.369: 0e70ad: AUTO MANAGER PAUSING TORRENT
BlockTester commented 10 years ago

Here's the result of getlasthave if it's any help:

"lorea" : 0,
"mfreitas" : 39,
"toyg" : -1,

getnumpieces:

"lorea" : 1,
"mfreitas" : 40,
"toyg" : 0,
BlockTester commented 10 years ago
$ ./twisterd getposts 1 '[{"username":"toyg"}]'
[
]

However:

$ ./twisterd getposts 1 '[{"username":"mfreitas"}]'
[
    {
        # CONTAINS A POST
    }
}
miguelfreitas commented 10 years ago

This this one of the two difficult to debug issues i'd in mind when twister was made public. The other i fixed yesterday, yes! :-) how i'd love if someone with better knowledge of torrent protocol and/or libtorrent volunteered to dive into it so didn't had to...

@BlockTester, the problem above is toyg with -1 lasthave, this is obviously wrong. So one would need to understand why even after your client connects to other peers in the toyg's swarm (i'm assuming it does), it still fails to download any single piece.

This is surely caused by the change i've made to bittorrent protocol, so to support increasing the number of pieces at any time. There is certainly some aspect i haven't considered... bittorrent protocol is tricky will all that choking/unchoking logic, pieces that are excluded from choking etc.

So this is actually difficult to debug because you may reach the same code path to serve different active torrents, and you are interested in debugging just one. Think of gdb breakpoints. Perhaps patching twister.cpp to not start any except the problematic one may help... well, some strategy will have to be used.

BlockTester commented 10 years ago

@miguelfreitas is there any extra debug information I could provide that could be useful?

BlockTester commented 10 years ago

I compiled libtorrent with --enable-statistics --enable-disk-stats in hopes that there would be more statistics, available, sadly the disk stats are essentially empty. I do get buffer statistics, but obviously I was aiming for more data on the dht and torrent status... I noticed in the tool there are utilities pertaining to that. Any idea how this is enabled?

send_buffer

BlockTester commented 10 years ago

Probably not relevant but nevertheless interesting while I still have statistics enabled ;)

disk_buffer

disk_io

disk_throughput

BlockTester commented 10 years ago

Oh some lsof magic lead me to the session stats file. Seems a little bit more useful!

BlockTester commented 10 years ago

Well well, I found where I parked my car!

With --enable-statistics you can run the following:

cd /tmp
/opt/src/twister-core/libtorrent/tools/parse_session_stats.py /TWISTER/RUN/PATH/session_stats/4591.0000.log

Which will produce a report (with an html page to view it) in /tmp/session_stats_report

Here's what mine looks like http://imgur.com/a/IJv4j#0

BlockTester commented 10 years ago

Maybe we need to consider ditching the automanager?

BlockTester commented 10 years ago

Do we have any leads on this? I still only get a handful of posts (I mean <= 6 per day) with 75+ followings. And I'm convinced many people have the same issue, possible at different levels.

It's the same across different accounts, as much as I try to trash and rebuild ~/.twister directory, nothing gives :'(

miguelfreitas commented 10 years ago

I need to get down to a problematic case, where my peer connects to a different peer which has the posts i'm missing. Then we may try to understand why i'm not getting them. Here is one idea:

Apply this patch to src/twister.cpp, so only the problematic username will start:

--- a/src/twister.cpp
+++ b/src/twister.cpp
@@ -77,6 +77,9 @@ torrent_handle startTorrentUser(std::string const &username, bool following)
     bool userInTxDb = usernameExists(username); // keep this outside cs_twister to avoid deadlock
     if( !userInTxDb )
         return torrent_handle();
+        
+    if( username != "usernamehere" )
+        return torrent_handle();

     LOCK(cs_twister);
     if( !m_userTorrent.count(username) ) {

Then enable verbose logging with this other patch:

--- a/libtorrent/src/peer_connection.cpp
+++ b/libtorrent/src/peer_connection.cpp
@@ -539,7 +539,7 @@ namespace libtorrent
                char buf[450];
                snprintf(buf, sizeof(buf), "%s: %s\n", time_now_string(), usr);
         //(*m_logger) << buf;
-        //printf(buf);
+        printf(buf);
        }
 #endif

And this configure line:

./configure --enable-debug --enable-logging=verbose make clean ; make

Then your debug.log should be packed with things like this:

00:01:38.345: <== HANDSHAKE
00:01:38.345: ==> BITFIELD [ 1111111111111111110111111111111111111111111111111111111111111111111111100000111100000000000011111111111110000000 ]
00:01:38.345: ==> ALLOWED_FAST [ 11 ]
00:01:38.345: ==> ALLOWED_FAST [ 20 ]
00:01:38.345: ==> ALLOWED_FAST [ 54 ]
00:01:38.345: ==> ALLOWED_FAST [ 60 ]
00:01:38.345: >>> CORKED WRITE [ bytes: 264 ]
00:01:38.345: ==> ALLOWED_FAST [ 30 ]
00:01:38.345: ==> ALLOWED_FAST [ 44 ]
00:01:38.345: ==> ALLOWED_FAST [ 82 ]
00:01:38.345: ==> ALLOWED_FAST [ 77 ]
00:01:38.345: ==> ALLOWED_FAST [ 58 ]
00:01:38.345: ==> ALLOWED_FAST [ 55 ]
00:01:38.345: ==> DHT_PORT [ 29333 ]
00:01:38.345: >>> ASYNC_WRITE [ bytes: 325 ]
00:01:38.345: <== EXTENDED HANDSHAKE: {
'complete_ago': -1,
'ipv6': 'fe800000000000000216cbfffeaefda8',
'm': { 'lt_donthave': 7, 'lt_tex': 19, 'share_mode': 8, 'upload_only': 3, 'ut_holepunch': 4, 'ut_pex': 1 },
'reqq': 250,
'tr': 'da39a3ee5e6b4b0d3255bfef95601890afd80709',
'v': 'libtorrent/1.0.0.0',
'yourip': 'c91dbee3'
}
00:01:38.345: : set_external_address(201.29.190.227, 2, 90.231.138.199)
00:01:38.345: <== BITFIELD [ 1111111111111111110111111111111111111111111111111111111111111111111111100000111100000000000011111111111110000000 ]

The messages that i usually find interesting are the these BITFIELD. As you may see in above messages (this is "toyg" which i'm missing some posts here) both the outgoing (==>) and incoming (<==) seem to be missing the same pieces. These are the zeros. The ones are the posts we have.

So in the test above there is not much to see, as it seems both clients are either have been hit by the same problem or the posts have not propagated outsider the poster's computer. So if we get lucky we might eventually connect to the poster himself, who might have the entire "torrent".

That's why i've said it is a boring debug... if you want to give it a try, test the recipe above. If you find a case where incoming and outgoing BITFIELDs differ (and they still don't synchronize) let me know!!

BlockTester commented 10 years ago

I applied the patch @miguelfreitas suggested but now the postboard is completely empty (not even loading the local cache) and I'm not getting the BITFIELD debug message...

00:01:10.122:  There are no torrents, disconnect
BlockTester commented 10 years ago

Oh man I'm dumb "usernamehere" is not literal :P Let me recompile!

BlockTester commented 10 years ago
2014-01-22 18:48:26 00:01:20.659: <== twister protocollll
2014-01-22 18:48:26 00:01:20.659: <== EXTENSIONS [ 0000000000000000000000000000000000000000000100000000000000000101 ext: DHT FAST extension ]
2014-01-22 18:48:26 00:01:20.659: <<< info_hash received
2014-01-22 18:48:26 00:01:20.659: >>> SEND BUFFER DEPLETED [ quota: 1 ignore: no buf: 0 connecting: no disconnecting: no pending_disk: 0 ]
2014-01-22 18:48:26 00:01:20.659: <<< received peer_id: 2d5457313030302d334c34767a784e434f4c4c45 client: twister 1.0.0
as ascii: -TW1000-3L4vzxNCOLLE

2014-01-22 18:48:26 00:01:20.659: >>> CORKED WRITE [ bytes: 6 ]
2014-01-22 18:48:26 00:01:20.659: >>> CORKED WRITE [ bytes: 234 ]
2014-01-22 18:48:26 00:01:20.659: ==> EXTENDED HANDSHAKE: dictionary
 [complete_ago]   -1
 [ipv6]   fe8000000000000002241dfffe7dfe7a
 [m]
  dictionary
   [lt_donthave]     7
   [lt_tex]     19
   [share_mode]     8
   [upload_only]     3
   [ut_holepunch]     4
   [ut_pex]     1
 [p]   29333
 [reqq]   250
 [tr]   da39a3ee5e6b4b0d3255bfef95601890afd80709
 [v]   libtorrent/1.0.0.0
 [yourip]   d99d899d

2014-01-22 18:48:26 00:01:20.659: <== HANDSHAKE
2014-01-22 18:48:26 00:01:20.659: ==> BITFIELD [ 0000011111111111111111111111111101111000 ]
2014-01-22 18:48:26 00:01:20.659: ==> ALLOWED_FAST [ 20 ]
2014-01-22 18:48:26 00:01:20.660: ==> ALLOWED_FAST [ 16 ]
2014-01-22 18:48:26 00:01:20.660: >>> CORKED WRITE [ bytes: 262 ]
2014-01-22 18:48:26 00:01:20.660: ==> ALLOWED_FAST [ 15 ]
2014-01-22 18:48:26 00:01:20.660: ==> ALLOWED_FAST [ 14 ]
2014-01-22 18:48:26 00:01:20.660: ==> ALLOWED_FAST [ 24 ]
2014-01-22 18:48:26 00:01:20.660: ==> ALLOWED_FAST [ 34 ]
2014-01-22 18:48:26 00:01:20.660: ==> ALLOWED_FAST [ 22 ]
2014-01-22 18:48:26 00:01:20.660: ==> ALLOWED_FAST [ 32 ]
2014-01-22 18:48:26 00:01:20.660: ==> ALLOWED_FAST [ 33 ]
2014-01-22 18:48:26 00:01:20.660: ==> ALLOWED_FAST [ 1 ]
2014-01-22 18:48:26 00:01:20.660: ==> DHT_PORT [ 29333 ]
2014-01-22 18:48:26 00:01:20.660: >>> ASYNC_WRITE [ bytes: 341 ]
2014-01-22 18:48:26 00:01:20.660: *** ON_SEND_DATA [ bytes: 341 error: Success ]
2014-01-22 18:48:26 00:01:20.660: >>> SEND BUFFER DEPLETED [ quota: 1 ignore: no buf: 0 connecting: no disconnecting: no pending_disk: 0 ]
2014-01-22 18:48:26 00:01:20.783: >>> SEND BUFFER DEPLETED [ quota: 1 ignore: no buf: 0 connecting: no disconnecting: no pending_disk: 0 ]
2014-01-22 18:48:26 00:01:20.783: <== EXTENDED HANDSHAKE: {
'complete_ago': -1,
'ipv6': 'fe8000000000000XXXXXXXXXXXXXXXXX',
'm': { 'lt_donthave': 7, 'lt_tex': 19, 'share_mode': 8, 'upload_only': 3, 'ut_holepunch': 4, 'ut_pex': 1 },
'reqq': 250,
'tr': 'da39a3ee5e6b4b0d3255bfef95601890afd80709',
'v': 'libtorrent/1.0.0.0',
'yourip': 'fffffffff'
}
2014-01-22 18:48:26 00:01:20.783: : set_external_address(XXX.XXX.XXX.XXX, 2, XXX.XXX.XXX.XXX)
00:01:20.783: <== BITFIELD [ 0000011111111111111111111111111101111000 ]
2014-01-22 18:48:26 00:01:20.783: <== ALLOWED_FAST [ 29 ]
2014-01-22 18:48:26 00:01:20.783: <== ALLOWED_FAST [ 28 ]
2014-01-22 18:48:26 00:01:20.783: >>> SEND BUFFER DEPLETED [ quota: 1 ignore: no buf: 0 connecting: no disconnecting: no pending_disk: 0 ]
2014-01-22 18:48:26 00:01:20.783: <== ALLOWED_FAST [ 23 ]
2014-01-22 18:48:26 00:01:20.783: <== ALLOWED_FAST [ 27 ]
2014-01-22 18:48:26 00:01:20.783: <== ALLOWED_FAST [ 10 ]
2014-01-22 18:48:26 00:01:20.783: >>> SEND BUFFER DEPLETED [ quota: 1 ignore: no buf: 0 connecting: no disconnecting: no pending_disk: 0 ]
2014-01-22 18:48:26 00:01:20.783: <== ALLOWED_FAST [ 21 ]
2014-01-22 18:48:26 00:01:20.783: <== ALLOWED_FAST [ 22 ]
2014-01-22 18:48:26 00:01:20.783: <== ALLOWED_FAST [ 9 ]
2014-01-22 18:48:26 00:01:20.783: <== ALLOWED_FAST [ 34 ]
2014-01-22 18:48:26 00:01:20.783: >>> SEND BUFFER DEPLETED [ quota: 1 ignore: no buf: 0 connecting: no disconnecting: no pending_disk: 0 ]
2014-01-22 18:48:26 00:01:20.783: <== ALLOWED_FAST [ 36 ]
2014-01-22 18:48:26 00:01:20.783: <== DHT_PORT [ p: 29333 ]

I also get a lot of those

2014-01-22 18:48:26 00:01:21.243: >>> COMPLETED [ ep: XXX.XXX.XXX.XXX:29333 rtt: 108 ]
2014-01-22 18:48:26 00:01:21.243: *** SET NON-BLOCKING
2014-01-22 18:48:26 00:01:21.243: >>> SET_TOS[ tos: 0 e: Success ]
2014-01-22 18:48:26 00:01:21.243: *** outgoing encryption policy: enabled
2014-01-22 18:48:26 00:01:21.243: ==> EXTENSION [ 0000000000000000000000000000000000000000000100000000000000000101 ]
2014-01-22 18:48:26 00:01:21.243: ==> HANDSHAKE [ ih: 2c2e02862a08a0dae432e42d2dcaa5b803f01d09 ]
2014-01-22 18:48:26 00:01:21.243: >>> ASYNC_WRITE [ bytes: 68 ]
2014-01-22 18:48:26 00:01:21.243: *** ON_SEND_DATA [ bytes: 0 error: Broken pipe ]
2014-01-22 18:48:26 00:01:21.243: **ERROR**: Broken pipe [in peer_connection::on_send_data]
2014-01-22 18:48:26 00:01:21.243: *** CONNECTION CLOSED Broken pipe
2014-01-22 18:48:26 00:01:21.243: *** ERROR [ in peer_connection::on_receive_data error: Transport endpoint is not connected ]
2014-01-22 18:48:26 00:01:21.243: *** CONNECTION CLOSED
2014-01-22 18:48:27 00:01:21.293: >>> COMPLETED [ ep: XXX.XXX.XXX.XXX:29333 rtt: 158 ]
2014-01-22 18:48:27 00:01:21.293: *** SET NON-BLOCKING
2014-01-22 18:48:27 00:01:21.293: >>> SET_TOS[ tos: 0 e: Success ]
2014-01-22 18:48:27 00:01:21.293: *** outgoing encryption policy: enabled
2014-01-22 18:48:27 00:01:21.293: ==> EXTENSION [ 0000000000000000000000000000000000000000000100000000000000000101 ]
2014-01-22 18:48:27 00:01:21.293: ==> HANDSHAKE [ ih: 96a6aa7c4d67104e87a2065632d4f301dd9f8462 ]
2014-01-22 18:48:27 00:01:21.293: >>> ASYNC_WRITE [ bytes: 68 ]
2014-01-22 18:48:27 00:01:21.293: *** ON_SEND_DATA [ bytes: 0 error: Broken pipe ]
2014-01-22 18:48:27 00:01:21.293: **ERROR**: Broken pipe [in peer_connection::on_send_data]
2014-01-22 18:48:27 00:01:21.293: *** CONNECTION CLOSED Broken pipe
2014-01-22 18:48:27 00:01:21.293: *** ERROR [ in peer_connection::on_receive_data error: Transport endpoint is not connected ]
2014-01-22 18:48:27 00:01:21.293: *** CONNECTION CLOSED
2014-01-22 18:48:27 00:01:21.604: >>> SEND BUFFER DEPLETED [ quota: 33 ignore: no buf: 0 connecting: no disconnecting: no pending_disk: 0 ]
2014-01-22 18:48:27 00:01:21.736: <== EXTENSION MESSAGE [ msg: 1 size: 84 ]
2014-01-22 18:48:27 00:01:21.736: <== PEX [ dropped: 0 added: 2 ]
2014-01-22 18:48:27 00:01:21.736: >>> SEND BUFFER DEPLETED [ quota: 33 ignore: no buf: 0 connecting: no disconnecting: no pending_disk: 0 ]
2014-01-22 18:48:27 00:01:22.132: >>> COMPLETED [ ep: XXX.XXX.XXX.XXX:29333 rtt: 2998 ]
2014-01-22 18:48:27 00:01:22.132: *** SET NON-BLOCKING
2014-01-22 18:48:27 00:01:22.132: >>> SET_TOS[ tos: 0 e: Success ]
2014-01-22 18:48:27 00:01:22.132: *** outgoing encryption policy: enabled
2014-01-22 18:48:27 00:01:22.132: *** initiating encrypted handshake
2014-01-22 18:48:27 00:01:22.133:  pad size: 196

Specifically CONNECTION CLOSED Broken pipe which seems to be reflected in the graphs I had posted earlier.

miguelfreitas commented 10 years ago

==> BITFIELD [ 0000011111111111111111111111111101111000 ] <== BITFIELD [ 0000011111111111111111111111111101111000 ]

So here both peers have exactly the same pieces, they have nothing to exchange. We need to find when they differ. I did found those in the past (last year), and i've fixed bugs. but there is probably some bug left, otherwise we would be all sync'ing fine...

bortzmeyer commented 10 years ago

I have the same problem (only my posts and posts to me in the home) and it is quite annoying. For instance:

% ./twisterd getposts 1 '[{"username":"myleneb"}]' [ ]

(And nothing in the Web interface, although I follow her.) But she posts messages, I can see at least one in the mentions.

I applied the trick in #128 to catch up with the blockchain. I just note it here in case there is a relationship.

centaur1 commented 6 years ago

I am not sure, but is this still an issue? I am wondering whether it is. I haven't been able to determine for sure, but am monitoring it. Does anyone know? Superficially it seems to be similar to the crashing error (#429 ) where not everything is ready.