tdlib / td

Cross-platform library for building Telegram clients
https://core.telegram.org/tdlib
Boost Software License 1.0
7.15k stars 1.45k forks source link

Porting for PS VITA ongoing issues #1695

Closed v-atamanenko closed 3 years ago

v-atamanenko commented 3 years ago

Hello. I've made some good progress in porting tdlib to PS Vita since #1558 Tdlib compiles successfully, and (partially works), up to database initialization.

That's where I encounter this error:

[ 1][t 0][1633380144.407558917][TdDb.cpp:432][!Td]      Destroy bad SQLite database because of [Error : 0 : Can't open database: [query:SELECT count(*) FROM sqlite_master] to database "ux0:data/vitagram/db.sqlite" failed: disk I/O error]

My question is: how to localize the issue? I.e., which exactly sqlite function is the culprit? I tried enabling -DSQLITE_ENABLE_SQLLOG, but sqlite3_init_sqllog is not implemented.

Previously I had similar issue:

[ 1][t 0][1633345070.925013065][TdDb.cpp:432][!Td]      Destroy bad SQLite database because of [Error : 0 : Failed to open database: [Error : 0 : unable to open database file for database "ux0:data/vitagram/db.sqlite"]]

But was able to fix it because of more rarely occurring error message that could be found in the code. Whereas with this one, I fail to find the problematic line of code.

Current code I use: https://github.com/v-atamanenko/td. Full execution log (verbosity level 25) up to the error: https://gist.github.com/v-atamanenko/31d54bf13d105cc4061968500a2074e1

Thank you in advance for your help.

levlam commented 3 years ago

You can begin with using TDLib without SQLite with all databases disabled in setTdlibParameters.

v-atamanenko commented 3 years ago

@levlam thank you, this helped me get pass the sqlite initialization errors, but now tdlib hangs after creating auth_sendCode query, and no code gets actually sent. Where should I look for the issue? Log: https://gist.github.com/v-atamanenko/92b75f6a2a3f7c34dd1cadc54eb0fa7f

levlam commented 3 years ago

It is hard to debug TDLib as the whole. First, you need to make sure that all tests pass. Use run_all_tests executable for that. It has option --filter, which can be used to filter out some tests or test only specific tests.

v-atamanenko commented 3 years ago

Ok then I'll need to port the tests first. Will reopen the issue when I get any results. Thank you.

v-atamanenko commented 3 years ago

Hi @levlam . So I managed to localize the issue while performing tests. Execution hangs forever at this line https://github.com/tdlib/td/blob/a160241db7e8c78fc7866ca4b9285c943f1cae2f/tdutils/td/utils/port/SocketFd.cpp#L493 while performing Mtproto_ping. All previous tests (except for sqlite ones, disabled em) pass successfully.

Would really appreciate any suggestions. Thank you.

levlam commented 3 years ago

This means that read from the socket never returns actual data or returns EINTR instead al the time. This can be a bug in the Posix compatibility layer with implementation of asynchronous socket IO. For example, read blocks instead of returning empty response.

v-atamanenko commented 3 years ago

You're right. I fixed this by providing custom implementation of socket flags changing in NativeFd::set_is_blocking instead of fcntl(). Thank you for the hint.

Everything would be great if not for this. Now Test_Mtproto_config is failing:

[ 0][t 0][1634131974.893456935][SocketFd.cpp:507][!Connect]     Check `result <= slice.size()` failed

While running, result and slice.size change like this:

result size: 512, slice size: 512
result size: 2249, slice size: 4096
result size: 11, slice size: 1847
result size: 11, slice size: 1836
result size: 11, slice size: 1825
result size: 11, slice size: 1814
result size: 11, slice size: 1803
result size: 11, slice size: 1792
result size: 11, slice size: 1781
result size: 11, slice size: 1770
result size: 11, slice size: 1759
result size: 11, slice size: 1748
result size: 11, slice size: 1737
result size: 11, slice size: 1726
result size: 11, slice size: 1715
result size: 11, slice size: 1704
result size: 11, slice size: 1693
result size: 11, slice size: 1682
result size: 11, slice size: 1671
result size: 11, slice size: 1660
result size: 11, slice size: 1649
result size: 11, slice size: 1638
result size: 11, slice size: 1627
result size: 11, slice size: 1616
result size: 11, slice size: 1605
result size: 11, slice size: 1594
result size: 11, slice size: 1583
result size: 11, slice size: 1572
result size: 11, slice size: 1561
result size: 11, slice size: 1550
result size: 11, slice size: 1539
result size: 11, slice size: 1528
result size: 11, slice size: 1517
result size: 11, slice size: 1506
result size: 11, slice size: 1495
result size: 11, slice size: 1484
result size: 11, slice size: 1473
result size: 11, slice size: 1462
result size: 11, slice size: 1451
result size: 11, slice size: 1440
result size: 11, slice size: 1429
result size: 11, slice size: 1418
result size: 11, slice size: 1407
result size: 11, slice size: 1396
result size: 11, slice size: 1385
result size: 11, slice size: 1374
result size: 11, slice size: 1363
result size: 11, slice size: 1352
result size: 11, slice size: 1341
result size: 11, slice size: 1330
result size: 11, slice size: 1319
result size: 11, slice size: 1308
result size: 11, slice size: 1297
result size: 11, slice size: 1286
result size: 11, slice size: 1275
result size: 11, slice size: 1264
result size: 11, slice size: 1253
result size: 11, slice size: 1242
result size: 11, slice size: 1231
result size: 11, slice size: 1220
result size: 11, slice size: 1209
result size: 11, slice size: 1198
result size: 11, slice size: 1187
result size: 11, slice size: 1176
result size: 11, slice size: 1165
result size: 11, slice size: 1154
result size: 11, slice size: 1143
result size: 11, slice size: 1132
result size: 11, slice size: 1121
result size: 11, slice size: 1110
result size: 11, slice size: 1099
result size: 11, slice size: 1088
result size: 11, slice size: 1077
result size: 11, slice size: 1066
result size: 11, slice size: 1055
result size: 11, slice size: 1044
result size: 11, slice size: 1033
result size: 11, slice size: 1022
result size: 11, slice size: 1011
result size: 11, slice size: 1000
result size: 11, slice size: 989
result size: 11, slice size: 978
result size: 11, slice size: 967
result size: 11, slice size: 956
result size: 11, slice size: 945
result size: 11, slice size: 934
result size: 11, slice size: 923
result size: 11, slice size: 912
result size: 11, slice size: 901
result size: 11, slice size: 890
result size: 11, slice size: 879
result size: 11, slice size: 868
result size: 11, slice size: 857
result size: 11, slice size: 846
result size: 11, slice size: 835
result size: 11, slice size: 824
result size: 11, slice size: 813
result size: 11, slice size: 802
result size: 11, slice size: 791
result size: 11, slice size: 780
result size: 11, slice size: 769
result size: 11, slice size: 758
result size: 11, slice size: 747
result size: 11, slice size: 736
result size: 11, slice size: 725
result size: 11, slice size: 714
result size: 11, slice size: 703
result size: 11, slice size: 692
result size: 11, slice size: 681
result size: 11, slice size: 670
result size: 11, slice size: 659
result size: 11, slice size: 648
result size: 11, slice size: 637
result size: 11, slice size: 626
result size: 11, slice size: 615
result size: 11, slice size: 604
result size: 11, slice size: 593
result size: 11, slice size: 582
result size: 11, slice size: 571
result size: 11, slice size: 560
result size: 11, slice size: 549
result size: 11, slice size: 538
result size: 11, slice size: 527
result size: 11, slice size: 516
result size: 11, slice size: 505
result size: 11, slice size: 494
result size: 11, slice size: 483
result size: 11, slice size: 472
result size: 11, slice size: 461
result size: 11, slice size: 450
result size: 11, slice size: 439
result size: 11, slice size: 428
result size: 11, slice size: 417
result size: 11, slice size: 406
result size: 11, slice size: 395
result size: 11, slice size: 384
result size: 11, slice size: 373
result size: 11, slice size: 362
result size: 11, slice size: 351
result size: 11, slice size: 340
result size: 11, slice size: 329
result size: 11, slice size: 318
result size: 11, slice size: 307
result size: 11, slice size: 296
result size: 11, slice size: 285
result size: 11, slice size: 274
result size: 11, slice size: 263
result size: 11, slice size: 252
result size: 11, slice size: 241
result size: 11, slice size: 230
result size: 11, slice size: 219
result size: 11, slice size: 208
result size: 11, slice size: 197
result size: 11, slice size: 186
result size: 11, slice size: 175
result size: 11, slice size: 164
result size: 11, slice size: 153
result size: 11, slice size: 142
result size: 11, slice size: 131
result size: 11, slice size: 120
result size: 11, slice size: 109
result size: 11, slice size: 98
result size: 11, slice size: 87
result size: 11, slice size: 76
result size: 11, slice size: 65
result size: 11, slice size: 54
result size: 11, slice size: 43
result size: 11, slice size: 32
result size: 11, slice size: 21
result size: 11, slice size: 10

Do you know what may cause this 1 byte difference? Or is it safe to drop the check? (as I see, initially it wasn't there)

levlam commented 3 years ago

This check must not be dropped.

It looks like response from ::read is incorrect. I doubt that there were exactly 11 bytes read for 168 consecutive calls to ::read.

v-atamanenko commented 3 years ago

Fixed that, thank you! I hope you're not too tired with my silly questions. Now I'm at the point of Test_Client_Simple. It seems to run well until the very last moment where it causes a core dump crash. Any ideas here?

[ 1][t 0][1634863442.920485258][tests.cpp:221]  Run test [name:Test_Client_Simple]
[ 3][t 0][1634863442.923151969][Td.cpp:2717]    Create Td with layer 133, database version 13 and version 33
[ 3][t 0][1634863442.926289081][Td.cpp:4229][!Td][&td_requests] Sending update: updateOption {
  name = "version"
  value = optionValueString {
    value = "1.7.8"
  }
}
[ 3][t 0][1634863442.930514097][Td.cpp:4229][!Td][&td_requests] Sending update: updateAuthorizationState {
  authorization_state = authorizationStateWaitTdlibParameters {
  }
}
[ 3][t 0][1634863442.934684991][Td.cpp:3078][!Td][&td_requests] Receive request 3: testSquareInt {
  x = 3
}
[ 3][t 0][1634863442.939100027][Td.cpp:4244][!Td][&td_requests] Sending result for request 3: testInt {
  value = 9
}
[ 3][t 0][1634863442.942342996][Td.cpp:3443][!Td]       Receive Td::hangup
[ 2][t 0][1634863442.944584131][Td.cpp:3721][!Td]       Close Td in state 0
[ 3][t 0][1634863442.946958065][Td.cpp:4229][!Td][&td_requests] Sending update: updateAuthorizationState {
  authorization_state = authorizationStateClosing {
  }
}
[ 4][t 0][1634863442.951019048][Td.cpp:3562][!Td]       Decrease request actor count to 0
[ 2][t 0][1634863442.953149080][Td.cpp:3564][!Td]       Have no request actors
[ 4][t 0][1634863442.955481290][Td.cpp:3460][!Td]       Decrease reference count to 0
[ 2][t 0][1634863442.957711219][Td.cpp:3540][!Td]       ON_CLOSED
[ 3][t 0][1634863442.959849119][Td.cpp:4229][!Td][&td_requests] Sending update: updateAuthorizationState {
  authorization_state = authorizationStateClosed {
  }
}
[ 2][t 0][1634863442.963973283][Td.cpp:3550][!Td]       Stop Td
3:0x9cd098e26c7c132a:0x8c2ff5fd:[coredump] start ID=TDLIB0TST
3:0x9cd098e22f72a2e7:0x8c2ff5fd:[coredump] error:0x80010005
levlam commented 3 years ago

This is a too complex test. Skip it for a while to return later to it.

v-atamanenko commented 3 years ago

I have found the source of the crash and how to fix it @levlam . It is caused by a second call to concurrentscheduler->finish(); in Client.cpp. Here I made a simple demonstration by adding debug prints to Client.cpp (see gist) and the produced log: (note: "AA" and "AB" are wrapping CHECK(state_ == State::Run) in ConcurrentScheduler::finish()):

[ 1][t 0][1635118774.682305097][tests.cpp:221]  Run test [name:Test_Client_Simple]
[ 3][t 0][1635118774.685214996][Td.cpp:2716]    Create Td with layer 133, database version 13 and version 33
[ 3][t 0][1635118774.688490152][Td.cpp:4228][!Td][&td_requests] Sending update: updateOption {
  name = "version"
  value = optionValueString {
    value = "1.7.8"
  }
}
[ 3][t 0][1635118774.692764043][Td.cpp:4228][!Td][&td_requests] Sending update: updateAuthorizationState {
  authorization_state = authorizationStateWaitTdlibParameters {
  }
}
[ 3][t 0][1635118774.696933269][Td.cpp:3077][!Td][&td_requests] Receive request 3: testSquareInt {
  x = 3
}
[ 3][t 0][1635118774.701218128][Td.cpp:4243][!Td][&td_requests] Sending result for request 3: testInt {
  value = 9
}
HERE55
HERE77
[ 3][t 0][1635118774.705452203][Td.cpp:3442][!Td]       Receive Td::hangup
[ 2][t 0][1635118774.707806110][Td.cpp:3720][!Td]       Close Td in state 0
[ 3][t 0][1635118774.710268020][Td.cpp:4228][!Td][&td_requests] Sending update: updateAuthorizationState {
  authorization_state = authorizationStateClosing {
  }
}
HERE88
HERE99
HERE97
HERE99
[ 4][t 0][1635118774.715926170][Td.cpp:3561][!Td]       Decrease request actor count to 0
[ 2][t 0][1635118774.718291044][Td.cpp:3563][!Td]       Have no request actors
[ 4][t 0][1635118774.720797061][Td.cpp:3459][!Td]       Decrease reference count to 0
[ 2][t 0][1635118774.723045110][Td.cpp:3539][!Td]       ON_CLOSED
[ 3][t 0][1635118774.725254058][Td.cpp:4228][!Td][&td_requests] Sending update: updateAuthorizationState {
  authorization_state = authorizationStateClosed {
  }
}
[ 2][t 0][1635118774.729550123][Td.cpp:3549][!Td]       Stop Td
HERE97
HERE99
HERE1
AA
AB
AC
AD
AE
AH
AI
AJ
AK
AL
AM
AN
HERE97
HERE2
AA
3:0x9cd098e26c7c132a:0x8c2ff5fd:[coredump] start ID=TDLIB0TST
3:0x9cd098e2e133745a:0x8c2ff5fd:[coredump] done

It is fixed very simply by adding

if (concurrent_scheduler_ == nullptr) {
    return;
}

after line 190 of Client.cpp.

I think the same applies to MultiImpl.

It's kind of weird that I am the only one experiencing this, what do you think?

levlam commented 3 years ago

Thanks.

It's kind of weird that I am the only one experiencing this, what do you think?

You are the only to build TDLib without EventFd (with TD_EVENTFD_UNSUPPORTED), but not for Emscripten and with threads supported. I have just pushed some fixes for this setup and now all tests work me on Linux with TD_EVENTFD_UNSUPPORTED.

v-atamanenko commented 3 years ago

Thank you so much. It's good now. Another question: I experienced a forever hang with Mtproto_handshake. A lot of debugging led me to the culprit. In BigNum::is_prime (tdutils/td/utils/BigNum.cpp) a call to BN_is_prime_ex is never returning.

I have two guesses here: 1) Could this be a bug in the local OpenSSL library? If so, is there an alternative for this fuction that could be used? 2) Could this be a bug in the call itself, since I guess most of the time users have newer OpenSSL and BN_check_prime used instead?

levlam commented 3 years ago

No, this can't be a bug in the call or OpenSSL library. But if the device/emulator is very slow, the call can be very slow also. You can try to reduce number of iterations to 1 instead of 128/64 to test this. It is unsafe to reduce the number of iterations for real usages, so you will need to return back 128/64, but this will not affect TDLib's speed. This check isn't run at all in common workflows.

v-atamanenko commented 3 years ago

You are right. It's just very slow. Changing to 1 iteration does work, in sense that there is no "seemingly forever hang" anymore. But even thus, to not stumble into "Timeout Expired" and pass the test successfully, I had to disable is_prime() checks in DhHandshake::check_config, since even with 1 iteration it was too slow to fit into the timeout.

What are the security implications? What if I will find that this check is what's disabling me to use tdlib in real conditions?

v-atamanenko commented 3 years ago

So now all tests pass, and I decided to try running the client again. The result is exactly the same as when we started:

@levlam thank you, this helped me get pass the sqlite initialization errors, but now tdlib hangs after creating auth_sendCode query, and no code gets actually sent. Where should I look for the issue? Log: https://gist.github.com/v-atamanenko/92b75f6a2a3f7c34dd1cadc54eb0fa7f

Here is the latest full log with verbosity level 100.

Looking at the log... Could the reason be in PollableFd? As in, poll() is not available in Vita's newlib yet.

levlam commented 3 years ago

What are the security implications? What if I will find that this check is what's disabling me to use tdlib in real conditions?

In practise, this check isn't used at all, because TDLib has some built-in values, for which this check was done beforehand.

levlam commented 3 years ago

Looking at the log... Could the reason be in PollableFd? As in, poll() is not available in Vita's newlib yet.

Anything is possible, but errors aren't expected when tests work.

You disabled EventFd, so TDLib works in a single-thread mode and only whenever receive/execute are called. You need to call them after amount of time received from td::ConcurrentScheduler::emscripten_get_main_timeout() has passed, or whenever there is any incoming network activity. Alternatively, you need to call receive often enough. Also, without EventFd TDLib isn't thread-safe, so its methods can be called only from a dedicated thread.

Between, logs can contain a lot of private details. You can send them to @tdlib_bot in Telegram instead of pasting them in a public gist.

v-atamanenko commented 3 years ago

@levlam great, great thanks for all your help on these issues! It works!!!

levlam commented 3 years ago

You did a great job. Congratulations!