Open kenz-gelsoft opened 1 year ago
-2147483643
is 0x80000005
, which translates to B_BAD_VALUE
.
This occurs in a hserver
call:
status_t status = loader_hserver_call_register_image(&haiku_info, sizeof(haiku_info));
if (status < 0)
{
std::cerr << "loader_load_runtime: failed to register runtime_loader image: " << status << std::endl;
return false;
}
It should be somewhere here:
intptr_t server_hserver_call_register_image(hserver_context& context, void* image_info, std::size_t size)
{
if (size != sizeof(haiku_extended_image_info))
{
return B_BAD_VALUE;
}
Seems like image_info
has been updated upstream since the last time I've touched hyclone
.
However, a quick browse of the source does not indicate so.
Can you somehow debug hyclone_server
(using gdb
running as root) and get the size
parameter? Alternatively, you can insert a std::cerr
statement there and then tail -f $HPREFIX/.hyclone.log
.
Thank you for taking this. I added logging:
[kenz@archlinux hyclone]$ git diff
diff --git a/hyclone_server/image.cpp b/hyclone_server/image.cpp
index 720ca27..4af066f 100644
--- a/hyclone_server/image.cpp
+++ b/hyclone_server/image.cpp
@@ -13,6 +13,7 @@ intptr_t server_hserver_call_register_image(hserver_context& context, void* imag
{
if (size != sizeof(haiku_extended_image_info))
{
+ std::cerr << "server_hserver_call_register_image: size=" << size << std::endl;
return B_BAD_VALUE;
}
but log doesn't contain added logging.
[kenz@archlinux hyclone]$ grep size $HPREFIX/.hyclone.log
[kenz@archlinux hyclone]$
so I guess context.process->RegisterImage()
returns B_BAD_VALUE.
By moving logging to outside of if clause
[kenz@archlinux hyclone]$ git diff
diff --git a/hyclone_server/image.cpp b/hyclone_server/image.cpp
index 720ca27..d554327 100644
--- a/hyclone_server/image.cpp
+++ b/hyclone_server/image.cpp
@@ -11,6 +11,7 @@
intptr_t server_hserver_call_register_image(hserver_context& context, void* image_info, std::size_t size)
{
+ std::cerr << "server_hserver_call_register_image: size=" << size << std::endl;
if (size != sizeof(haiku_extended_image_info))
{
return B_BAD_VALUE;
it logs size=1136 multiple times, so above logging should works correctly.
[kenz@archlinux hyclone]$ grep size $HPREFIX/.hyclone.log | uniq
server_hserver_call_register_image: size=1136
[kenz@archlinux hyclone]$
I will try debugging maybe this weekend.
Under normal circumstances this should not happen:
int Process::RegisterImage(const haiku_extended_image_info& image)
{
int id = _images.Add(image);
_images.Get(id).basic_info.id = id;
_info.image_count = _images.size();
return id;
}
By moving logging to outside of if clause
Maybe logging sizeof(haiku_extended_image_info)
might help. If it's not 1136
, you might want to try to get the real struct extended_image_info
on Haiku and see which HyClone component(s) is getting the size wrong.
Under normal circumstances this should not happen:
Indeed.
Maybe logging sizeof(haiku_extended_image_info) might help. If it's not 1136, you might want to try to get the real struct extended_image_info on Haiku and see which HyClone component(s) is getting the size wrong.
It didn't help. logging sizeof(haiku_extended_image_info)
is always 1136.
I will try debugging maybe this weekend.
So I tried debugging, but debug build works correctly... Some optimization may be the cause.
ccmake ..
.Which compiler and optimization flags are you using?
Normally I do test release builds, but there may have been some undefined behavior somewhere that only arises in certain modes of certain compilers.
gcc version
c++ --version
c++ (GCC) 13.2.1 20230801
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I don't specify any special optimization flags (arch linux g++ default)
I'm testing some CMake optimization options
Will try clang++ tonight. And maybe RelWithDebugInfo could be used to debug.
Also, it might be the case that the servercall hasn't even reached the function yet - B_BAD_VALUE
is returned from the dispatcher:
Considering what you said in https://github.com/trungnt2910/hyclone/issues/15#issuecomment-1787468948 and https://github.com/trungnt2910/hyclone/issues/15#issuecomment-1787485874, this might be the problem. For calls that pass through the dispatcher, server_hserver_call_register_image
always succeeds, but for those that don't, the function isn't called at all.
Before any further debugging I suggest adding logs to the if (!context.process)
blocks in server_dispatch.cpp
.
Before any further debugging I suggest adding logs to the
if (!context.process)
blocks inserver_dispatch.cpp
.
if (!context.process)
block logged returning B_BAD_VALUE
.
diff --git a/hyclone_server/server_dispatch.cpp b/hyclone_server/server_dispatch.cp
p
index 1e534b8..c17c456 100644
--- a/hyclone_server/server_dispatch.cpp
+++ b/hyclone_server/server_dispatch.cpp
@@ -1,4 +1,5 @@
#include <cstddef>
+#include <iostream>
#include "haiku_errors.h"
#include "process.h"
#include "servercalls.h"
@@ -39,6 +40,7 @@ intptr_t server_dispatch(intptr_t conn_id, intptr_t call_id,
context.process = system.GetProcess(context.pid).lock();
if (!context.process)
{
+ std::cerr << "!context.process" << std::endl;
return B_BAD_VALUE;
}
}
@@ -51,6 +53,7 @@ intptr_t server_dispatch(intptr_t conn_id, intptr_t call_id,
if (!context.thread)
{
+ std::cerr << "!context.thread" << std::endl;
return B_BAD_VALUE;
}
}
[kenz@archlinux ~]$ grep \! $HPREFIX/.hyclone.log
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
3 5 6 18 9 23 19 12 13 24 15 16 17 14 8 20 !context.process21 22
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
!context.process
I added pid/tid to log, it tells pid=-1
. I should have uploaded entire log.
(This time, it doesn't stop before prompt w/ RelWithDebInfo build.)
While debugging I saw connect() to server socket fails in ServerConnection::Connect(false)
* thread #1, name = 'haiku_loader', stop reason = step over
* frame #0: 0x00005555555725c0 haiku_loader`ServerConnection::Connect(this=0x00007ffff7e8f98c, forceReconnect=<unavailable>) at loader_servercalls.cpp:113:10
frame #1: 0x000055555555d8a5 haiku_loader`main at main.cpp:335:30
frame #2: 0x00007ffff7a45cd0 libc.so.6`___lldb_unnamed_symbol3187 + 128
frame #3: 0x00007ffff7a45d8a libc.so.6`__libc_start_main + 138
frame #4: 0x000055555555e845 haiku_loader`_start + 37
Though RelWithDebInfo reports inlined and optimized callstack, this is call from:
.hyclone.log is:
My laptop encountering this is very poor spec.
I guess hyclone_loader
forks hyclone_server
and hyclone_loader
assumes hyclone_server
is ready to listen when waitpid()
succeeds, but hyclone_server
didn't finish setting up the socket at that time.
Inserting 5sec sleep to above code,
diff --git a/haiku_loader/sys/linux/loader_servercalls.cpp b/haiku_loader/sys/linux/loader_servercalls.cpp
index 9ff8e5c..30c55e5 100644
--- a/haiku_loader/sys/linux/loader_servercalls.cpp
+++ b/haiku_loader/sys/linux/loader_servercalls.cpp
@@ -288,6 +288,9 @@ bool loader_init_servercalls()
return false;
}
+ // Waiting 5secs for server setup.
+ sleep(5);
+
// Here, the server should have daemonized and is listening to the socket.
if (!gServerConnection.Connect())
but it doesn't workaround stop before prompt (it never prompt me).
Keeping installed package: bash-5.2.015-1-x86_64.hpkg
Keeping installed package: graphite2-1.3.14-2-x86_64.hpkg
Keeping installed package: tar-1.35-2-x86_64.hpkg
Keeping installed package: realtek_wifi_firmwares-2019_01_02-1-x86_64.hpkg
Keeping installed package: unzip-6.10c23-5-x86_64.hpkg
Keeping installed package: fontconfig-2.13.96-2-x86_64.hpkg
Done extracting packagefs at "/boot/system"
Done extracting packagefs at "/boot/home/config"
HyClone server listening on /home/kenz/.hprefix
Starting launch_daemon
Welcome to the Haiku shell.
.hyclone.log is:
I added pid/tid to log, it tells
pid=-1
. I should have uploaded entire log.(This time, it doesn't stop before prompt w/ RelWithDebInfo build.)
How did you log the pid/tid? Using context.pid
?
Yes I log the context.pid
diff --git a/hyclone_server/server_dispatch.cpp b/hyclone_server/server_dispatch.cpp
index 1e534b8..2b12683 100644
--- a/hyclone_server/server_dispatch.cpp
+++ b/hyclone_server/server_dispatch.cpp
@@ -1,4 +1,5 @@
#include <cstddef>
+#include <iostream>
#include "haiku_errors.h"
#include "process.h"
#include "servercalls.h"
@@ -39,6 +40,7 @@ intptr_t server_dispatch(intptr_t conn_id, intptr_t call_id,
context.process = system.GetProcess(context.pid).lock();
if (!context.process)
{
+ std::cerr << "!context.process: pid=" << context.pid << std::endl;
return B_BAD_VALUE;
}
}
@@ -51,6 +53,7 @@ intptr_t server_dispatch(intptr_t conn_id, intptr_t call_id,
if (!context.thread)
{
+ std::cerr << "!context.thread tid=" << context.tid << std::endl;
return B_BAD_VALUE;
}
}
If context.pid
is -1
, then this means system.GetThreadFromConnection
has returned -1
for the pid
value.
Can you check whether the connection ID does not exist (in which case the line under if (it == _connections.end())
should be reached), or whether the PID is improperly recorded beforehand?
I added log
diff --git a/hyclone_server/system.cpp b/hyclone_server/system.cpp
index bc00e75..a6558f2 100644
--- a/hyclone_server/system.cpp
+++ b/hyclone_server/system.cpp
@@ -117,8 +117,10 @@ const Connection& System::RegisterConnection(intptr_t conn_id, const Connection&
Connection System::GetThreadFromConnection(intptr_t conn_id)
{
auto it = _connections.find(conn_id);
- if (it == _connections.end())
+ if (it == _connections.end()) {
+ std::cerr << "thread not found for conn_id=" << conn_id << std::endl;
return Connection(-1, -1);
+ }
return it->second;
}
conn ID=9 which not found later is started here in the log.
Connection started: 9 3861 3861
Process 3861 already registered
but its pid is already resitered, not to be registered.
Closing: 9
this comes from server_main.cpp that seems that connection closed by POLL with no available bytes or POLL errors
else if (revents & POLLIN)
{
int inputBytes = 0;
if (ioctl(fd, FIONREAD, &inputBytes) == -1)
{
continue;
}
// POLLIN set but no bytes available. This
// probably means that the client has closed
// the connection.
if (inputBytes == 0)
{
goto connection_close;
}
// (snip)
}
else if (revents & POLLHUP || revents & POLLERR)
{
connection_close:
std::cerr << "Closing: " << fd << std::endl;
close(fd);
3 5 6 8
Keeping a reference to process 3861 because it is executing exec.
Unlocking exec for process 3861
Unregistered: 9 3861 3861
accepted connection
Connection started: 9 3861 3861
Process 3861 already registered
accepted connection
Connection started: 10 3013 4001
Process 3013 already registered
[3013] [3013] _kern_debug_output: stub: _kern_get_next_disk_device_id
accepted connection
Connection started: 11 3013 4045
Process 3013 already registered
accepted connection
accepted connection
Connection started: 13 4360 4360
Registered process: 13 4360Connection started:
12 4377 4377
Registered process: 12 4377
accepted connection
Connection started: 14 3013 4805
Process 3013 already registered
accepted connection
Connection started: 15 4971 4971
Registered process: 15 4971
accepted connection
Connection started: 16 5028 5028
Registered process: 16 5028
accepted connection
Connection started: 17 5288 5288
Registered process: 17 5288
accepted connection
Connection started: 18 5932 5932
Registered process: 18 5932
accepted connection
Connection started: 19 6053 6053
Registered process: 19 6053
[4377] [4377] _kern_debug_output: debug_server: Failed to open console: No such file or directory
Unregistered: 9 3861 3861
- new conn_id 9 connection disconnected here.
MessagingService::SendMessage(): Failed to allocate space for send message command.
MessagingService::SendMessage(): Failed to allocate space for send message command.
Unregistered: 14 3013 4805 Closing: 9
- closed conn_id=9
3 5 6 8 19 10 11 12 13 14 15 16 17 18 Closing: 14 3 5 6 8 19 10 11 12 13 18 15 16 17
- so here conn_id=9 not found.
thread not found for conn_id=9 !context.process: pid=-1 thread not found for conn_id=14 !context.process: pid=-1 Unregistered: 11 3013 4045 Closing: 11 3 5 6 8 19 10 17 12 13 18 15 16 thread not found for conn_id=11 !context.process: pid=-1
---
whole log of this session:
- [_hyclone.log](https://github.com/trungnt2910/hyclone/files/13251923/_hyclone.log)
Hmm, so the thread with connection ID 9 is trying to do some servercalls even after appearing dead.
Can you log all the details about the servercall IDs and arguments of conn_id=9
?
I'll do a closer review after my exam.
I don't think the conn_id=9
is the connection should be inspected on every debugging session. (I think this is a timing issue which threads concern.)
So I logged all call_id, args of server_dispatch()
. Is this OK?
a/hyclone_server/server_dispatch.cpp b/hyclone_server/server_dispatch.cpp
index 1e534b8..053d0df 100644
--- a/hyclone_server/server_dispatch.cpp
+++ b/hyclone_server/server_dispatch.cpp
@@ -1,4 +1,5 @@
#include <cstddef>
+#include <iostream>
#include "haiku_errors.h"
#include "process.h"
#include "servercalls.h"
@@ -8,9 +9,55 @@
thread_local hserver_context* gCurrentContext;
+#define stringify(a) #a
+const char* stringify_call_id(intptr_t call_id)
+{
+#define HYCLONE_SERVERCALL0(name) \
+ case SERVERCALL_ID_##name: \
+ return stringify(name);
+#define HYCLONE_SERVERCALL1(name, arg1) \
+ case SERVERCALL_ID_##name: \
+ return stringify(name);
+#define HYCLONE_SERVERCALL2(name, arg1, arg2) \
+ case SERVERCALL_ID_##name: \
+ return stringify(name);
+#define HYCLONE_SERVERCALL3(name, arg1, arg2, arg3) \
+ case SERVERCALL_ID_##name: \
+ return stringify(name);
+#define HYCLONE_SERVERCALL4(name, arg1, arg2, arg3, arg4) \
+ case SERVERCALL_ID_##name: \
+ return stringify(name);
+#define HYCLONE_SERVERCALL5(name, arg1, arg2, arg3, arg4, arg5) \
+ case SERVERCALL_ID_##name: \
+ return stringify(name);
+#define HYCLONE_SERVERCALL6(name, arg1, arg2, arg3, arg4, arg5, arg6) \
+ case SERVERCALL_ID_##name: \
+ return stringify(name);
+
+ switch (call_id)
+ {
+#include "servercall_defs.h"
+ }
+
+#undef HYCLONE_SERVERCALL0
+#undef HYCLONE_SERVERCALL1
+#undef HYCLONE_SERVERCALL2
+#undef HYCLONE_SERVERCALL3
+#undef HYCLONE_SERVERCALL4
+#undef HYCLONE_SERVERCALL5
+#undef HYCLONE_SERVERCALL6
+
+ return "(null)";
+}
+
intptr_t server_dispatch(intptr_t conn_id, intptr_t call_id,
intptr_t a1, intptr_t a2, intptr_t a3, intptr_t a4, intptr_t a5, intptr_t a6)
{
+ std::ostringstream logline;
+ logline << "conn_id=" << conn_id << ", call_id=" << stringify_call_id(call_id)
+ << ", a1=" << a1 << ", a2=" << a2 << ", a3=" << a3 << ", a4=" << a4
+ << ", a5=" << a5 << ", a6=" << a6;
+ std::cerr << logline.str() << std::endl;
hserver_context context;
context.conn_id = conn_id;
@@ -39,6 +86,7 @@ intptr_t server_dispatch(intptr_t conn_id, intptr_t call_id,
context.process = system.GetProcess(context.pid).lock();
if (!context.process)
{
+ std::cerr << "!context.process: pid=" << context.pid << std::endl;
return B_BAD_VALUE;
}
}
@@ -51,6 +99,7 @@ intptr_t server_dispatch(intptr_t conn_id, intptr_t call_id,
if (!context.thread)
{
+ std::cerr << "!context.thread tid=" << context.tid << std::endl;
return B_BAD_VALUE;
}
}
diff --git a/hyclone_server/system.cpp b/hyclone_server/system.cpp
index bc00e75..a6558f2 100644
--- a/hyclone_server/system.cpp
+++ b/hyclone_server/system.cpp
@@ -117,8 +117,10 @@ const Connection& System::RegisterConnection(intptr_t conn_id, const Connection&
Connection System::GetThreadFromConnection(intptr_t conn_id)
{
auto it = _connections.find(conn_id);
- if (it == _connections.end())
+ if (it == _connections.end()) {
+ std::cerr << "thread not found for conn_id=" << conn_id << std::endl;
return Connection(-1, -1);
+ }
return it->second;
}
It seems disconnect called twice and second disconnect couldn't find the thread.
conn_id=9, call_id=read_attr, a1=6, a2=141751388281, a3=21, a4=0, a5=140723701457152, a6=256
conn_id=9, call_id=read_attr, a1=6, a2=141751388257, a3=23, a4=0, a5=140723701457148, a6=4
conn_id=9, call_id=unregister_fd, a1=6, a2=0, a3=0, a4=0, a5=0, a6=0
conn_id=9, call_id=unregister_fd, a1=5, a2=0, a3=0, a4=0, a5=0, a6=0
conn_id=9, call_id=unregister_fd, a1=4, a2=0, a3=0, a4=0, a5=0, a6=0
conn_id=9, call_id=read_stat, a1=1, a2=0, a3=0, a4=0, a5=140723701459360, a6=128
conn_id=9, call_id=delete_sem, a1=37, a2=0, a3=0, a4=0, a5=0, a6=0
conn_id=9, call_id=delete_sem, a1=34, a2=0, a3=0, a4=0, a5=0, a6=0
conn_id=9, call_id=delete_sem, a1=31, a2=0, a3=0, a4=0, a5=0, a6=0
conn_id=9, call_id=delete_sem, a1=29, a2=0, a3=0, a4=0, a5=0, a6=0
conn_id=9, call_id=delete_sem, a1=27, a2=0, a3=0, a4=0, a5=0, a6=0
conn_id=9, call_id=delete_sem, a1=25, a2=0, a3=0, a4=0, a5=0, a6=0
conn_id=9, call_id=delete_port, a1=4, a2=0, a3=0, a4=0, a5=0, a6=0
conn_id=9, call_id=delete_port, a1=5, a2=0, a3=0, a4=0, a5=0, a6=0
conn_id=9, call_id=delete_port, a1=6, a2=0, a3=0, a4=0, a5=0, a6=0
conn_id=9, call_id=delete_sem, a1=16, a2=0, a3=0, a4=0, a5=0, a6=0
conn_id=9, call_id=disconnect, a1=0, a2=0, a3=0, a4=0, a5=0, a6=0
conn_id=9, call_id=disconnect, a1=0, a2=0, a3=0, a4=0, a5=0, a6=0
thread not found for conn_id=9
conn_id=9, call_id=connect, a1=186085, a2=186085, a3=1000, a4=1000, a5=1000, a6=1000
conn_id=9, call_id=wait_for_fork_unlock, a1=0, a2=0, a3=0, a4=0, a5=0, a6=0
conn_id=9, call_id=get_next_area_info, a1=0, a2=140723242013528, a3=140723242013808, a4=140723242013524, a5=0, a6=0
whole log:
I'll do a closer review after my exam.
Oh, please focus your exam. Sorry for disturbing it. I don't hurry at all.
disconnect
being called twice is normal. Probably one is explicitly called by the loader before exec
, the other is dispatched by the server to make sure everything's cleaned up after the connection is closed.
At this point I don't have any other idea what's wrong; maybe try logging the result of all servercalls as well?
At this point I don't have any other idea what's wrong; maybe try logging the result of all servercalls as well?
I got logging with result of all servercalls.
Hello, I wanna try this but it cannot launch Haiku's bash. It gets stucked here.
uname -a
Linux archlinux 6.5.9-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 26 Oct 2023 00:52:20 +0000 x86_64 GNU/Linux
hyclone's revision
f590514746427aeb298c139080af24bf6cf1f92a