trexminer / T-Rex

T-Rex NVIDIA GPU miner with web control monitoring page

can't initialize T-Rex, assign: Bad file descriptor #1171

Open zhanko73 opened 2 years ago

zhanko73 commented 2 years ago

ERROR: WATCHDOG: can't initialize T-Rex, assign: Bad file descriptor

Versions 0.25.8 and 0.25.9 are affected. I used to run 0.25.8 in a container; after I updated Docker (most recent Fedora 35), it no longer works.

I tried to find the root cause, but T-Rex does not allow running under strace, so please let me know what the issue could be.

FYI: ethminer and the default CUDA test (e.g. running "nvidia-smi" inside the container) work fine. Only T-Rex has had an issue since the update.

zhanko73 commented 2 years ago

@trexminer I saw that other problems have already been answered, but there is no answer to this one so far. Could you please share a binary version that can run under strace, give me a solution for this, or publish the source code? There are plenty of ways to move forward. Of course, there are tricks for using strace anyway, but I would prefer a solution or some kind of support to solve this. Yes, it is possible that the problem lies outside T-Rex; however, this is currently the ONLY binary that has the issue, and I have to start investigating somewhere. Thanks in advance for the help.

trexminer commented 2 years ago

We can't share such a binary, sorry. Running the miner with --no-watchdog might help; the problem appears to lie in watchdog <-> miner communication.

zhanko73 commented 2 years ago

We can't share such a binary, sorry. Running the miner with --no-watchdog might help; the problem appears to lie in watchdog <-> miner communication.

Thank you. That looks like it solved the issue. I have just read up on what the watchdog does. Do you have any idea what I should check to investigate the root cause?

trexminer commented 2 years ago

Do you have any idea what I should check to investigate the root cause?

Unfortunately, no. I don't know why it could occur.

zhanko73 commented 2 years ago

I don't know either. However, I hope you have some idea of how the watchdog is set up: what it checks, which calls it makes, and so on. If you provide that information, I can check the rest from the OS side. What exactly does the watchdog do? What can happen if I keep it disabled?
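
One OS-side check that does not need strace: "Bad file descriptor" is the message for errno EBADF, which a system call returns when it is given a descriptor number that is not open. Below is a minimal sketch, assuming Python 3 is available inside the container, that reports whether the standard descriptors 0, 1 and 2 are open in the environment the miner is launched from. The helper names `fd_status` and `describe_fds` are made up for this illustration. It is not known which descriptor the T-Rex watchdog actually uses, so a clean result here does not rule out the container setup.

```python
import errno
import os


def fd_status(fd: int) -> str:
    """Return "open" if fd is an open descriptor, "EBADF" if it is not."""
    try:
        os.fstat(fd)  # fstat fails with EBADF on a closed or invalid descriptor
    except OSError as exc:
        if exc.errno == errno.EBADF:
            return "EBADF"
        raise  # any other error is unexpected, so let it surface
    return "open"


def describe_fds(fds=(0, 1, 2)) -> dict:
    """Map each descriptor number to its status (stdin, stdout, stderr by default)."""
    return {fd: fd_status(fd) for fd in fds}


if __name__ == "__main__":
    # Hypothetical usage: run this with the same container options as the miner.
    for fd, status in describe_fds().items():
        print(f"fd {fd}: {status}")
```

Running the script with exactly the same container options as the miner, once on the old Docker version and once on the new one, would show whether the update changed which standard descriptors the process inherits.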