olealgoritme / gddr6

Linux-based GDDR6/GDDR6X VRAM temperature reader for NVIDIA RTX 3000/4000 series GPUs.

segmentation fault / sigsegv error #1

Closed m4gr4th34 closed 1 year ago

m4gr4th34 commented 1 year ago

Hi, thanks for working on this! I compile using gcc gddr6.c -o gddr6 -lpci

There are no errors, but when I run ./gddr6, I get Segmentation fault. I followed the instructions for apt install, updated grub, and rebooted (twice).

m4gr4th34 commented 1 year ago

Not sure what happened, but now I get a different error. I'm using an RTX 3090, and I didn't change anything else in the system. Inside su I get the same error as above, but with sudo (or without sudo) I get this: terminated by signal SIGSEGV (Address boundary error)

olealgoritme commented 1 year ago

@m4gr4th34 Are you running a single NVIDIA RTX 3000/4000 series GPU or multiple? Please provide the output of the following command, which prints the PCIe bar0 address(es):

lspci -v -d 10de: | grep 'size=16M' | awk '{print $3}'

m4gr4th34 commented 1 year ago

I get this output: fa000000

Running a single RTX 3090 GPU in PCIe slot 1, as usual. I also have a second one (not plugged in yet), so it would be nice if it worked for a dual-GPU setup.

olealgoritme commented 1 year ago

@m4gr4th34 I pushed a change to add that specific bar0. Pull and retry.

m4gr4th34 commented 1 year ago

Thanks! Tried it, double- and triple-checked the folder, recompiled, checked the code, and the entry seems to be correct for bar0, but I still get: Segmentation fault

olealgoritme commented 1 year ago

It can be a bit tricky. I removed the bar0 from the table in a recent fix, but that probably won't change anything. Try sudo gdb ./gddr6 and run it inside the debugger (type 'run' and press Enter).

m4gr4th34 commented 1 year ago

I ran the second iteration of the code and received this error:

Program received signal SIGSEGV, Segmentation fault.
0x0000555555555483 in pci_detect_dev ()

I will try the latest iteration now

m4gr4th34 commented 1 year ago

Alright, it works in debug:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Device: RTX 3090 (GA102 / 0x2204) GDDR6X VRAM Temp: 56°c

and it works in: sudo ./gddr6

this is very sweet, thanks!

olealgoritme commented 1 year ago

Awesome! Glad it worked for you.