sgminer-dev / sgminer

Scrypt GPU miner
GNU General Public License v3.0

Segfaults due to missing include and comparing string to NULL (WAS: Segfault on Ubuntu 13.10) #123

Closed anhel closed 10 years ago

anhel commented 10 years ago

Fresh Ubuntu 13.10 x64, ATI drivers 13.10, SDK 2.9. After a successful compilation I run: export DISPLAY=:0 && export GPU_USE_SYNC_OBJECTS=1 && export GPU_MAX_ALLOC_PERCENT=100 && ./sgminer -o http://192.168.1.9:9327 -u a -p a

The result is:

[21:53:03] Started sgminer 4.1.0-135-g42806
Segmentation fault (core dumped)

Here is the backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffecb7f700 (LWP 6155)]
0x00007ffff687bf90 in _IO_vfprintf_internal (s=s@entry=0x7fffecb7c600, format=<optimized out>, format@entry=0x460ed6 "%s difficulty changed to %.1f",
    ap=ap@entry=0x7fffecb7c778) at vfprintf.c:1655
1655    vfprintf.c: No such file or directory.
(gdb) bt
#0  0x00007ffff687bf90 in _IO_vfprintf_internal (s=s@entry=0x7fffecb7c600, format=<optimized out>, format@entry=0x460ed6 "%s difficulty changed to %.1f",
    ap=ap@entry=0x7fffecb7c778) at vfprintf.c:1655
#1  0x00007ffff68a25a5 in _IO_vsnprintf (string=0x7fffecb7c790 "", maxlen=<optimized out>, format=0x460ed6 "%s difficulty changed to %.1f",
    args=0x7fffecb7c778) at vsnprintf.c:119
#2  0x000000000043b777 in applog (prio=5, fmt=0x460ed6 "%s difficulty changed to %.1f") at logging.c:59
#3  0x0000000000437538 in parse_diff (pool=0x699dc0, val=0x7fffd8009b40) at util.c:1659
#4  0x0000000000437c77 in parse_method (pool=0x699dc0,
    s=0x7fffd8014810 "{\"params\": [43.76170711708255], \"jsonrpc\": \"2.0\", \"method\": \"mining.set_difficulty\", \"id\": 17840769}") at util.c:1798
#5  0x0000000000437eea in auth_stratum (pool=0x699dc0) at util.c:1843
#6  0x000000000041aa75 in pool_active (pool=0x699dc0, pinging=false) at sgminer.c:5581
#7  0x00000000004217c5 in test_pool_thread (arg=0x699dc0) at sgminer.c:7350
#8  0x00007ffff6bfef6e in start_thread (arg=0x7fffecb7f700) at pthread_create.c:311
#9  0x00007ffff69299cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

I should point out that the same segfault happens on Debian too. IMO something is wrong with the log output.
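For readers following the backtrace: a crash inside vfprintf while it handles a "%s" conversion usually means the matching argument was not a valid string pointer. The sketch below is only an illustration of guarding such an argument, not the fix that was applied in sgminer; safe_str and format_diff_msg are hypothetical names, and the format string is the one shown in frame #2.

```c
#include <stdio.h>

/* Hypothetical helper: substitute a visible placeholder for a NULL string
 * so that "%s" never receives a NULL pointer. */
static const char *safe_str(const char *s)
{
    return s != NULL ? s : "(null)";
}

/* Hypothetical wrapper that formats the message seen in the backtrace.
 * Returns whatever snprintf returns (the length the full message needs). */
static int format_diff_msg(char *buf, size_t len,
                           const char *pool_name, double diff)
{
    return snprintf(buf, len, "%s difficulty changed to %.1f",
                    safe_str(pool_name), diff);
}
```

A guard like this only covers NULL. It does not help when the pointer is non-NULL but invalid, so it is still worth building with -Wall and reading the warnings.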

veox commented 10 years ago

From IRC, this seems to be an issue on Debian (and all derivatives).

Cannot reproduce on Arch Linux, using glibc 2.19. Debian 7 (Wheezy) has 2.13 ATM.

gotqn commented 10 years ago

I believe for Ubuntu 13.10 you should use "AMD Catalyst™ 14.2 LINUX Beta Driver", since it supports "Ubuntu 12.04.3 and 13.10".

Note that this driver is still in BETA and people are reporting issues with it. I have decided to use "AMD Catalyst™ 13.12 LINUX" instead, but it is supported only for "Ubuntu 12.04.2 and 13.04". Since Ubuntu 13.04 is no longer supported, you need to install "Ubuntu 12.04.2".

These are the links to the driver release notes (you can check whether your GPUs are supported too):

http://support.amd.com/en-us/kb-articles/Pages/amdcatalyst13-12linreleasenotes.aspx
http://support.amd.com/en-us/kb-articles/Pages/Latest-LINUX-Beta-Driver.aspx

Here http://askubuntu.com/q/422052/95857 you can find more information about "How to download and install specific Ubuntu version and disable future builds?"

Note that some folks believe the "AMD Catalyst™ 13.12 LINUX" driver should work with "Ubuntu 12.04.04", but I have not tested this yet. You can try it and share what happens.

anhel commented 10 years ago

It's a code bug. I tried Catalyst 14 too.

veox commented 10 years ago

Try git reset --hard 8ff624618f46da7f04f82a22fb3a1a83c976f595 and a complete rebuild. I'm pretty sure it's related to recent incognito/poolname changes.

If that helps, I can only recommend doing

make clean
git reset --hard HEAD~1
autoreconf -fi
CFLAGS="-ggdb -Wall -march=native" ./configure
make

repeatedly, to find which commit introduced this. Granted, the commits are pretty messy.

veox commented 10 years ago

matje confirmed on IRC that this does not happen after resetting to the commit above, so the issue was introduced by the commits after it.

Possibly related: glibc bug 6530, said to be fixed in glibc 2.17. However, matje says he's using Gentoo with glibc 2.17 and the problem is still present.

veox commented 10 years ago

What happens if running with -TDv? Verbatim output, please.

veox commented 10 years ago

Also, just pushed two commits that shouldn't have any effect on resolution, but please check.

EDIT: matje reports it did fix the issue for him, so do check.

veox commented 10 years ago

Change compare that caused this (for reference): https://github.com/veox/sgminer/compare/8ff62461...ded44523163bbc53a5c8a368ada2f5bfd43aeb98

melt7777 commented 10 years ago

3.7.2, 4.0, and 4.1 all segfault on the new Debian build if I use any but the latest two Catalyst drivers. This is on Debian 7.4 live USB with persistence, latest apt-get upgrades, latest AMD APP, and ADL 6.

veox commented 10 years ago

Can anyone confirm this is still present/relevant?

@melt7777 if even cgminer 3.7.2 segfaults, then the issue is deeper than I thought and needs a lot more debugging.

melt7777 commented 10 years ago

Well, it's mostly Debian Wheezy driver issues. I hope this is useful info. Latest attempt: today's Debian Wheezy custom live CD with main, contrib and non-free updates and upgrades, and the latest git clone of sgminer master. With Catalyst 13.12, X won't start; if I ssh in, running sgminer -n or with any other options just hangs on the "Starting sgminer" screen. With Catalyst 12.8, X won't start and the machine freezes completely, with no ssh access. These are severe driver issues that I can't seem to work out, so most likely not sgminer's fault?

NNygren commented 10 years ago

I've just confirmed that the latest build segfaults (core dump) on my Arch box; launching with no .config file gives the same result. For now I'm just using an older version of Troky's nscrypt sgminer.

henriknacka commented 10 years ago

I can also confirm that it now segfaults on my two Arch boxes after updating (4.1.153-54-g1b3d). I will try to dig deeper later.

Belgarion commented 10 years ago

Something seems broken with cgpu->kernelname in the latest version (4.1.153-54-g1b3d). It stopped segfaulting after adding

if (cgpu->kernelname == NULL) {
    cgpu->kernelname = strdup("ckolivas");
}

before the line: if(strcmp(cgpu->kernelname, "") == 0) { in ocl.c

It seems like cgpu->kernelname should have already been set from the opencl_thread_prepare function in driver-opencl.c, but it seems to not have done that.
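To spell out why that guard matters: calling strcmp with a NULL pointer is undefined behaviour in C and typically crashes, so the NULL check has to come before the comparison. Here is a minimal standalone sketch of the same idea. The struct cgpu_sketch and ensure_kernelname are hypothetical stand-ins, not sgminer code, and the sketch assumes kernelname is either NULL or heap-allocated.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() under strict -std= modes */
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-device struct; only the field
 * discussed in this thread is modelled. */
struct cgpu_sketch {
    char *kernelname;   /* assumed NULL or heap-allocated */
};

/* Give kernelname a fallback value when it is NULL or empty.
 * The NULL test runs first, so the string is never read through NULL.
 * Returns 0 on success, -1 if strdup() fails. */
static int ensure_kernelname(struct cgpu_sketch *cgpu, const char *fallback)
{
    if (cgpu->kernelname != NULL && cgpu->kernelname[0] != '\0')
        return 0;                 /* already set, nothing to do */

    char *copy = strdup(fallback);
    if (copy == NULL)
        return -1;                /* out of memory, leave field untouched */

    free(cgpu->kernelname);       /* free(NULL) is a no-op */
    cgpu->kernelname = copy;
    return 0;
}
```

Calling ensure_kernelname(cgpu, "ckolivas") before any strcmp on the field would give the same effect as the snippet above, with the empty-string case folded in.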

yinhm commented 10 years ago

I tracked this issue to commit 1333ed576db5a875dc18bc1b311fee03ae602bdf.

Also, @Belgarion's change fixes this.

It happens when you don't have "kernel" in the config.

cloudrck commented 10 years ago

When I simply add a kernel parameter as @yinhm suggested, it works fine on Ubuntu 13.10.

melt7777 commented 10 years ago

Confirmed: this issue is resolved on all sgminer variants, including the latest git, by adding "kernel" : "scrypt" to the .conf file. I also had to copy kernel/ckolivas.cl to kernel/scrypt.cl. No source modification was necessary. I'm using Debian Wheezy 7.4 (latest amd64), Catalyst 13.12, APP 2.9, ADL 6, and today's sgminer. Thank you guys for your help!

veox commented 10 years ago

Quoting @Belgarion: "It seems like cgpu->kernelname should have already been set from the opencl_thread_prepare function in driver-opencl.c, but it seems to not have done that."

That's not the kernel name being set there, that's the device platform name.

Thanks for checking this edge condition.

@melt7777 you should use "kernel": "ckolivas", that'll save you the copying.

P.S. This is different from what this issue was originally reported for. I'll commit the fix and close it.

hazelybell commented 10 years ago

Quoting @cloudrck: "When I simply add a kernel parameter as @yinhm suggested, it works fine on Ubuntu 13.10."

Same on Debian testing.

melt7777 commented 10 years ago

Thanks guys, you really took care of us with your help!

jagauthier commented 10 years ago

I pulled from git this morning, and this is still segfaulting for me, exactly as described above. Any additional suggestions?

veox commented 10 years ago

@jagauthier Yes: run a debug build in gdb (or similar) and post the backtrace in a new issue report. See doc/BUGS.md.