mavak / linux-track

Automatically exported from code.google.com/p/linux-track
MIT License

Cannot run ltr_gui -- Illegal Instruction (Core Dumped) #74

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Install linuxtrack via Arch Linux AUR, or via universal package. 
2. Attempt to run ltr_gui.
3. ltr_gui doesn't start; it only reports "Illegal Instruction (core dumped)."

--What is the expected output? What do you see instead?

Expected output would be the linuxtrack GUI starting. Instead, the terminal 
outputs "Illegal Instruction (core dumped)."

--What version of the product are you using? On what operating system?

Linuxtrack version 0.99.11 64-bit package on Arch Linux 64-bit. 

--Please provide any additional information below.

I have attempted using all available different packages on the Arch Linux AUR. 
When I couldn't get any of those working I then attempted a manual install 
using the 64-bit universal package and following the installation instructions 
in this site's wiki, only to have the same result. 

Original issue reported on code.google.com by jamesave...@gmail.com on 23 Oct 2014 at 10:09

GoogleCodeExporter commented 9 years ago
I can confirm this behavior. The stack trace seems to crash after not finding 
linuxtrack1.conf.

Interestingly enough, it also seems to crash if it cannot modify one (or more?) 
of the log files in /tmp/linuxtrack0*.log.  Sadly I deleted the strace output 
that showed that one.

Original comment by kscott.t...@gmail.com on 24 Oct 2014 at 5:22

Attachments:

GoogleCodeExporter commented 9 years ago
One more trace that might be helpful.  I created an empty 
~/.config/linuxtrack/linuxtrack1.conf
And then ran ltr_gui again, and this was the result.

Original comment by kscott.t...@gmail.com on 24 Oct 2014 at 5:28

GoogleCodeExporter commented 9 years ago
Sorry for the spam, but my attachment disappeared when I submitted.  Odd.  Take 
2.

Original comment by kscott.t...@gmail.com on 24 Oct 2014 at 5:29

Attachments:

GoogleCodeExporter commented 9 years ago
Thank you, your stack traces led me to get this working.

All I did was create an empty file ~/.config/linuxtrack/linuxtrack1.conf

Once attempting to run the program after doing so, the program still crashed 
with the same error. However, it created a file 
~/.config/linuxtrack/linuxtrack1.conf.new with some data in it.

I deleted the empty file I created and renamed linuxtrack1.conf.new to 
linuxtrack1.conf and attempted to run ltr_gui again, and it worked properly.

Original comment by jamesave...@gmail.com on 24 Oct 2014 at 6:17

GoogleCodeExporter commented 9 years ago
I should also note that every time you change any preferences and try to save, 
the program will crash. You'll have to rotate out the linuxtrack1.conf.new file 
each time.
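
The rotation described above can be sketched in C. This is only an illustration of the manual workaround, not code from linuxtrack; the function name promote_new_config is hypothetical, and the directory is passed in by the caller (for example ~/.config/linuxtrack expanded to a full path). It relies on POSIX rename(), which replaces an existing destination file, so a separate delete step is not needed.

```c
#include <stdio.h>

/*
 * Hypothetical helper: move <dir>/linuxtrack1.conf.new over
 * <dir>/linuxtrack1.conf. Returns 0 on success and -1 on failure
 * (path too long, or rename() failed, e.g. because no .new file exists).
 */
int promote_new_config(const char *dir)
{
    char from[4096];
    char to[4096];

    int n1 = snprintf(from, sizeof from, "%s/linuxtrack1.conf.new", dir);
    int n2 = snprintf(to, sizeof to, "%s/linuxtrack1.conf", dir);
    if (n1 < 0 || n2 < 0 ||
        (size_t)n1 >= sizeof from || (size_t)n2 >= sizeof to)
        return -1; /* path did not fit into the buffers */

    /* POSIX rename() replaces an existing destination in one step. */
    return rename(from, to);
}
```

Calling promote_new_config with the configuration directory after each crash on save does the same thing as deleting linuxtrack1.conf and renaming linuxtrack1.conf.new by hand.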

Original comment by jamesave...@gmail.com on 24 Oct 2014 at 6:47

GoogleCodeExporter commented 9 years ago
Hello guys,
may I ask what processor you have? I have seen such behavior on an old 
Athlon processor before (Athlon 2800XP+) - openCV used SSE3 instructions and 
this processor doesn't know them; but I'm afraid this is something else...

Quick check on my old Athlon didn't reveal any problems.

Could you please try to obtain a gdb backtrace of the problem along with the 
disassembly of the offending function?

Should be roughly along the lines of this:

gdb ltr_gui

>set pagination off

>set disassembly-flavor intel

>run

(when the debugger breaks)

>bt

>disass

>kill

>quit

The thing is, it is strange you get the same behavior with both the universal 
package and when you compile it (when using AUR you compile it, right?)...

Kind regards,

Michal

Original comment by f.jo...@email.cz on 24 Oct 2014 at 8:16

GoogleCodeExporter commented 9 years ago
I am using an AMD FX-6100 (64-bit).

I attached a text file of the terminal output during the backtrace.

And you are correct, compiling is part of the process when installing anything 
from AUR.

As a side note, I have '-march=native' set in my compile flags.

Original comment by jamesave...@gmail.com on 24 Oct 2014 at 9:00

Attachments:

GoogleCodeExporter commented 9 years ago
Thank you for such a prompt reply...

Hmmm... Looking at the gdb dump, the problem is in the function 
pthread_rwlock_unlock from /usr/lib/libpthread.so.0. The offending instruction 
is xend, which according to the wiki belongs to the Transactional 
Synchronization Extensions (TSX). Those are included in Haswell (and newer) 
Intel processors and, as far as I can tell, AMD processors don't have them at all...
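
Whether a given CPU advertises the part of TSX that xend belongs to (RTM) can be checked with CPUID. The following is a minimal sketch, assuming an x86 machine and the <cpuid.h> header shipped with GCC and Clang; the function names cpu_has_rtm and ebx_reports_rtm are made up for this example. RTM is reported in CPUID leaf 7, subleaf 0, EBX bit 11.

```c
#include <cpuid.h>

/* RTM (xbegin/xend/xabort) is CPUID leaf 7, subleaf 0, EBX bit 11. */
int ebx_reports_rtm(unsigned int ebx)
{
    return (int)((ebx >> 11) & 1u);
}

/* Returns 1 if the running CPU advertises RTM, 0 otherwise. */
int cpu_has_rtm(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 7 does not exist on older processors. */
    if (__get_cpuid_max(0, 0) < 7)
        return 0;

    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return ebx_reports_rtm(ebx);
}
```

On a processor without TSX, cpu_has_rtm() is expected to return 0, and executing xend there raises an invalid-opcode fault, which the kernel delivers as SIGILL.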

So the question is, with which flags was the libpthread compiled? Anyway, it 
seems that you'll have to get it recompiled with the flags appropriate to your 
processor...

Please let me know if it helped...
Kind regards,

Michal

Original comment by f.jo...@email.cz on 24 Oct 2014 at 9:26

GoogleCodeExporter commented 9 years ago
I don't know the method used to compile libpthread.. It's part of glibc, which 
is a core package...

Do you think that compiling glibc on my own might change anything? There is a 
glibc-git in the AUR which I could use to do some quick testing, maybe.

Original comment by jamesave...@gmail.com on 24 Oct 2014 at 9:36

GoogleCodeExporter commented 9 years ago
Unfortunately I don't know Arch that much, but googling around, it seems there 
was a problem upstream where this instruction leaked in even when it should 
have been disabled (https://lists.debian.org/debian-glibc/2014/09/msg00076.html)...

Also, the PKGBUILD here 
(https://projects.archlinux.org/svntogit/community.git/tree/trunk/PKGBUILD?h=packages/lib32-glibc), 
if I'm looking at the correct place, seems to pass the --enable-lock-elision 
option unconditionally, which might be the reason for this bug...

Kind regards,

Michal

Original comment by f.jo...@email.cz on 24 Oct 2014 at 10:47

GoogleCodeExporter commented 9 years ago
My processor is a Core i7 920.  That predates the Haswell core (it was Nehalem, 
I think), so xend may not be on it either.

My gdb is attached as well, though it looks similar to James'

Original comment by kscott.t...@gmail.com on 26 Oct 2014 at 2:11

Attachments:

GoogleCodeExporter commented 9 years ago
Hello,
it seems to be exactly the same problem...

Unfortunately there is not much that I can do about it - this problem most 
probably needs to be solved in the upstream or it might be possible to work 
around in the AUR...
Can you try to report it to the package maintainer? If you add your gdb 
backtrace, it should provide him with all the information on this problem...

Kind regards,

Michal

PS. The funny thing is that even Intel seems to be stepping back as far as TSX 
is concerned - from the news it seems that it is somewhat broken and fixed 
processors will be available next year...

Original comment by f.jo...@email.cz on 26 Oct 2014 at 8:29

GoogleCodeExporter commented 9 years ago
By default, Arch Linux packages are compiled using -march=x86-64.

Original comment by hectores...@gmail.com on 27 Oct 2014 at 6:32

GoogleCodeExporter commented 9 years ago
I don't know Arch, so I assume you are right; just googling around I found 
this: 

https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/glibc

There you can see '--enable-lock-elision' in the configure step. Lock elision 
is supposed to check whether the TSX instructions are available and disable 
itself on processors without TSX, but there were reports that this does not 
always happen:

https://lists.debian.org/debian-glibc/2014/09/msg00076.html

So my assumption is that this is the source of problems you saw...
But then again, I might be completely off...

Kind regards,

Michal

Original comment by f.jo...@email.cz on 27 Oct 2014 at 8:17

GoogleCodeExporter commented 9 years ago
I never actually got this resolved. This issue seems to prevent ltr_gui from 
creating or overwriting the ~/.config/linuxtrack/linuxtrack1.conf file. 
However, it is still able to create and overwrite the 
~/.config/linuxtrack/linuxtrack1.conf.new file. Therefore, I have simply been 
deleting linuxtrack1.conf and renaming linuxtrack1.conf.new each time I need to 
change configurations.

I did not test building glibc myself.

Original comment by jamesave...@gmail.com on 28 Oct 2014 at 7:20

GoogleCodeExporter commented 9 years ago
I want to file a bug report on the Arch bug tracker about this issue with glibc 
but, to be honest, I don't think that I understand the problem well enough to 
create a useful bug report.. 

What is it about the program that lets it be able to create/overwrite 
"linuxtrack1.conf.new" but not create/overwrite "linuxtrack1.conf"? Why can't 
it handle the .conf file the same way it handles the .conf.new file? 

Original comment by jamesave...@gmail.com on 28 Oct 2014 at 10:47

GoogleCodeExporter commented 9 years ago
The bug seems to be triggered when you try to unlock a reader/writer lock 
that is not locked (or, in my case, locked and unlocked twice). The pseudocode 
would be like this:

{{{
pthread_rwlock_wrlock
  pthread_rwlock_wrlock
    //do some writing
  pthread_rwlock_unlock //here the writer lock is actually unlocked
pthread_rwlock_unlock //here the lock is already free - triggers the bug!
}}}

The same behavior would also be triggered by just calling pthread_rwlock_unlock 
by itself on a free lock. Anyway, it seems I should be able to work around this 
bug after all; in the evening I'll post a link to a fixed package. Be that as 
it may, I would still report it as a glibc bug...

When reporting the bug to the AUR maintainer, I'd report something like this:

Function pthread_rwlock_unlock contains an instruction from the TSX extension 
even on processors that don't support it, and the code path is reachable by 
calling it on a rwlock that is unlocked.

{{{
Relevant part of gdb dump:

Dump of assembler code for function pthread_rwlock_unlock:
   0x00007ffff358e3c0 <+0>: mov    r8,rdi
   0x00007ffff358e3c3 <+3>: mov    edi,DWORD PTR [rdi+0x18]
   0x00007ffff358e3c6 <+6>: test   edi,edi
   0x00007ffff358e3c8 <+8>: jne    0x7ffff358e3e0 <pthread_rwlock_unlock+32>
   0x00007ffff358e3ca <+10>:    mov    esi,DWORD PTR [r8+0x4]
   0x00007ffff358e3ce <+14>:    test   esi,esi
   0x00007ffff358e3d0 <+16>:    jne    0x7ffff358e3e0 <pthread_rwlock_unlock+32>
=> 0x00007ffff358e3d2 <+18>:    xend   
   0x00007ffff358e3d5 <+21>:    xor    eax,eax
   0x00007ffff358e3d7 <+23>:    ret    

Relevant part of source:

//from glibc-2.20/sysdeps/x86/elide.h
#define ELIDE_UNLOCK(is_lock_free)      \
  ({                        \
  int ret = 0;                  \
  if (is_lock_free)             \
    {                       \
      _xend ();                 \
      ret = 1;                  \
    }                       \
  ret;                      \
  })

//from glibc-2.20/nptl/pthread_rwlock_unlock.c
  if (ELIDE_UNLOCK (rwlock->__data.__writer == 0
            && rwlock->__data.__nr_readers == 0))
    return 0;

}}}
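
Read together, the two excerpts explain why an unlock of a free lock ends up at xend. The following is a simplified model of that control flow, not glibc code: _xend is replaced by a counter so it runs on any CPU, and the names model_rwlock, fake_xend_calls, model_elide_unlock and model_unlock are invented for this sketch.

```c
/* Simplified stand-in for the two rwlock fields used in the excerpt. */
struct model_rwlock {
    int writer;
    unsigned int nr_readers;
};

/* Counts how often the stand-in for _xend() was reached. */
int fake_xend_calls = 0;

/* Mirrors the shape of ELIDE_UNLOCK: a free lock takes the xend path. */
static int model_elide_unlock(int is_lock_free)
{
    if (is_lock_free) {
        fake_xend_calls++; /* real code executes _xend() here */
        return 1;
    }
    return 0;
}

/* Returns 1 if the elision path was taken, 0 for a regular unlock. */
int model_unlock(struct model_rwlock *l)
{
    if (model_elide_unlock(l->writer == 0 && l->nr_readers == 0))
        return 1;

    /* Heavily simplified regular unlock path. */
    if (l->writer)
        l->writer = 0;
    else
        l->nr_readers--;
    return 0;
}
```

In this model, the first model_unlock on a write-locked lock takes the regular path, and a second, unbalanced call sees a free lock and reaches the xend stand-in, which matches the double unlock described above.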

Original comment by f.jo...@email.cz on 29 Oct 2014 at 6:50

GoogleCodeExporter commented 9 years ago
I did create a bug report earlier tonight on the Arch bugtracker about this. 
For your reference (if you're interested), a link to the report is here: 
https://bugs.archlinux.org/task/42591

I did link to this thread in the bug report, in case there was any information 
the person assigned to the bug could glean from looking at our conversation 
(because I admittedly did a poor job describing the situation).

Original comment by jamesave...@gmail.com on 29 Oct 2014 at 7:01

GoogleCodeExporter commented 9 years ago
Hello,
I just uploaded new test packages; you can download them here:

http://linuxtrack.eu/test/linuxtrack-0.99.11_1-64.zip

http://linuxtrack.eu/test/linuxtrack-0.99.11_1-32.zip

Hopefully both the sigill problem and the rough ltr_pipe should be gone.
Please let me know how it works...
Kind regards,

Michal

PS. I added the details to your bug report, so hopefully the maintainer will 
have enough information to do something about it.

Original comment by f.jo...@email.cz on 29 Oct 2014 at 8:39

GoogleCodeExporter commented 9 years ago
Yes, the problems are gone now. The program works perfectly fine! I can change 
settings and save them properly. The linuxtrack1.conf file updates properly.

Thank you for your hard work!

------

Also, in your source code, the file src/wine_bridge/ltr_wine64.nsi.in has an 
error that causes build failures. I keep having to edit the file to build 
properly.

Line 16 reads:
  File /oname=Controller.exe controller/Controller.exe.so

It should read:
  File /oname=Controller.exe controller\Controller.exe.so

Original comment by jamesave...@gmail.com on 30 Oct 2014 at 6:06

GoogleCodeExporter commented 9 years ago
Thank you, I'm glad it is working well now...

As for the problem with ltr_wine64.nsi, it is funny: the line has been there 
since the first revision and hasn't caused any problems so far... But of 
course I'm going to fix it - thank you for spotting that.

Kind regards,

Michal

Original comment by f.jo...@email.cz on 30 Oct 2014 at 6:47

GoogleCodeExporter commented 9 years ago
Hello,
Linuxtrack 0.99.12 is up now, so this problem should not bother you anymore...

I'm closing this issue, feel free to reopen it, should the problem reappear.
Kind regards,

Michal

Original comment by f.jo...@email.cz on 18 Nov 2014 at 6:06