GoogleCodeExporter closed this issue 9 years ago.
gdb backtrace under Tiger:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xbbfffff0
0x07b9caf8 in ?? ()
(gdb) bt full
#0 0x07b9caf8 in ?? ()
No symbol table info available.
#1 0x0305a92c in js::DeepBail ()
No symbol table info available.
#2 0x00000000 in ?? ()
No symbol table info available.
(gdb)
Original comment by barr...@gmail.com
on 22 Aug 2011 at 2:02
gdb backtrace under Leopard:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xbbffffb0
0x0de8f678 in ?? ()
(gdb) bt full
#0 0x0de8f678 in ?? ()
No symbol table info available.
#1 0x0305a92c in js::DeepBail ()
No symbol table info available.
#2 0x00000000 in ?? ()
No symbol table info available.
(gdb)
I've also attached the Crash Reporter logs from both Tiger and Leopard.
(BTW, GPU Accelerated Windows are disabled on my iBook G4 under Leopard too.)
Original comment by barr...@gmail.com
on 22 Aug 2011 at 2:27
This appears to be a stack overflow. I can't do much more investigation yet
until my connectivity is restored to the build system.
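Both faulting addresses fit that diagnosis. As a rough check, the arithmetic below assumes the main-thread stack tops out at 0xc0000000 and grows down (an assumption, though it matches the later report that a 256MB stack lands at 0xb0000000). It uses the 64MB (0x4000000) stack size that browser/app/Makefile.in had at the time:

```python
# Where do the two faulting addresses sit relative to a 64MB stack?
# Assumptions (not taken from the crash logs): the stack top is 0xc0000000,
# and the stack grows downward from there.
STACK_TOP = 0xC0000000
STACK_SIZE = 0x4000000  # 64MB, the pre-fix value in browser/app/Makefile.in


def stack_bottom(top: int, size: int) -> int:
    """Lowest valid address of a stack that grows down from `top`."""
    return top - size


def bytes_past_bottom(fault: int, top: int = STACK_TOP, size: int = STACK_SIZE) -> int:
    """Distance below the stack's lowest valid address (positive = overflow)."""
    return stack_bottom(top, size) - fault


for os_name, fault in [("Tiger", 0xBBFFFFF0), ("Leopard", 0xBBFFFFB0)]:
    print(f"{os_name}: fault at {fault:#010x} is "
          f"{bytes_past_bottom(fault)} bytes below {stack_bottom(STACK_TOP, STACK_SIZE):#010x}")
```

If those assumptions hold, the Tiger fault lands 16 bytes and the Leopard fault 80 bytes below 0xbc000000, the lowest valid stack address. These are the first writes past the end of the stack.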
Original comment by classi...@floodgap.com
on 22 Aug 2011 at 3:52
This is odd:
1) I cannot reproduce the crash going to the website directly (fresh user
account, no add-ons, no adblock).
2) I can, however, reproduce the crash in the same fresh user account with your
zipped folder (thanks for attaching that), where the browser starts to load a
myriad of ads – many more than I get when visiting the site directly. Maybe
they serve ads based on location?
Original comment by chtru...@web.de
on 23 Aug 2011 at 7:10
It could be that. I can't reproduce it either. Some exact steps would be
helpful.
Original comment by classi...@floodgap.com
on 23 Aug 2011 at 3:19
I gave the steps, as exactly as I think I can, that make it crash 100%
reproducibly for me.
The saved folder does show more ads, for whatever reason. However, aside from
the extra ads, there are no other problems when I open it in Safari 5.0.6. (If
I open it in Firefox 3.6.20, it pops up one or two unresponsive script warnings
while loading, but I just click "Continue" and it eventually finishes loading.
In contrast, the unresponsive script warnings do not show when I visit
www.cringely.com in that same browser.)
If I visit cringely.com from TFF with a fresh profile it still crashes for me
-- I just did it again to make sure.
Steps for saving the folder:
1. Open Firefox 3.6.20.
2. Visit www.cringely.com
3. Choose "Save Page As..." from the File menu.
4. Choose "Web Page, complete" as the format to save in.
5. Choose your desired location.
6. Click "Save."
Except for step 5 in this comment, I think I've made my steps as exact as I
can. (At least, right now I'm unable to figure out how to make them more exact.)
FWIW my ISP is Cox Communications (cable), and I'm in Irvine, CA.
Original comment by barr...@gmail.com
on 24 Aug 2011 at 12:28
With the provided folder, I can cause 6.0 to crash. The problem is that the
folder is not local -- it's actually pulling down .js from other sites, so it
is not a portable test case or minimized.
It is coming from one of the ad sites, so AdBlock does mask this.
Barry, if you can pull everything together into something at minimum portable
-- i.e., no outside dependencies -- it would help considerably because I can't
run this on the debugging system until my network is up. If you can't, this is
going to have to sit until I can.
Original comment by classi...@floodgap.com
on 24 Aug 2011 at 4:18
Test build reducing regs in flight from 18 to 8 does not repair the crash.
Original comment by classi...@floodgap.com
on 24 Aug 2011 at 4:19
I guess we could also restrain native calls like we did in 4.0.1, but that
would hobble us again.
Original comment by classi...@floodgap.com
on 24 Aug 2011 at 4:35
Oooh, I just thought of something. We could relocate the stack up higher and
try to get more than 64MB of memory in the stack. This doesn't solve the crash
per se (because there is no case where we can detect running out of stack and
abort safely -- this is a limitation of OS X), but we could add so much stack
that the issue is dealt with.
This should give us a gig:
-Wl,-stack_size,0x40000000,-stack_addr,0xf0000000
If it crashes with that, then my only other thought is restrain natives and
that's gonna suck.
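A small sketch of the arithmetic behind those flags. It assumes, per Apple's ld(1) man page, that -stack_addr sets the initial stack pointer (the top of a downward-growing stack) and that -stack_size should be a multiple of the page size. It also assumes 4KB pages. The helper name stack_range is hypothetical:

```python
# What range would -Wl,-stack_size,0x40000000,-stack_addr,0xf0000000 reserve?
# Assumes -stack_addr is the stack top (initial stack pointer) and 4KB pages.
PAGE_SIZE = 0x1000
ADDRESS_SPACE = 1 << 32  # a 32-bit process sees 4GB of addresses


def stack_range(stack_addr: int, stack_size: int) -> tuple[int, int]:
    """Return (bottom, top) of a stack growing down from stack_addr."""
    if stack_size % PAGE_SIZE:
        raise ValueError("stack_size should be a multiple of the page size")
    if stack_addr > ADDRESS_SPACE or stack_size > stack_addr:
        raise ValueError("stack does not fit in a 32-bit address space")
    return stack_addr - stack_size, stack_addr


# Parse the flag values exactly as they are written on the linker command line.
bottom, top = stack_range(int("0xf0000000", 16), int("0x40000000", 16))
print(f"stack spans {bottom:#010x}-{top:#010x}: {(top - bottom) >> 20} MB, "
      f"{(top - bottom) / ADDRESS_SPACE:.0%} of the 32-bit address space")
```

For the proposed flags this prints a range of 0xb0000000-0xf0000000. That is 1024 MB, or 25% of the 32-bit address space.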
Original comment by classi...@floodgap.com
on 24 Aug 2011 at 4:48
Can I try this in the terminal with a ulimit command like in bug 37?
Original comment by chtru...@web.de
on 24 Aug 2011 at 5:04
No. ulimit doesn't let you relocate the stack. This has to be done by the
linker.
Original comment by classi...@floodgap.com
on 24 Aug 2011 at 5:05
The biggest help to me would be to get everything into a totally local test
pack, even if it's not minimized, because otherwise I have to sneakernet builds
to and from the iBook and this will really slow the process down. It has to be
a test case I can run without a network on the debug G5.
Original comment by classi...@floodgap.com
on 24 Aug 2011 at 5:07
The website from the folder loads different stuff every time I view it. So I'm
not sure if it's possible to make a completely local version. In Safari, one
can see that it actually never stops loading, but dynamically loads new stuff
about every ten seconds.
On the other hand, TFF (fresh user account) crashes even with no network
connectivity at all, so I figure the offending JS must be already somewhere in
that folder. I'll look more closely tonight when I have more time.
What's weird is that my normal TFF user profile crashes even with full Adblock
Plus armour (Easy List English, Easy List Germany, Fanboy's List) when loading
the website from the folder. It does so with or without network connectivity.
There are no ads to be seen before it crashes. But I don't know what exactly
Adblock Plus blocks.
Original comment by chtru...@web.de
on 24 Aug 2011 at 6:28
I'm unable to pinpoint a single .js file that may cause the crash. If I remove
all of them, the browser doesn't crash. If I remove one specific file (viz.,
__c__wbx_lp_js_wbx_159143538_clickTracker__c__.js) the browser doesn't crash
when it's without a network connection. When I'm connected, the browser still
crashes when said file is removed. With a network connection, the only thing I
can say for sure is that the fewer .js files are present, the less likely the
browser is to crash.
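Since no single file reproduces the crash, a systematic reduction may get further than removing files one at a time. The sketch below is a simplified form of delta debugging, a technique not used in this thread. `crashes` is a hypothetical oracle. In practice it would mean loading the trimmed folder in TFF with the network off and watching for the crash. Because the crash is not fully deterministic, each candidate should probably be loaded several times:

```python
from typing import Callable, Sequence


def minimize(files: Sequence[str], crashes: Callable[[list[str]], bool]) -> list[str]:
    """Greedily drop chunks of files while the page still crashes.

    Simplified delta debugging: try removing halves first; when no chunk can
    be removed, split into smaller chunks, down to single files. On return,
    removing any one remaining file no longer crashes (assuming an empty
    folder never crashes).
    """
    current = list(files)
    chunks = 2
    while len(current) >= 2:
        size = max(1, len(current) // chunks)
        removed = False
        for start in range(0, len(current), size):
            candidate = current[:start] + current[start + size:]
            if crashes(candidate):
                # Still crashes without this chunk: keep the smaller set.
                current = candidate
                chunks = max(chunks - 1, 2)
                removed = True
                break
        if not removed:
            if size == 1:
                break  # every single-file removal was tried; we are done
            chunks = min(chunks * 2, len(current))  # retry with finer chunks
    return current
```

The crash here seems to depend on how much JavaScript is in play rather than on one file. So the result may well be a set of several files, none of which crashes the browser on its own. Even so, a smaller set is easier to turn into a network-free test pack.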
Original comment by chtru...@web.de
on 24 Aug 2011 at 11:21
I'm a bit busy for the moment, so it might be a few days before I can provide
any more help with this bug.
Original comment by barr...@gmail.com
on 25 Aug 2011 at 12:30
Don't worry about it; Chris is right. The provided dump will crash the browser
even without network.
Adjusted browser/app/Makefile.in to 0x10000000 from 0x4000000, increasing stack
from 64MB to 256MB. The linker relocated the stack automatically to 0xb0000000,
and the test dump no longer crashes. This will appear in the 7 beta when it is
generated.
Original comment by classi...@floodgap.com
on 25 Aug 2011 at 12:47
By the way, in answer to your question in comment 14, Adblock doesn't block
anything from the test dump because it appears to come from file://.
Original comment by classi...@floodgap.com
on 25 Aug 2011 at 12:48
Oh, and fix summary
Original comment by classi...@floodgap.com
on 25 Aug 2011 at 12:49
Very good. But how much higher in the stack can we go?
(Does this have an impact on the amount of RAM the browser uses? Does it create
security issues to relocate the stack to a different place?)
Original comment by chtru...@web.de
on 25 Aug 2011 at 7:35
Security issues with expanding or relocating the stack, no. In fact it may be
more resistant to attacks that assume the stack is in its default position,
although such attacks are pretty unlikely in this post-PPC world. But in the
worst case it would require more memory during these kinds of peaks, and I
think we're just going to have to accept that as a cost of doing business. For
7 we'll probably need to demand 512MB as minimum, and all of the Macs we
support are capable of that (even an old Outrigger beige Power Mac can hold a
gig).
The theoretical maximum stack size is 4GB for a 32-bit process if we moved the
stack all the way down to 0x00000000. However, we won't get anywhere near that
because a 4GB stack would leave no room for the code itself. In practice I don't think
we'll be able to get much near a gig of stack without the linker objecting.
The real problem is that we have no way to detect that we are running out of
stack until we actually do run out of stack, and this appears to be a flaw in
the way the tracejit is designed. In fairness to Mozilla, it's probably never
been an issue for them in practice, just us. We are more sensitive to this
because we have two "problems": OS X requires a fixed stack cap -- it cannot
dynamically grow above that designed cap -- and PPC has many more registers
than, say, x86. OS X on Intel has the stack cap issue but it's only saving a
handful of registers. We're saving 18 GPRs on each function call, plus the link
register, FPRs, etc. This is demanded by the PPC ABI, and even provisionally
reducing this number doesn't reduce our stack demands enough. SPARC has the
same register problem, but Solaris stacks can grow. However, once we're into
the native code the tracejit spits out, it's too late. PPC on Linux shouldn't
have this problem either because the Linux stack can also grow. In short, it's
unique to 10.4 and 10.5.
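To put rough numbers on the register-saving argument, the sketch below estimates the fixed per-frame cost of register saves on 32-bit PowerPC versus i386. The ABI figures are assumptions taken from the Darwin calling conventions, not measurements of the tracejit:

- a 24-byte PPC linkage area;
- a 32-byte minimum parameter area;
- 16-byte stack alignment;
- the 18 saved GPRs mentioned above, plus optionally the 18 nonvolatile FPRs (f14-f31);
- on i386, a return address plus EBP, EBX, ESI and EDI.

Locals are ignored, so real frames are larger:

```python
# Rough per-frame cost of register saves, under stated ABI assumptions.
def align(n: int, to: int = 16) -> int:
    """Round n up to a multiple of `to` (both ABIs keep 16-byte stack alignment)."""
    return (n + to - 1) // to * to


def ppc32_frame(gprs: int = 18, fprs: int = 18, locals_: int = 0) -> int:
    linkage = 24  # back chain, CR, LR, two reserved words, TOC (assumed Darwin PPC32)
    params = 32   # minimum outgoing parameter area: 8 words
    return align(linkage + params + 4 * gprs + 8 * fprs + locals_)


def x86_frame(saved_regs: int = 4, locals_: int = 0) -> int:
    ret_addr = 4  # pushed by CALL
    return align(ret_addr + 4 * saved_regs + locals_)


for mb in (64, 256):
    size = mb << 20
    print(f"{mb}MB stack: ~{size // ppc32_frame()} PPC frames, "
          f"~{size // x86_frame()} i386 frames")
```

Under those assumptions a PPC frame costs 272 bytes (128 without FPRs) against 32 bytes on i386. A 64MB stack therefore holds about 246,000 such PPC frames versus about 2 million i386 ones. The same nesting depth simply eats several times more stack on PPC.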
Original comment by classi...@floodgap.com
on 25 Aug 2011 at 3:14
s/designed cap/designated cap/
Original comment by classi...@floodgap.com
on 25 Aug 2011 at 3:15
Good. I thought the browser would always use +256 MB from start because I
confused 'allocated' with 'fixed size'. I'm still experiencing
(non-reproducible) crashes at Amazon Login and on Facebook every once in a
while (maybe once every two weeks), so 256 MB will probably minimize these
problems if they're stack overflow related.
Original comment by chtru...@web.de
on 25 Aug 2011 at 8:15
Confirmed fixed in 7.0b1. Browser uses 336 MB of real memory to load the
cringely-crasher local page… The first time I loaded it, however, the
browser crashed (but I haven't been able to reproduce it since), so maybe 256
MB still isn't enough.
Original comment by chtru...@web.de
on 5 Sep 2011 at 11:03
With 7.0b1, I can't make it crash anymore, with either the cringely-crasher
local page or with cringely.com itself. Thank you.
Original comment by barr...@gmail.com
on 5 Sep 2011 at 2:37
Excellent.
Original comment by classi...@floodgap.com
on 5 Sep 2011 at 2:38
With regard to comment 24, let's see how it goes. I don't want to suck up our
address space indiscriminately.
Original comment by classi...@floodgap.com
on 5 Sep 2011 at 2:39
Saving all those registers not only consumes stack space but also hurts
performance.
Currently 18 registers are saved and restored with each function call.
One should investigate how many of those are really ever used, e.g. by
adding simple statistics to the findReg functions (setting a flag bit for each
register used, and printing a log message when a register is used for the first
time) and then running Dromaeo.
Based on the results we might limit the number of registers that are saved and
restored, and hence that nanojit is allowed to use. Maybe we only need half of
the registers we currently save and restore?!
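The instrumentation proposed above amounts to a per-register bitmask plus a log line on first use. This is a minimal sketch in Python. RegisterUsage and note() are hypothetical names; in nanojit the equivalent hook would live in the C++ register allocator (the findReg functions):

```python
import logging


class RegisterUsage:
    """Record which registers the allocator hands out, one bit per register.

    Hypothetical instrumentation: in nanojit this would be a hook called from
    the register-finding code; here it is a plain class.
    """

    def __init__(self, name: str = "GPR") -> None:
        self.name = name
        self.mask = 0  # bit n set => register n has been used at least once

    def note(self, reg: int) -> bool:
        """Mark reg as used; log and return True only on its first use."""
        bit = 1 << reg
        if self.mask & bit:
            return False
        self.mask |= bit
        logging.info("%s r%d used for the first time", self.name, reg)
        return True

    def used(self) -> list[int]:
        """Register numbers seen so far, in ascending order."""
        return [r for r in range(self.mask.bit_length()) if self.mask >> r & 1]
```

After a Dromaeo run, used() would list which of r13-r31 the allocator ever handed out. To see the first-use messages, call logging.basicConfig(level=logging.INFO) first.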
Original comment by Tobias.N...@gmail.com
on 12 Sep 2011 at 4:10
The tracer actually already does this, albeit indirectly -- a number of
activation slots are identified, and this can be as few or as many as are in
flight (see genPrologue in NativePPC.cpp). The problem is that we can stack up
a lot at peak, and we also have to account for the linkage area.
Original comment by classi...@floodgap.com
on 12 Sep 2011 at 6:50
So what I had in mind (saving GPRs 13-31 only if they are actually used) is done
by evictScratchRegsExcept(), which is called by asm_call(), if I understand
correctly? I hope that actually works; did you check that it does?
Original comment by Tobias.N...@gmail.com
on 12 Sep 2011 at 9:26
I don't follow your question, but evictScratchRegsExcept() doesn't do that;
it's called *after* a function call is made (the callee should have saved the
registers already) to indicate registers in flight are stale. Remember, the
nanojit emits from the top down, so operations are actually represented
backwards -- the evicter emits code which runs *after* the bctrl in execution.
Register saving is generally done in the prologue, which in the nanojit is
indirectly triggered by LIR_spill.
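The point about backwards emission is easy to miss, so here is a toy model. Each newly emitted instruction goes in front of everything emitted so far. BackwardsBuffer is hypothetical and illustrates only the ordering, not nanojit's actual Assembler. The mtctr/bctrl pair stands in for a typical PPC indirect call:

```python
class BackwardsBuffer:
    """Toy model of a code buffer filled from the end toward the start.

    Each emit() places the instruction *before* everything emitted so far,
    so the final listing (execution order) is the reverse of emission order.
    """

    def __init__(self) -> None:
        self.code: list[str] = []

    def emit(self, insn: str) -> None:
        self.code.insert(0, insn)


buf = BackwardsBuffer()
# Emission order as a backwards assembler would see a call (illustrative only):
buf.emit("evict stale scratch registers")  # emitted first...
buf.emit("bctrl")                          # ...then the call itself
buf.emit("mtctr r12")                      # ...then the call setup
print("\n".join(buf.code))
```

Emitting the eviction first puts it last in the final listing, so it runs after the bctrl returns.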
Original comment by classi...@floodgap.com
on 13 Sep 2011 at 5:41
Original issue reported on code.google.com by
barr...@gmail.com
on 22 Aug 2011 at 1:16