keeleysam / tenfourfox

Automatically exported from code.google.com/p/tenfourfox

Need 256MB of stack for tracejit [cringely.com] #85

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
* I read everything above and have demonstrated this bug only occurs on
10.4Fx by testing against this official version of Firefox 4/up (not
applicable for startup failure) - SPECIFY VERSION YOU TESTED AGAINST:
Firefox 6.0 on Intel Mac, Snow Leopard 10.6.8.

* Layout bugs MUST be tested against a system WITHOUT hardware
acceleration. Go to Help, Troubleshooting Information to see if your test
system is accelerated. If your test system IS accelerated, you must make
sure it is OFF, OR test on ANOTHER system that isn't. If this is NOT a
layout or display bug, you can skip this step.

I haven't checked under Leopard yet, but under Tiger: "GPU Accelerated 
Windows 0/1. Blocked for your graphics driver version."

* This is a startup crash or failure to start (Y/N): N

* This is a fault of JavaScript acceleration (Y/N): (Use the steps above to
find out. Do NOT report if it is not.) Yes (disabling 
javascript.options.tracejit.content prevents this crash from happening).

* What steps are necessary to reproduce the bug? These must be reasonably
reliable.

1. Make sure that JavaScript (and javascript.options.tracejit.content) is 
enabled.
2. Visit www.cringely.com (or just cringely.com, it happens either way).
3. Watch the browser crash.

I used Firefox 3.6.20 to save a copy of the crashing page. I will zip it up and 
attach it to this bug. Opening the local copy also reproduces this bug.

* Describe your processor, computer, operating system and any special
things about your environment.

1.2GHz iBook G4 (Late 2004), 1.25GB RAM. Crash is reproducible under both Tiger 
10.4.11 and Leopard 10.5.8. Under Leopard I can always reproduce on a freshly 
logged-in guest account (therefore a fresh profile and no Firefox extensions).

* If this is a startup crash or failure to start, have you tried restarting
with a clean profile? Does this fix the problem?

A clean profile does not fix the problem (I tried under both Tiger and Leopard).

* For crashes or startup failure (optional): paste in any information from
Crash Reporter, or if you are able to run 10.4Fx in gdb, paste in the
backtrace. You can often do it like this from within Terminal.app:

cd appname.app/Contents/MacOS
gdb firefox-bin
run
*crash the app
bt full

I will follow up with crash logs later today.

Original issue reported on code.google.com by barr...@gmail.com on 22 Aug 2011 at 1:16

GoogleCodeExporter commented 9 years ago
gdb backtrace under Tiger:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xbbfffff0
0x07b9caf8 in ?? ()
(gdb) bt full
#0  0x07b9caf8 in ?? ()
No symbol table info available.
#1  0x0305a92c in js::DeepBail ()
No symbol table info available.
#2  0x00000000 in ?? ()
No symbol table info available.
(gdb) 

Original comment by barr...@gmail.com on 22 Aug 2011 at 2:02

GoogleCodeExporter commented 9 years ago
gdb backtrace under Leopard:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xbbffffb0
0x0de8f678 in ?? ()
(gdb) bt full
#0  0x0de8f678 in ?? ()
No symbol table info available.
#1  0x0305a92c in js::DeepBail ()
No symbol table info available.
#2  0x00000000 in ?? ()
No symbol table info available.
(gdb) 

I've also attached the Crash Reporter logs from both Tiger and Leopard.

(BTW, GPU Accelerated Windows are disabled on my iBook G4 under Leopard too.)

Original comment by barr...@gmail.com on 22 Aug 2011 at 2:27

GoogleCodeExporter commented 9 years ago
This appears to be a stack overflow. I can't investigate much further until 
connectivity to my build system is restored.

Original comment by classi...@floodgap.com on 22 Aug 2011 at 3:52
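
One way to sanity-check the stack-overflow reading is to compare the fault addresses in the two backtraces with where the stack would end. The sketch below is only an illustration: it assumes the main-thread stack of a 32-bit OS X process tops out at 0xc0000000 (not stated anywhere in this thread) and uses the 64MB stack size cited later in the thread. The function name is hypothetical.

```python
# Assumptions (not confirmed by the backtraces themselves):
#   - the main-thread stack of a 32-bit OS X process ends at 0xc0000000
#   - the stack is 64MB (0x4000000) and grows downward
STACK_TOP = 0xC0000000
STACK_SIZE = 0x4000000


def bytes_below_stack(fault_addr, stack_top=STACK_TOP, stack_size=STACK_SIZE):
    """Return how far fault_addr lies below the lowest valid stack address.

    A small positive result suggests the stack pointer ran just past the
    end of the stack; zero or a negative value means the address is inside
    or above the stack region.
    """
    stack_bottom = stack_top - stack_size
    return stack_bottom - fault_addr


# Fault addresses copied from the two gdb sessions above.
for label, addr in (("Tiger", 0xBBFFFFF0), ("Leopard", 0xBBFFFFB0)):
    distance = bytes_below_stack(addr)
    print(f"{label}: fault at {addr:#x} is {distance} bytes below the stack")
```

Under those assumptions both faults land within 100 bytes of the stack's lower edge, which is what one would expect from a store into a freshly pushed frame that no longer fits.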

GoogleCodeExporter commented 9 years ago
This is odd: 
1) I cannot reproduce the crash going to the website directly (fresh user 
account, no add-ons, no adblock). 
2) I can, however, reproduce the crash in the same fresh user account with your 
zipped folder (thanks for attaching that), where the browser starts to load a 
myriad of ads – many more than I get when visiting the site directly. Maybe 
they serve ads based on location?

Original comment by chtru...@web.de on 23 Aug 2011 at 7:10

GoogleCodeExporter commented 9 years ago
It could be that. I can't reproduce it either. Some exact steps would be 
helpful.

Original comment by classi...@floodgap.com on 23 Aug 2011 at 3:19

GoogleCodeExporter commented 9 years ago
I gave the steps, as exactly as I think I can, that make it crash 100% 
reproducibly for me.

The saved folder does show more ads, for whatever reason. However, aside from 
the extra ads, there are no other problems when I open it in Safari 5.0.6. (If 
I open it in Firefox 3.6.20, it pops up one or two unresponsive script warnings 
while loading, but I just click "Continue" and it eventually finishes loading. 
In contrast, the unresponsive script warnings do not show when I visit 
www.cringely.com in that same browser.)

If I visit cringely.com from TFF with a fresh profile it still crashes for me 
-- I just did it again to make sure.

Steps for saving the folder:
1. Open Firefox 3.6.20.
2. Visit www.cringely.com
3. Choose "Save Page As..." from the File menu.
4. Choose "Web Page, complete" as the format to save in.
5. Choose your desired location.
6. Click "Save."

Except for step 5 in this comment, I think I've made my steps as exact as I 
can. (At least, right now I'm unable to figure out how to make them more exact.)

FWIW my ISP is Cox Communications (cable), and I'm in Irvine, CA. 

Original comment by barr...@gmail.com on 24 Aug 2011 at 12:28

GoogleCodeExporter commented 9 years ago
With the provided folder, I can cause 6.0 to crash. The problem is that the 
folder is not local -- it's actually pulling down .js from other sites, so it 
is not a portable test case or minimized.

It is coming from one of the ad sites, so AdBlock does mask this.

Barry, if you can pull everything together into something at minimum portable 
-- i.e., no outside dependencies -- it would help considerably because I can't 
run this on the debugging system until my network is up. If you can't, this is 
going to have to sit until I can.

Original comment by classi...@floodgap.com on 24 Aug 2011 at 4:18

GoogleCodeExporter commented 9 years ago
Test build reducing regs in flight from 18 to 8 does not repair the crash.

Original comment by classi...@floodgap.com on 24 Aug 2011 at 4:19

GoogleCodeExporter commented 9 years ago
I guess we could also restrain native calls like we did in 4.0.1, but that 
would hobble us again.

Original comment by classi...@floodgap.com on 24 Aug 2011 at 4:35

GoogleCodeExporter commented 9 years ago
Oooh, I just thought of something. We could relocate the stack up higher and 
try to get more than 64MB of memory in the stack. This doesn't solve the crash 
per se (because there is no case where we can detect running out of stack and 
abort safely -- this is a limitation of OS X), but we could add so much stack 
that the issue is dealt with.

This should give us a gig:
-Wl,-stack_size,0x40000000,-stack_addr,0xf0000000

If it crashes with that, then my only other thought is restrain natives and 
that's gonna suck.

Original comment by classi...@floodgap.com on 24 Aug 2011 at 4:48
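
To make the numbers in that linker option concrete, here is a minimal sketch that pulls the two hex values out of the option string and prints the size and the address range they would imply. The helper name is hypothetical, and the range assumes the stack extends downward from the `-stack_addr` value.

```python
def parse_stack_flags(flags):
    """Extract (stack_size, stack_addr) from a -Wl,... option string."""
    parts = flags.split(",")
    # After the leading "-Wl", the string is a series of name,value pairs.
    opts = dict(zip(parts[1::2], parts[2::2]))
    size = int(opts["-stack_size"], 16)
    addr = int(opts["-stack_addr"], 16)
    return size, addr


size, addr = parse_stack_flags("-Wl,-stack_size,0x40000000,-stack_addr,0xf0000000")
print(f"stack size: {size // (1024 * 1024)} MB")
# Assumption: the stack occupies the region just below stack_addr.
print(f"stack range: {addr - size:#x} .. {addr:#x}")
```

With these values the size works out to 1024 MB, i.e. the "gig" mentioned above.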

GoogleCodeExporter commented 9 years ago
Can I try this in the terminal with a ulimit command like in bug 37?

Original comment by chtru...@web.de on 24 Aug 2011 at 5:04

GoogleCodeExporter commented 9 years ago
No. ulimit doesn't let you relocate the stack. This has to be done by the 
linker.

Original comment by classi...@floodgap.com on 24 Aug 2011 at 5:05
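
For context, `ulimit -s` adjusts the process's `RLIMIT_STACK` resource limit, and that interface carries only a soft and a hard size, with no field for an address. A small sketch using Python's standard `resource` module (Unix only) shows what is available through it; the `describe` helper is just for display.

```python
import resource

# RLIMIT_STACK is the limit that `ulimit -s` reads and sets.
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)


def describe(limit):
    """Format a limit in KB, the unit `ulimit -s` reports."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} KB"


print("soft stack limit:", describe(soft))
print("hard stack limit:", describe(hard))
# Note: there is no address here, only sizes, so the stack's location
# cannot be changed this way.
```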

GoogleCodeExporter commented 9 years ago
The biggest help to me would be to get everything into a totally local test 
pack, even if it's not minimized, because otherwise I have to sneakernet builds 
to and from the iBook and this will really slow the process down. It has to be 
a test case I can run without a network on the debug G5.

Original comment by classi...@floodgap.com on 24 Aug 2011 at 5:07

GoogleCodeExporter commented 9 years ago
The website from the folder loads different stuff every time I view it. So I'm 
not sure if it's possible to make a completely local version. In Safari, one 
can see that it actually never stops loading, but dynamically loads new stuff 
about every ten seconds. 

On the other hand, TFF (fresh user account) crashes even with no network 
connectivity at all, so I figure the offending JS must be already somewhere in 
that folder. I'll look more closely tonight when I have more time.

What's weird is that my normal TFF user profile crashes even with full Adblock 
plus armour (Easy List English, Easy List Germany, Fanboy's List) when loading 
the website from the folder. It does so with or without network connectivity. 
There are no ads to be seen before it crashes. But I don't know what exactly 
Adblock Plus blocks.

Original comment by chtru...@web.de on 24 Aug 2011 at 6:28

GoogleCodeExporter commented 9 years ago
I'm unable to pinpoint a single .js file that may cause the crash. If I remove 
all of them, the browser doesn't crash. If I remove one specific file (viz., 
__c__wbx_lp_js_wbx_159143538_clickTracker__c__.js) the browser doesn't crash 
when it's without a network connection. When I'm connected, the browser still 
crashes when said file is removed. With a network connection, the only thing I 
can say for sure is that the fewer .js files are present, the less likely the 
browser is to crash. 

Original comment by chtru...@web.de on 24 Aug 2011 at 11:21

GoogleCodeExporter commented 9 years ago
I'm a bit busy for the moment, so it might be a few days before I can provide 
any more help with this bug.

Original comment by barr...@gmail.com on 25 Aug 2011 at 12:30

GoogleCodeExporter commented 9 years ago
Don't worry about it, Chris is right. The provided dump will crash the browser 
even without network.

Adjusted browser/app/Makefile.in to 0x10000000 from 0x4000000, increasing stack 
from 64MB to 256MB. The linker relocated the stack automatically to 0xb0000000, 
and the test dump no longer crashes. This will appear in the 7 beta when it is 
generated.

Original comment by classi...@floodgap.com on 25 Aug 2011 at 12:47

GoogleCodeExporter commented 9 years ago
By the way, in answer to your question in comment 14, Adblock doesn't block 
anything from the test dump because it appears to come from file://.

Original comment by classi...@floodgap.com on 25 Aug 2011 at 12:48

GoogleCodeExporter commented 9 years ago
Oh, and fix summary

Original comment by classi...@floodgap.com on 25 Aug 2011 at 12:49

GoogleCodeExporter commented 9 years ago
Very good. But, how much higher in the stack can we go? 
(Does this have an impact on the amount of RAM the browser uses? Does it create 
security issues to relocate the stack to a different place?)

Original comment by chtru...@web.de on 25 Aug 2011 at 7:35

GoogleCodeExporter commented 9 years ago
Security issues with expanding or relocating the stack, no. In fact it may be 
more resistant to attacks that assume the stack is in the default position, 
although such attacks are pretty unlikely in this post-PPC world. But in the 
worst case it would require more memory during these kinds of peaks, and I 
think we're just going to have to accept that as a cost of doing business. For 
7 we'll probably need to demand 512MB as minimum, and all of the Macs we 
support are capable of that (even an old Outrigger beige Power Mac can hold a 
gig).

The theoretical maximum stack size is 4GB for a 32-bit process if we moved the 
stack all the way down to 0x00000000. However, we won't get anywhere near that 
because a 4GB stack means no room for the code itself. In practice I don't think 
we'll be able to get very close to a gig of stack without the linker objecting.

The real problem is that we have no way to detect that we are running out of 
stack until we actually do run out of stack, and this appears to be a flaw in 
the way the tracejit is designed. In fairness to Mozilla, it's probably never 
been an issue for them in practice, just us. We are more sensitive to this 
because we have two "problems": OS X requires a fixed stack cap -- it cannot 
dynamically grow above that designed cap -- and PPC has many more registers 
than, say, x86. OS X on Intel has the stack cap issue but it's only saving a 
handful of registers. We're saving 18 GPRs on each function call, plus the link 
register, FPRs, etc. This is demanded by the PPC ABI, and even provisionally 
reducing this number doesn't reduce our stack demands enough. SPARC has the 
same register problem, but Solaris stacks can grow. However, once we're into 
the native code the tracejit spits out, it's too late. PPC on Linux shouldn't 
have this problem either because the Linux stack can also grow. In short, it's 
unique to 10.4 and 10.5.

Original comment by classi...@floodgap.com on 25 Aug 2011 at 3:14
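
A rough back-of-the-envelope sketch of the register-saving cost described above. It counts only the saved GPRs plus a linkage area, so it is a lower bound on frame size rather than a measurement of real tracejit frames. The 24-byte linkage area and 16-byte frame alignment are assumptions about the 32-bit Darwin PPC ABI, and FPRs, parameter areas and locals are ignored.

```python
WORD = 4            # bytes per GPR on 32-bit PPC
LINKAGE_AREA = 24   # assumed size of the 32-bit Darwin PPC linkage area
ALIGN = 16          # assumed stack frame alignment
MB = 1024 * 1024


def min_frame_bytes(saved_gprs):
    """Smallest frame that holds the saved GPRs and the linkage area."""
    raw = saved_gprs * WORD + LINKAGE_AREA
    return (raw + ALIGN - 1) // ALIGN * ALIGN  # round up to the alignment


def max_depth(stack_bytes, saved_gprs):
    """Upper bound on nested calls before the stack is exhausted."""
    return stack_bytes // min_frame_bytes(saved_gprs)


for stack_mb in (64, 256):
    for gprs in (18, 8):
        print(f"{stack_mb}MB stack, {gprs} GPRs saved: "
              f"{min_frame_bytes(gprs)} bytes/frame, "
              f"at most {max_depth(stack_mb * MB, gprs)} frames")
```

Under these assumptions, dropping from 18 to 8 saved GPRs shrinks the minimum frame from 96 to 64 bytes, a 1.5x gain in depth, while going from 64MB to 256MB gives 4x. That is at least consistent with the test build not fixing the crash while the larger stack did.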

GoogleCodeExporter commented 9 years ago
s/designed cap/designated cap/

Original comment by classi...@floodgap.com on 25 Aug 2011 at 3:15

GoogleCodeExporter commented 9 years ago
Good. I thought the browser would always use +256 MB from start because I 
confused 'allocated' with 'fixed size'. I'm still experiencing 
(non-reproducible) crashes at Amazon Login and on Facebook every once in a 
while (maybe once in two weeks), so 256 MB will probably minimize these 
problems if they're stack overflow related. 

Original comment by chtru...@web.de on 25 Aug 2011 at 8:15

GoogleCodeExporter commented 9 years ago
Confirmed fixed in 7.0b1. Browser uses 336 MB of real memory to load the 
cringely-crasher local page… The first time I loaded it, however, the 
browser crashed (but I haven't been able to reproduce it since), so maybe 256 
MB still isn't enough. 

Original comment by chtru...@web.de on 5 Sep 2011 at 11:03

GoogleCodeExporter commented 9 years ago
With 7.0b1, I can't make it crash anymore, with either the cringely-crasher 
local page or with cringely.com itself. Thank you.

Original comment by barr...@gmail.com on 5 Sep 2011 at 2:37

GoogleCodeExporter commented 9 years ago
Excellent.

Original comment by classi...@floodgap.com on 5 Sep 2011 at 2:38

GoogleCodeExporter commented 9 years ago
With regard to comment 24, let's see how it goes. I don't want to suck up our 
address space indiscriminately.

Original comment by classi...@floodgap.com on 5 Sep 2011 at 2:39

GoogleCodeExporter commented 9 years ago
Saving all those registers not only consumes stack space but also hurts 
performance.

Currently 18 registers are saved and restored with each function call.

One should investigate how many of those are ever actually used, e.g. by 
adding simple statistics to the findReg functions (setting a flag bit for each 
register used, and printing a log message when a register is used for the first 
time) and running dromaeo.

Based on the results we might limit the number of registers that are saved and 
restored, and hence the number nanojit is allowed to use. Maybe we only need 
half of the registers we currently save and restore?!

Original comment by Tobias.N...@gmail.com on 12 Sep 2011 at 4:10
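
The statistics idea in the comment above (one flag bit per register, plus a log line on first use) could look roughly like this. It is a language-neutral sketch in Python, not nanojit code: the class, its methods and the sample register sequence are all hypothetical, and in the real tree the hook would live in the findReg functions.

```python
class RegisterUsageTracker:
    """Records which register numbers have ever been handed out."""

    def __init__(self):
        self.used_mask = 0   # bit N set means rN was used at least once
        self.log = []        # one message per first-time use

    def note_use(self, reg):
        bit = 1 << reg
        if not self.used_mask & bit:
            self.used_mask |= bit
            self.log.append(f"first use of r{reg}")

    def used_registers(self):
        return [r for r in range(32) if self.used_mask & (1 << r)]


tracker = RegisterUsageTracker()
for reg in (13, 14, 13, 20, 14):   # made-up allocation sequence
    tracker.note_use(reg)

print(tracker.log)
print("registers ever used:", tracker.used_registers())
```

After a benchmark run, the set of registers never flagged would be the candidates for dropping from the save/restore list.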

GoogleCodeExporter commented 9 years ago
The tracer actually already does this, albeit indirectly -- a number of 
activation slots are identified, and this can be as few or as many as are in 
flight (see genPrologue in NativePPC.cpp). The problem is that we can stack up 
a lot at peak, and we also have to account for the linkage area.

Original comment by classi...@floodgap.com on 12 Sep 2011 at 6:50

GoogleCodeExporter commented 9 years ago
So what I thought of (saving the GPRs 13-31 if actually used) is done by 
evictScratchRegsExcept() which is called by asm_call() if I got it right?

I hope that actually works - did you check that it indeed does?

Original comment by Tobias.N...@gmail.com on 12 Sep 2011 at 9:26

GoogleCodeExporter commented 9 years ago
I don't follow your question, but evictScratchRegsExcept() doesn't do that; 
it's called *after* a function call is made (the callee should have saved the 
registers already) to indicate registers in flight are stale. Remember, the 
nanojit emits from the top down, so operations are actually represented 
backwards -- the evicter emits code which runs *after* the bctrl in execution. 
Register saving is generally done in the prologue, which in the nanojit is 
indirectly triggered by LIR_spill.

Original comment by classi...@floodgap.com on 13 Sep 2011 at 5:41
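
The point about backwards emission can be illustrated with a toy buffer that fills from the end toward the start, so whatever is emitted first ends up running last. This is only a sketch of the ordering idea; the class and the instruction labels are made up and do not correspond to actual nanojit output.

```python
class BackwardEmitter:
    """Toy code buffer where each new instruction goes in front of the rest."""

    def __init__(self):
        self.code = []

    def emit(self, instr):
        # Newly emitted code lands *before* everything emitted so far.
        self.code.insert(0, instr)


em = BackwardEmitter()
# Emission order, as the assembler walks the trace from the bottom up:
em.emit("post-call code from the evicter")
em.emit("bctrl (the call itself)")
em.emit("argument setup")

# Execution order is the reverse of emission order.
print("\n".join(em.code))
```

So code emitted before the call instruction in assembler order is code that executes after it, which is why the evicter's output runs after the bctrl.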