keeleysam / tenfourfox

Automatically exported from code.google.com/p/tenfourfox
0 stars 0 forks source link

increase recursion limit in PCRE for PPC [ghostery] #67

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi! I'm one of the developers of the Ghostery addon available on AMO here: 
http://mygho.st/amo

Some of the users we have reported that TenFourFox 4+ and Ghostery are not 
compatible.  There seems to be a regex issue, here is the exact error:

"InternalError: regular expression too complex on:233|(new 
InternalError("regular expression too complex", 
"chrome://ghostery/content/ghostery-db.js", 233))"

I haven't found the exact regular expression causing this, but I'd be willing 
to help here by providing a script that would walk the database and pinpoint 
the exact culprit.  I'll also provide any other support from the Ghsotery side 
as needed.  Thanks!

Original issue reported on code.google.com by felix.sh...@gmail.com on 7 Jun 2011 at 3:55

GoogleCodeExporter commented 9 years ago
Can you give me the regex in question? Is it always that line?

Original comment by classi...@floodgap.com on 7 Jun 2011 at 4:08

GoogleCodeExporter commented 9 years ago
Hi!  Thanks for responding. 

Its always that line since this is where all regexes are combined for the use 
in the matcher.  That said, its based on the entire tracker list.  So, I 
created a little script that could be pasted into Firebug to test which 
particular regex screws up. If this doesn't produce an error, I'll then create 
a script that replicates what happens in ghostery-db.js (combinatory regex 
based on appending all regex capable tracker patterns).  Please test and let me 
know.  Thanks!

Original comment by felix.sh...@gmail.com on 7 Jun 2011 at 4:19

Attachments:

GoogleCodeExporter commented 9 years ago
I ran the script and it seems to pass with both JIT on and off. Is this 
possibly configuration dependent?

Original comment by classi...@floodgap.com on 7 Jun 2011 at 4:27

GoogleCodeExporter commented 9 years ago
Ok, I was afraid of that, as far as I know it is not configuration dependent.  
Attached is the full replica of the line where it breaks.

The logic there is as follows: check all patterns that use RegEx patterns (?.*| 
and so on), of the ones that match, push them into an array and then build a 
really long regex based on that.  New tester is included that replicates this 
behaviour.

Again, thanks for looking into this.

Original comment by felix.sh...@gmail.com on 7 Jun 2011 at 4:44

Attachments:

GoogleCodeExporter commented 9 years ago
% src/mozilla-beta/obj-ff-dbg/dist/TenFourFox.app/Contents/MacOS/js ~/tester2.js
Pattern breaks here: 
(pub\.lookery\.com\/js\/|lookery\.com\/look\.js|\/j\/pub\/look\.js)|google-analy
tics\.com\/(urchin\.js|ga\.js)|(\.quantserve\.com\/|\/quant\.js)|sitemeter\.com\
/(js\/counter\.js|meter\.asp)|(\.1[12]2\.2o7\.net\/|hitbox\.com|\.omtrdc\.net\/|
\/s(c)?_code[0-9a-zA-Z_-]*(\.[0-9a-zA-Z_-]*)?\.js|\/(omniture|mbox|hbx|omniunih)
(.*)?\.js|common\.onset\.freedom\.com\/fi\/analytics\/cms\/)|(shots\.snap\.com\/
snap_shots\.js|spa\.snap\.com\/snap_preview_anywhere\.js)|((www|secure)\.statcou
nter\.com\/counter\/(counter[0-9a-zA-Z_]*|frames)\.js|c\.statcounter\.com\/)|\/m
int\/\?js|(stats\.wordpress\.com\/|s\.stats\.wordpress\.com\/w\.js)|(\.analytics
\.yahoo\.com\/indextools\.js|ystat\.js|\.yimg\.com\/.*\/ywa\.js|\.visualrevenue\
.com\/vr\.js)|static\.crowdscience\.com\/start(-.*)?\.js|(static\.fmpub\.net\/(z
one|site)\/|dynamic\.fmpub\.net\/adserver\/|thirdparty\.fmpub\.net\/placement\/)
|((bid|d[0-9]?)\.openx\.(org|net)\/|\/(adg|adx)\.js|\/(afr|ajs|avw)\.php)|(wms\.
assoc-amazon\.com|rcm\.amazon\.com\/e\/cm|www\.assoc-amazon\.(com|ca|co\.uk|de|j
p)\/(e\/ir|s\/(ads\.js|asw\.js|link-enhancer)))|(feeds\.feedburner\.com\/~[fs]\/
|feedproxy\.google\.com\/~fc\/)|(feedjit\.com\/serve\/|feedjit\.com\/map\/)|(\.g
ooglesyndication\.com\/pagead\/|googletagservices\.com\/tag\/js\/gpt\.js|partner
\.googleadservices\.com\/gampad\/|feedads\.g\.doubleclick\.net\/~at)|\/woopra(\.
v(2|3|4))?\.js|(ad(-apac)?\.([a-z][a-z]\.)?doubleclick\.net\/|\.doubleclick\.net
\/pagead\/)|(Tacoda_AMS_DDC_Header\.js|an\.tacoda\.net\/|an\.secure\.tacoda\.net
\/)|(ad\.yieldmanager\.com\/|optimizedby\.rmxads\.com|e\.yieldmanager\.net\/scri
pt\.js)|(content\.dl-rms\.com\/|\.dlqm\.net\/|\.questionmarket\.com\/)|(tracking
Tags_v1\.1\.js|\/webtrends(.*)?\.js|\.webtrendslive\.com)|(\.xiti\.com\/(hit\.xi
ti|get\.at)|js_xiti\.js|xtcore.js)|(\/share-this\.php?akst_action|w\.sharethis\.
com\/)|(\/seesmic_topposters_v2\.js|seesmic-wp\.js)|static\.addtoany\.com\/menu\
/(feed|page)\.js|(\/addthis_widget\.(js|php)|\.addthis\.com\/js\/widget\.(js|php
)|l\.addthiscdn\.com)|(\.revsci\.net\/|wunderloop\.net\/|ad\.targetingmarketplac
e\.com\/|revsci\.(.*)\/gw\.js)|(ads|clients|container)\.pointroll\.com|(static\.
getclicky\.com\/|clicky\.js)|(pixel|optimized-by|tap-cdn)\.rubiconproject\.com\/
|\.(sphere|surphace)\.com\/widgets\/sphereit\/js|(widget\.criteo\.com|dis(.*)?\.
criteo\.com)|(social\.bidsystem\.com\/|cubics\.com\/displayAd\.js)|(ads(1)?\.msn
\.com\/library\/dap\.js|adsyndication\.msn\.com\/delivery\/getads\.js)|(\.atdmt\
.com\/|\.adbureau\.net\/)|www\.google\.com\/uds\/api?file|(\.adultadworld\.com\/
geopop\/geoinject\.js|\.adultadworld\.com\/jsc\/)|ads\.(lzjl|clicksor)\.com|(js\
.adsonar\.com\/js\/|ads\.adsonar\.com\/adserving\/)|(\.tribalfusion\.com\/|tags\
.expo9\.exponential\.com\/tags\/)|(o\.aolcdn\.com\/ads\/adswrapper|o\.aolcdn\.co
m\/js\/mg2\.js|(r1\.ace|ace-tag|servedby|uac)\.advertising\.com|(at|pr|ar)\.atwo
la\.com\/)|digg\.com\/[0-9a-zA-Z]*\/diggthis\.js|\.overture\.com\/(partner\/)?js
|(www\.)?intensedebate\.com\/js\/|(facebook\.com\/connect|connect\.facebook\.net
|(static\.ak\.connect\.facebook\.com\/.*\.js\.php|fbconnect\.js))|(static\.btbuc
kets\.com\/bt\.js|\.n\.btbuckets\.com\/js)|(cdn|g2|gonzogrape)\.gumgum\.com\/jav
ascripts\/ggv2\.js|(baynote\.net|baynote(-observer)?([0-9]+)?\.js)|s3\.amazonaws
\.com\/getsatisfaction\.com\/(feedback\/feedback\.js|javascripts\/feedback-v2\.j
s)|(ad\.afy11\.net\/(srad|srwidget)\.js|beacon\.afy11\.net\/)|\/k_(push|button)\
.js|(tags\.bluekai\.com\/|bkrtx\.com\/js\/)|(tr-metrics\.loomia\.com|assets\.loo
mia\.com\/js\/)|othersonline\.com\/*\/[a-z0-9]+\.js|(\.dtmpub\.com\/|login\.doto
mi\.com\/ucm\/ucmcontroller)|counter\.hitslink\.com\/(track\.js|statistics\.asp)
|twitter\.com\/(javascripts\/[0-9a-z]+\.js|statuses\/user_timeline\/)|(\.mediapl
ex\.com|\.fastclick\.net)\/|(www\.haloscan\.com\/load\/|js-kit\.com\/[0-9a-z\/]+
\.js)|(\.burstbeacon\.com\/|\.burstnet\.com\/)|(bspixel\.bidsystem\.com\/|bidsys
tem\.adknowledge\.com\/)|(\.adrevolver\.com|ads\.bluelithium\.com)\/|\.sweepery\
.com\/javascripts\/*\/[0-9a-zA-Z_]*\.js|(cdn|taf)\.socialtwist\.com[:80]*\/*\/(s
cript|shoppr\.core)\.js|(tracking\.percentmobile\.com\/|\/percent_mobile\.js)|(a
pi|leads)\.demandbase\.com\/|(\/eluminate\.js|data\.cmcore\.com\/imp)|(now\.eloq
ua\.com|elqcfg[xml]*\.js|elqimg\.js)|(\.realmedia\.com\/|realmedia\/ads\/|\.247r
ealmedia\.com\/realmedia\/ads\/)|\.(scoreresearch|securestudies|scorecardresearc
h)\.com\/|(\.bizographics\.com\/|ad\.bizo\.com\/pixel)|(codice|codicebusiness)\.
shinystat\.(com|it)\/|(sniff|stats)\.visistat\.com\/|(\.tynt\.com\/ti\.js|\.tynt
\.com\/javascripts\/tracer\.js)|\.yandex\.ru\/(resource|metrika)\/watch\.js|\.sp
ylog\.(com|ru)\/|(spruce\.rapleaf\.com|\.rlcdn\.com)|(atrk|certify)\.alexametric
s\.com\/|cdn\.doubleverify\.com\/[0-9a-zA-Z_-]*\.js|(ipinvite|group11|4qinvite\.
4q)\.iperceptions\.com\/|(tweetmeme\.com\/i\/scripts\/button\.js|zulu\.tweetmeme
\.com\/button_ajax\.js)|(media|recs).richrelevance\.com\/|(\.unica\.com\/|ntpage
tag)|(cid|pi)\.pardot\.com\/|js\.stormiq\.com/sid[0-9]*_[0-9]*\.[0-9]*\.js|(\.hi
stats\.com\/js[0-9][0-9]?(.*)?\.js|s[1-9][1-9]?\.histats\.com\/stats\/)|(widgets
\.amung\.us\/.*\.js|whos\.amung\.us\/widget\/)|c\.compete\.com\/bootstrap\/.*\/b
ootstrap\.js|(s3\.amazonaws\.com\/wingify\/vis_opt\.js|dev\.visualwebsiteoptimiz
er\.com\/deploy\/js_visitor_settings\.php.*|server\.wingify\.com\/app\/js\/code\
/wg_consolidated\.js)|mashlogic\.com\/(loader\.min\.js|brands\/embed\/)|(c1|beta
)\.web-visor\.com\/c\.js|\.eproof\.com\/js\/.*\.js|(autocontext|o)\.begun\.ru\/|
foresee-(trigger(.*)?|alive|analytics(.*)?)\.js|3dstats\.com\/cgi-bin\/3dstrack(
ssl)?\.cgi|(\.webtrekk\.net|\/webtrekk\.js)|(cts(-secure)?\.channelintelligence\
.com\/[0-9]*_landing\.js|cts-log\.channelintelligence\.com\/)|(js|tag)\.admeld\.
com|cdn\.wibiya\.com\/(toolbars|loaders)|\/adam\/(cm8[0-9a-z_]+\.js|detect)|(\.r
es-x\.com\/ws\/r2\/resonance|\/resxclsa\.js)|(\/gomez.+?\.js|\.[rt]\.axf8\.net\/
)|(cdn\.mercent\.com\/js\/tracker\.js|link\.mercent\.com\/)|(\.content\.ru4\.com
\/images\/|\.edge\.ru4\.com\/smartserve\/|\.xp1\.ru4\.com\/|ad\.xplusone\.com\/|
\/xplus1\/xp1\.js)|(xslt\.alexa\.com\/site_stats\/js\/s\/|widgets\.alexa\.com\/t
raffic\/javascript\/)|tags\.mediaforge\.com\/if\/[0-9]+|((targeting\.api|widgets
-ak)\.visualdna\.com|\.visualdna\.com\/(live|js)|\.visualdna-stats\.com)\/|(es|b
y\.essl)\.optimost.com\/(es\/[0-9]+\/c\/[0-9]+\/u\/[a-zA-Z_]+\.js|(by\/)?trial\/
)|\/(html|image|js)\.ng\/|(bh|tag|ds)\.contextweb\.com|(adstat\.4u\.pl\/s\.js|st
at\.4u\.pl\/cgi-bin\/)|z(i|a)g\.(js|gif)|bs\.serving-sys\.com\/burstingpipe\/(ac
tivityserver|adserver)\.bs|assets\.newsinc\.com\/(ndn\.2\.js|analyticsprovider\.
svc\/)|\.predictad\.com\/scripts\/(molosky|publishers)\/|(track|files)\.netshelt
er\.net|cdn(s?)\.(brcdn|brsrvr)\.com\/v1\/br-trk\.js|(beacon|js)\.clickequations
\.net|\.amgdgt\.com\/(ads|base)\/|(ads|image2)\.pubmatic\.com\/adserver\/|adelix
ir\.com\/(webpages\/scripts\/ne_roi_tracking\.js|neroitrack)|(tns-counter\.ru|tn
s-counter\.js|\.tns-cs\.net|statistik-gallup\.net)|[a|c]\.adroll\.com|newstogram
\.com\/(.*)\/(histogram|toolbar)\.js|(rotator\.adjuggler\.com|\/banners\/ajtg\.j
s|\/servlet\/ajrotator\/)|(m|js|api|cdn)\.viglink\.com|(cdn|crosspixel)\.demdex\
.net|adserver\.(adtechus\.com|adtech\.de)|(dw|adlog)\.com\.com\/|(\.dt07\.net|mg
\.dt00\.net\/(u)?js)|(publishers|\.hat)\.halogennetwork\.com\/|s0b?\.bluestreak\
.com\/ix\.e|(cdn\.nprove\.com\/npcore\.js|go\.cpmadvisors\.com\/)|(tracking\.int
ermundomedia\.com|ad\.intermundonet\.com|media\.intermundomedia\.com\/([ij]pixel
|trutags|ping))|vtracker\.com\/(counter|stats|tr\.x|ts|tss|mvlive|digits)|(scrip
ts|tm)\.verticalacuity\.com\/vat\/mon\/vt\.js|(impression|ca)\.clickinc\.com\/|(
\.skimresources\.com\/js|\.skimlinks\.com\/(api|js)\/)|(clickserve\.cc-dt\.com\/
link\/|gan\.doubleclick\.net\/gan)|\.fwmrm\.net\/(ad|g)\/|pmetrics\.performancin
g\.com\/(js|in\.php|[0-9]*\.js)|(tracking\.conversionlab\.it|conversionlab\.trac
kset\.com\/track\/)|(cdn\.adblade\.com\/js\/|(web|pixel)\.adblade\.com\/imps\.ph
p)|tag\.didit\.com\/(didit|js)\/|(content|track)\.pulse360\.com\/|(ad(network)?|
click)\.linksynergy\.com|(adserver|int)\.teracent\.net\/|projectwonderful\.com/(
\ad_display\.js|gen\.php)|voicefive\.com\/.*\.pli|(econda.*\.js|www\.econda-moni
tor\.de\/els\/logging)|adspeed\.(com|net)\/ad\.php|(live1\.netupdater\.info\/liv
e\.php|netupdater[0-9]\.de|\/netupdater(_live)?)|\.ic-live\.com\/(goat\.php|[0-9
][0-9][0-9][0-9]\.js)|(adcode|conv)\.adengage\.com|sitecompass\.com\/(sc_cap\/|[
ij]pixel)|upsellit\.com\/(upsellit(.*)?\.jsp|custom\/)|(haku|puma|cheetah)\.vizu
\.com|(goku\.brightcove\.com|admin\.brightcove\.com\/js)|((ad2|counter)\.rambler
\.ru|\.mail\.ru\/counter|\.list\.ru\/counter)|(monitus(_tools)?\.js|(l|b)ive\.mo
nitus\.net)|service\.optify\.net\/(visit/ping|opt\.js)|(servedby|geo)\.precision
click\.com\/(ad|midas)|radar\.cedexis\.(com|net)|(webtraxs\.js|webtraxs\.com\/(t
rxscript|webtraxs)\.php)|vertster.com\/.*\/vswap\.js|eyewonder\.com\/.*\/wrapper
\.js|ad(media)?\.wsod\.com|fx\.gtop(stats\.com|\.ro)\/js\/gtop\.js|adsfac\.(eu|u
s|sg|net)|pixazza\.com\/(static\/)?widget|(qnsr\.com|thecounter\.com\/id)|(adviv
a\.net|(smp|leads)\.specificmedia\.com)|(mmcore\.js|cg-global\.maxymiser\.com)|u
ptrends\.com\/(aspx\/uptime\.aspx|images\/uptrends\.gif)|(facebook\.com\/(plugin
s|widgets)\/(like|likebox|activity|recommendations|facepile)\.php|fbcdn\.net\/co
nnect\.php\/js\/fb\.share)|((conversions|c|p)\.chango\.(ca|com)|(as|adunit)?\.ch
ango\.(ca|com)\/links)|(dd|ff)\.connextra\.com|(api|apps)\.conduit\.com|creative
by2\.unicast\.com\/(assets|script2)|(hs|sw)\.interpolls\.com\/(imprimage\.poll?|
inter|cache|creative|redir)|(this\.content\.served\.by|media2)\.adshuffle\.com\/
(p\/|asrefinc11\.js)|(servedby|stat|cdn|a)\.flashtalking\.com\/((tags|report)v[0
-9]|imp|[0-9][0-9][0-9][0-9])\/|(servedby|ads|event)\.adxpose\.com\/(event\.flow
?|ads\/ads\.js?uid=|adxpose\/find_ad\.js)|(us\.img|ads)\.e-planning\.net|p\.bril
ig\.com\/contact\/bct?pid|xcdn\.xgraph\.net\/([0-9]|partner\.js)|everestjs\.net|
pixel([0-9]*)?\.everesttech\.net|(cdn\.(royale|statistics\.live)|analytics)\.spo
ngecell\.com\/|((p(shared|files|thumbnails))|embed)\.5min\.com\/|(adserver1(-ima
ges)?|bullseye)\.backbeatmedia\.com|(amconf|core|adcontent)\.videoegg\.com\/(sit
econf|eap|alternates|ads)\/|(a\.ucoz\.net|(do\.am|at\.ua)\/stat\/|ucoz\.(.*)\/(s
tat\/|main/?a=ustat))|(ads|sync)\.(adaptv|tidaltv)\.(tv|com)|domdex\.(net|com\/f
?c)|mi\.adinterax\.com\/(js|customer)|(gopjn|pjatr|pjtra|pntra|pntrac|pntrs)\.co
m\/|ads2?\.smowtion\.com

I have to admit, that's rather a large pattern. Is this what's running normally?

Original comment by classi...@floodgap.com on 7 Jun 2011 at 5:32

GoogleCodeExporter commented 9 years ago
Hi! Yes, this is indeed the regex that gets built and used for mass matching.  
The size of it is also dependent on the users selections: the example above is 
when everything is selected to be blocked.  This works well and fine on all 
other versions of Firefox.  This list is bound to be smaller with next few 
releases of the addon, but the question is why this breaks TenFourFox in the 
first place and if this could be addressed at all. Thanks!

Original comment by felix.sh...@gmail.com on 7 Jun 2011 at 3:24

GoogleCodeExporter commented 9 years ago
The triggered error from YARR/PCRE is error 17, corresponding to an overflow in 
recursion. In pcre/pcre_compile.cpp,

unsigned enclosedBrackets = (*brackets - bracketsBeforeRecursion);
unsigned limitBracket = minBracket + enclosedBrackets + (bravalue > OP_BRA);
if (!((minBracket & 0xff) == minBracket && (limitBracket & 0xff) == 
limitBracket)) {
     *errorCodePtr = ERR17; // << ERROR
     return false;
}
JS_ASSERT(minBracket <= limitBracket);

The cap on limitBracket is 256. It is unclear why PPC increases so quickly; 
changing the bitmasks to 0x1ff (doubling the recursion limit, effectively) 
repairs the issue. However, this seems too risky for 4.0.3. I will put this 
adjustment into 5 beta 2.

Original comment by classi...@floodgap.com on 7 Jun 2011 at 7:15

GoogleCodeExporter commented 9 years ago
(For the record, I still consider the pattern pathologic, which is why I'm 
leery of this change in a stable branch.)

Original comment by classi...@floodgap.com on 7 Jun 2011 at 7:16

GoogleCodeExporter commented 9 years ago
Haha, well, thanks for the research and the preliminary issue resolution.

As far as pattern being huge -- I wholeheartedly agree, but, if you care about 
a piece of development history of Ghostery, originally, this was executed as an 
array iteration (thus the array structure), one at a time.  This proved to be 
too slow for when a user chooses to block all trackers, common complaint was 
that Ghostery poped up "Warning Unresponsive script" too often.  What you see 
now is an alternative -- believe me its a lot faster on all non-104 
implementations.

Original comment by felix.sh...@gmail.com on 7 Jun 2011 at 7:38

GoogleCodeExporter commented 9 years ago
5b2 is now available with this change.

Original comment by classi...@floodgap.com on 10 Jun 2011 at 4:09

GoogleCodeExporter commented 9 years ago
Beta users are reporting that this fixes the problem for them, so I will close 
this issue. Please post here if the issue recurs and I will re-open it if 
needed.

Original comment by classi...@floodgap.com on 10 Jun 2011 at 12:28

GoogleCodeExporter commented 9 years ago
Tracking a crash possibly related to this in issue 69.

Original comment by classi...@floodgap.com on 11 Jun 2011 at 2:46

GoogleCodeExporter commented 9 years ago
A possibly related issue is issue 77. It's unclear what the proper number 
should be.

Original comment by classi...@floodgap.com on 13 Jul 2011 at 5:38

GoogleCodeExporter commented 9 years ago
just did a fresh install of windows xp pro and firefox as well as it's add-on 
etc...
Now each time i go to a different site, there is a delay as well as a new 
pop-up screen asking if i want to run or stop the script 
Script: chrome://ghostery/content/ghostery-common.js:1206

This is just a pain in the neck !!! i am thinking of removing ghosty and then 
seeing if that makes any difference, please advice ????

Original comment by benniede...@gmail.com on 23 Mar 2013 at 2:59

GoogleCodeExporter commented 9 years ago
This is not Ghostery's site. Don't file bugs about it here.

Original comment by classi...@floodgap.com on 23 Mar 2013 at 3:03