mapbox / mapbox-gl-js

Interactive, thoroughly customizable maps in the browser, powered by vector tiles and WebGL
https://docs.mapbox.com/mapbox-gl-js/
Other
11.2k stars 2.22k forks source link

iOS crash when using extrusions #4695

Closed bearman1 closed 4 years ago

bearman1 commented 7 years ago

Hi,

There appears to be an issue when using mapbox extrusions which cause the browser to crash on iOS mobile devices. After a short period (generally not more than a couple of minutes) of navigating around the map, the browser reports 'A problem occurred with this webpage so it was reloaded'. The issue occurs in both Safari and Chrome. I have replicated the issue on a number of iPads and iPhones including an iPhone 7 running the latest iOS version. I have found that even when using the sample code from the mapbox site, the issues occurs: https://www.mapbox.com/mapbox-gl-js/example/3d-buildings/

The issue was not found to exist on Android devices. I have not tested on a Mac.

Below are a couple of iPAD crash logfiles that may or may not be useful: https://github.com/bearman1/test1/blob/master/JetsamEvent-2017-05-03-084125.ips https://github.com/bearman1/test1/blob/master/JetsamEvent-2017-05-03-084135.ips

They list the 'reason' being 'vm-thrashing' and 'vm-pageshortage'.

An available site I was able to replicate the issue was: http://maps.nicholsonroad.com/heights/

Can anybody else confirm they are able to replicate this issue and it is indeed a bug?

Thanks. Ash.

kristfal commented 7 years ago

'vm-pageshortage' and 'vm-thrashing' are an indications of iOS running out of memory and killing the wkwebview to keep system performance at an acceptable level.

I don't think building extrusions themselves are responsible for this, as we're seeing quite a lot of memory related crashes on iOS devices even without extrusions, but they probably make memory pressure worse.

We're consistently able to reproduce crashes by loading and panning quickly over quite heavy tiles (around 500k ones). Reducing the panning speed (thus loading and unloading tiles slower) delays and even prevents crashes if the speed is slowed down enough. The most sensitive devices to memory pressure are iPads, so they make great test devices.

Our current guess is that the Safari Mobile garbage collector has issues when garbage is built quickly. If a lot of memory is allocated just before a major garbage collection was scheduled, iOS will dispatch a memory warning and crash the wkwebview before the garbage collection runs. This scenario can typically happen when a user zooms on an iPad, which would start loading up to 12 tiles at once during a zoom operation.

Possible mitigation strategies are:

– Reduce tile data sizes: Simplify geometry and remove any data you don't use in your style – Reduce number of tiles loaded, for instance don't include tiles beyond z13 – Reduce number of layers in style and complexity in filter functions

Possible solutions are:

– General memory management improvements – Reduced cache size for iOS devices / make cache size adjustable – Use a static memory object pool for objects in the hot loop

stevage commented 7 years ago

Thanks very much for the info @kristfal. I wonder if another mitigation strategy would be to prevent fast panning somehow, or slow the data requests when there are already outstanding renders.

zspitzer commented 7 years ago

@bearman1 I can also reproduce this bug on my iPad, please file a bug against https://bugs.webkit.org/enter_bug.cgi

kristfal commented 7 years ago

I've been digging deeper into this, and it seems like there is a memory leak in Safari on both mobile and desktop for mapbox-gl-js.

All mobile debugging have been done with the basic mapbox example on an iPad Pro 6,3 (2 GB ram) running iOS 10.3.1. Desktop debugging was done on OS X Safari 10.1.

Couple of gotchas for mobile testing:

  1. Total Mobile Safari memory consumption have been measured through the Instruments Activity Monitor in Xcode.
  2. Page memory consumption have been measured with Safari Inspector.
  3. com.apple.WebKit (The webkit process) doesn't release the memory for a tab properly until the process have been fully killed. Reloading the tab doesn't seem to release memory to system.
  4. iOS crashes the webkit process once 'Physical Memory Free' gets close to 0, which also crashes the Safari Inspector 🎉

Testing practice

The mobile tests were conducted in 3 stages:

Stage 1: Pan within an area of already loaded tiles by user interaction (roughly 8 tiles loaded) Stage 2: Zoom within an area of already loaded tiles by user interaction (roughly 20 tiles loaded) Stage 3: Zoom / pan to large cities where tiles are not loaded (keep loading new tiles till crash)

instruments 1-2-3

As seen from the Instruments log:

During Stage 1 and Stage 2, memory consumption is stable. During Stage 3, free memory continues to drop until it eventually crashes after around 1 minute. Memory consumption increases steadily with no indication of any significant memory being released (the small increase of free memory at the end of the recording is probably memory released by other apps / OS memory).

As a reference check, I profiled another Web GL heavy web-app that doesn't use mapbox-gl-js. It is using three.js for rendering 3D data, and data is delivered as PBF as well.

instruments-other-webapp

As seen from the Instruments log:

Memory seems to be releasing during prolonged usage and the app doesn't crash when used for the same period as mapbox-gl-js. The large garbage collection seen as a spike in "Physical Memory Free" releases around 170mb (from 33mb free to 197mb).

This test was conducted by continuously requesting and rendering new data (similar to Stage 3 testing of mapbox-gl-js). It should be noted that this app loaded around 4x the amount of pbf data as the mapbox-gl-js test over the same duration.

Safari Inspector data for Stage 3 testing

skjermbilde 2017-05-31 kl 12 05 37

Looking at the Safari Inspector, you'll notice that the majority of memory retained is for 'Page' (Side). According to the Safari Memory Debugging Blog, 'Page' memory is 'All other memory. This includes engine memory related to the DOM, styles, rendering data, memory caches, system allocations, etc.'. My hunch is that this also includes GL buffers.

You'll also notice the JS heap size gradually growing, with minor spikes both ways. Overall the heap size grows without being reduced. Here is an additional run:

skjermbilde 2017-05-31 kl 14 50 32

Looking at the heap snapshots

Heap snapshots are taken by the inspector automatically every 10 seconds. These snapshots are from the run shown above with retained size and total count in parenthesis:

Snapshot 1: Objects: 41,8 mb (12831) ArrayBuffer: 24,65 mb (15)

Snapshot 6: Objects: 61,03 mb (16858) ArrayBuffer: 29,28 mb (79)

Snapshot 12: Objects: 73,99 mb (23620) ArrayBuffer: 38,14 mb (165)

Snapshot 16: Objects: 111,00 mb (65711) ArrayBuffer: 54,04mb (743)

The retained size for objects did reduce slightly between some snapshots, but the total count for both Objects and ArrayBuffers always increased between snapshots.

The same behavior is present on Safari 10.1 for OS X:

skjermbilde 2017-05-31 kl 15 26 09

Similar to Mobile Safari, Object and ArrayBuffer allocation just keeps growing in desktop Safari.

A similar run with Chrome (on desktop) shows no sign of increasing heap sizes nor any signs of growing number of objects and ArrayBuffers.

I also did a test on Safari desktop without running the Inspector timeline recording to check if the tool itself was the cause of memory/object retention. The test replicated the user input from previous tests for 2 minutes and then used console.takeHeapSnapshot() to capture the heap.

Snapshot 1: Objects: 100,98 mb (53845) ArrayBuffer: 15,72 mb (426)

The results were pretty much inline with previous tests, so I doubt the Inspector timeline causes retention. It does however make Mobile Safari more crash prone while running.

The bottom line is that there seems to be a memory leak in mapbox-gl-js for Safari. I haven't found any good documentation for how to identify the root cause of the leak using the Safari Inspector nor why Safari just keeps producing a ton of Objects and ArrayBuffers that doesn't have an external reference but never gets garbage collected properly.

anandthakker commented 7 years ago

Wow, thank you @kristfal -- this is some wonderfully thorough profiling research.

zspitzer commented 7 years ago

I have filed a new bug https://bugs.webkit.org/show_bug.cgi?id=172790 referencing this analysis from @kristfal

zspitzer commented 7 years ago

Update from Apple: "There does look to be a WebGL-related leak here. We'd have to check whether the page is entraining the data"

ghost commented 7 years ago

Yes this memory blow out is killing us.

kristfal commented 7 years ago

@anandthakker I've been digging a bit further into this, and got some findings but still no definitive answer for why this happens or how to fix it. I've set up a small automated test that does the following:

a) Pan north for roughly 1 min at z11 through central Europe c) Pan west for 10 seconds b) Pan south for roughly 1 min d) Repeat

That test is designed to keep loading new tiles indefinitely, or until the browser crashes. I've been benchmarking mainly on an iPad Air 2 with 2 GB RAM.

Connecting the Safari Timeline Inspector makes the test crash a lot quicker, so I've been using 'Xcode Instruments' to measure total memory consumption and taken a few snapshots with the Safari Inspector when memory usage climbs. I've got the following "new" findings:

1) Safari desktop's memory behavior is similar to that of iOS. Both are leaky, but iOS "resolves" the problem with crashing. Desktop eventually starts running very slowly and run slowly with large 1 sec GC pauses. If you don't have an iPad to work with and want to profile, using Safari for desktop is an OK substitute.

2) Leaks are present even when just rendering a single layer from a single source. Tested all layer types on a lightweight source: In general, fewer layers causes slower leaks – but every test I've done has always eventually crashed on the iPad Air 2. Leak rate also seems to be proportional to viewport size, which is why iPads gets hit hard.

3) Safari is retaining a lot of ArrayBuffers and objects in gl-js. I've seen 250k+ retained objects and 15k+ ArrayBuffers in the inspector. However, JS heap seems to only account for 10-15% of retained memory, and 'Page' memory is the real memory eater. I have reached out to a few webkit devs and ask how we can profile 'Page' memory, but haven't received any replies yet. Can't find anything online about it either.

I did a similar profile of Tangram as that library shares a lot of fundamental concepts with gl-js.

Tangram profiles very differently than gl-js, and while I can still make it crash on an iPad Air 2 with the same test scenario as described above, the crash rate is a lot lower. As with gl-js, when the Safari Inspector is connected, Tangram crashes more frequently. A few notes:

1) Tangram has a considerably smaller heap size, which Safari seems to be able to GC better than Mapbox's JS heap. I see Tangram heap snapshots grow to 30 MB, and then drop down to 15 after a GC. The gl-js heap pretty much keeps growing with only minor reductions between snapshots.

2) Tangram produces a lot less retained objects in Safari. I don't know why. They pool Vertex ArrayBuffers, which seems to be working pretty well as AB retained size is not as large as with gl-js.

3) Similarly to gl-js, tangram's 'Page' memory keeps growing until it eventually crashes on iOS devices, but the growth rate is a lot slower than gl-js.

With that in mind, I think the bottom line is still that Webkit is leaking memory, and that the underlying issue should be reviewed by Apple. I can't find much documentation or guides on how to isolate the leaky parts, so right now – I'm unsure how to proceed with this issue.

anandthakker commented 7 years ago

Thanks @kristfal. I agree that we mostly have to rely on Apple to address the underlying issue in Safari, but great point about the large number of ArrayBuffers -- that's something we can take action on that could may mitigate the Safari issue and just improve performance in general. I'll file a separate ticket for it.

shi11 commented 7 years ago

just confirming that fill-extrusion seems to be the culprit. I'm running an ionic app. Even a few 3d polygons will cause the map to crash in a matter of minutes.

juhanivl commented 6 years ago

Any workarounds we could try to pursue? Any ideas what WebGL stuff is leaked?

We have a customer for whom iOS browsers crashing is a huge blocker. Typically within 10s of use the Safari crashes and as mentioned earlier in this thread the faster the interactions are the faster the crash will happen.

We don't use any extrusions, so the leak does happen with just "2D" maps and is also visible on desktop Safari which makes debugging a bit easier. Because our customers cannot cannot launch without finding a sustainable way to present maps on Safari we have started investigating this ourselves but haven’t gotten very far:

Basically we can see that if the map is destroyed, after a short while (1-3 seconds) the leaked "page" memory gets released. The only workaround we have found is to destroy and reinitialise the map periodically. We can schedule this destroy and reinitialise to happen at every “moveend” - event which does keep Safari from crashing. Of course this is a terrible workaround and causes a huge visible glitch as the map is reinitialising, but we cannot fix Safari and we aren't really looking to move away from Mapbox, so now we are spending time trying to hide the glitch at reinitialising.

Spending development time on this "workaround" and potentially going to production with it is not too appealing, so any ideas of how to prevent the leak, or how to more cleanly release the memory would be most appreciated. We would also be very grateful if anyone has any additional points which we could investigate or try.

zspitzer commented 6 years ago

As this is a Safari specific bug, please consider chiming on this bug report https://bugs.webkit.org/show_bug.cgi?id=172790 (it's stalled, sometimes it's important to nag)

zspitzer commented 6 years ago

PS @kristfal your demo for this bug isn't working anymore, it's getting an invalid token error

zspitzer commented 6 years ago

any chance of getting the test case working again @kristfal ?

https://bugs.webkit.org/show_bug.cgi?id=172790#c10

aaronmrosenthal commented 6 years ago

Following...

kristfal commented 6 years ago

Sorry for the slow reply here. Test case should work now. I'm on a free Mapbox plan, so may run out of tiles. The code is simple tho, so you can easily copy it and replace with your own access token.

AmiraMostafaAhmed commented 6 years ago

any updates ?

leearmstrong commented 6 years ago

Any update to this at all?

Plantain commented 6 years ago

Would this affect hillshading? Does that count as an extrusion? I'm seeing my site reliably crash a brand new iPad with hillshading enabled - works fine without. Will open a new bug if it's implemented differently.

femski commented 6 years ago

Yes, It affects hillshading too. My App reliably crashes fast with hillshading. And only slowly with hillshading off. So I have turned off Hillshading to delay the crash (which is inevitable).

GL-JS's issues with ios Safari are well known. This really is an issue with iOS Safari - since all demanding WebGL games (Unity etc.) crash on iOS Safari. I wont be counting on a fix anytime soon. Newer iOS devices have 3GB+ RAM yet web pages/WEbGl have ridiculous low limit. Apple does not care of WebGL or Web in general. They only care about AppStore/Native where they get a 30% cut.

leearmstrong commented 5 years ago

The WebKit bug can now be reproduced it seems. Not sure what that means though going forward.

There was some MapBox talk of reducing the amount of new ArrayBuffers used but not sure if anything has been done yet?

grorg commented 5 years ago

I'm having trouble reproducing this on various iPads running iOS 12 and above, using the test at: https://measuredweighed.github.io/mapboxgl-js-mem-usage/index.html

That seems to use a version of mapbox directly from mapbox.com. Has anything been changed that might avoid or fix this issue?

grorg commented 5 years ago

I can reproduce this on the hillshade example. I note that after a lot of zooming and panning, the JS heap snapshot shows a lot of un-collected ArrayBuffers. The biggest handful are all 4MB, and don't increase in number.

However there are thousands of smaller buffers, from approx 200k to 0 that are never garbage collected. They are all marked as root objects, in that they are hanging off the window.

Using the heap profiler/snapshot, it looks like the buffers come from:

and a few other places.

The same thing happens in Google Chrome - which suggests it's probably an issue in MapBox.

grorg commented 5 years ago

Yeah, it seems window.map has references to all the ArrayBuffers that have been used.

measuredweighed commented 5 years ago

A big thanks to @grorg for taking the time to dig into this. It looks like this may be related to the issue I raised as part of #7476. @ChrisLoer any idea when someone at Mapbox would be able to confirm this behaviour and investigate a fix? It's a showstopper for us and, judging by the Webkit ticket, a fair few others.

ChrisLoer commented 5 years ago

:wave: Thanks for the investigation and patience @grorg @measuredweighed and others. We are mostly off-site this week, but will try to find time to look into this further.

ChrisLoer commented 5 years ago

The hillshade leak looks like a pretty simple bug, I've filed it as https://github.com/mapbox/mapbox-gl-js/issues/7690 and we should have it fixed in the next release. Unfortunately I don't think it solves the original webkit-specific behavior that we've been having trouble nailing down.

ChrisLoer commented 5 years ago

I followed up a little on the "pool ArrayBuffer" idea @anandthakker suggested in https://github.com/mapbox/mapbox-gl-js/issues/4695#issuecomment-323351466 (and filed as issue #5168). I tried out a hacked-in buffer pool that was able to reduce the number of ArrayBuffer allocations by a factor of maybe 4-6x (https://github.com/mapbox/mapbox-gl-js/commit/1cf7e1e5cde8f4d660179a527b4706b6d32c1e1a).

It didn't seem to make a dramatic difference running @measuredweighed 's reproduction case in Safari 12.0.1 on macOS Mojave -- either way memory usage slowly drifts up as long as the animation is continuously running:

before:

screenshot 2018-12-13 14 29 02

after:

screenshot 2018-12-13 14 34 52

But... the ArrayBuffer pooling only applies to the intermediate ArrayBuffers that are being generated on the workers -- not to the ArrayBuffers that get transferred to the foreground. @kristfal noted that Tangram exhibits a similar leak, but apparently more slowly. FWIW, Tangram follows the same pattern we do of transferring ArrayBuffers from the worker to the foreground, but has many fewer (larger) buffers to transfer. Unfortunately, if transferring ArrayBuffers to the foreground does turn out to be the root cause, our architecture would make it pretty difficult to combine many layers into a single buffer the way Tangram does.

measuredweighed commented 5 years ago

Thanks for looking into this further @ChrisLoer! It appears Apple have now closed out the WebKit ticket as the hillshade leak was Mapbox-specific, but as you state: there appears to be a separate leak that is likely to originate in WebKit.

I can appreciate that the architectural change would be a daunting effort, so I suppose an alternative question is: how easy would it be to setup a test case to isolate/determine whether the transference of ArrayBuffers from worker => foreground is the issue? @grorg works on the WebKit team and was kind enough to look into the hillshade issue, so I think if we were able to create a reproducible case they'd take another look.

ChrisLoer commented 5 years ago

So, this is a little contrived, but I tried to make a case that mimicked/amplified the GL JS case by having four workers continually spam the foreground with ArrayBuffers that are just held briefly before being thrown away:

<button onclick="start()" id="start-button">Start</button>
<button onclick="stop()" id="stop-button">Stop</button>
<script>
var buffers;
var workers = [new Worker('worker.js'), new Worker('worker.js'), new Worker('worker.js'), new Worker('worker.js')];
workers.forEach(worker => worker.onmessage = function(message) {
    buffers = message.data.data;
});

var interval;
function start() {
    interval = setInterval(function(){
        workers.forEach(worker => worker.postMessage("foo"));
    }, 20);
}

function stop() {
    clearInterval(interval);
}
</script>

worker.js:

onmessage = function(e) {
    var buffers = [];
    for (let i = 0; i < 2500; i++) {
        buffers.push(new ArrayBuffer(40965));
    }
    self.postMessage({data: buffers}, buffers);
};

Running this on desktop Safari, memory keeps climbing until the browser kills the page and reloads. Running this on Chrome 71, memory usage jumps around a lot as the poor garbage collector works overtime, but it never goes over ~3GB and it spends most of the time under 1GB.

Using the heap profiler/snapshot, it looks like the buffers come from: GridIndex Placement.retainedQueryData Map.style image.data

@grorg I never responded to ☝️ this directly, but those are all data structures that we would expect to hold data transferred from the workers, and because (1) we have a lot of small buffers per-tile (it's proportional to the number of layers, and our maps have lots of layers), (2) we hold onto an in-memory cache of tiles outside the viewport -- it sounds pretty expected that we'd have thousands of them around at any given time. However, we wouldn't expect the number to be growing without bound.

They are all marked as root objects, in that they are hanging off the window.

I saw this in the Safari heap snapshots as well, and to be honest I don't know what it means. Certainly we're not creating these as root objects.

measuredweighed commented 5 years ago

For ease of access, I put a copy of @ChrisLoer's ArrayBuffer example online here @grorg.

leearmstrong commented 5 years ago

Has any progress been made on this issue at all?

zspitzer commented 5 years ago

that demo from @ChrisLoer also crashes Firefox Nightly, I found what I think is the related bug and have added a link back to this issue https://bugzilla.mozilla.org/show_bug.cgi?id=1407691

femski commented 5 years ago

I created a new bug for Webkit team since old bug was closed mistakenly and no one seems to be looking: https://bugs.webkit.org/show_bug.cgi?id=194268

grorg commented 5 years ago

Thanks for the update. We'll look into it.

femski commented 5 years ago

Great news! I can confirm that this is fixed in latest iOS safari 12.2. Thank you Dean!!

mourner commented 5 years ago

The desktop Safari Tech Preview also works great. Looks like we can close the issue now — it was a browser bug and it appears fixed there. Thanks to everyone who participated in the bug reports!

grorg commented 5 years ago

Phew! I'm glad it is fixed - sorry it took so long to verify. I spent all day yesterday trying to ensure WebKit/Safari really really really wasn't holding on to any ArrayBuffers.

measuredweighed commented 5 years ago

A big thank you to @grorg and @ChrisLoer for helping to get to the bottom of this.

femski commented 5 years ago

@grorg This bug is back in iOS 12.3 It was ok for few weeks (iOS 12.2) but Mapbox again reliablly crashes (fast) in iOS 12.3. Test case shows this very clearly.

I have reopened the webkit ticket also: https://bugs.webkit.org/show_bug.cgi?id=194268

asheemmamoowala commented 4 years ago

This has been resolved with the v1.5.1 release.

Webkit-based browsers should no longer crash from prolonged use, use of hillshade layers, or fill extrusion layers.

joewoodhouse commented 4 years ago

@grorg @ChrisLoer definitely not my area of expertise this, so wondering if you can help

I'm investigating https://github.com/mapbox/mapbox-gl-js/issues/9126#issuecomment-724945310 and what I'm seeing in my recreation is a large number of ArrayBuffers hanging around, which sounds very similar to this issue. I wonder if you could re-check that the Safari fix is still there in e.g. iOS 14.2? I ran your reproduction site and it "seemed" to be OK to me when I ran on desktop, but I can't be sure.

Edit: FYI added some analysis/graphs to https://github.com/mapbox/mapbox-gl-js/issues/9126#issuecomment-724983685