Closed junrushao closed 11 months ago
@junrushao how can we find tokens/sec? I'd say it's 'quite fast': the fastest LLM I've run on this 2020 MacBook Pro M1 8G, and 10x faster than your WebGPU demo, with less overall memory usage.
Is the text all it reports?
We just added a new update (#14), which should have shipped to conda by now. You can type /stats after a conversation to get the measured speed.
Killer, I'm at encode: 31.9 tok/s, decode: 11.4 tok/s on a 2020 MacBook Pro M1 8G with the default Vicuna 7B. For reference, my decode on the WebGPU demo is about 0.5 tok/s.
OOM on GTX 1650. The model loads fine, but it runs out of memory when generating the first message.
@nRuaif 4GB memory wouldn't be enough. A 6GB one should work
On iPhone 13, it crashes after a few seconds of [System] Initialize... The phone has 4GB of RAM, which I presume is the cause.
@y-lee That's correct. The model we are using so far requires 6GB RAM to run smoothly
On the iPad Pro 11” with M1 I am getting decode of 10.6 tok/s (I have seen slightly higher and lower). It is running iPadOS 16.1.
encode: 39.5 tok/s, decode: 26.0 tok/s
on Windows 11 with RTX-3080
encode: 32.5 tok/s, decode: 17.1 tok/s
on MacBook Pro with M1 Pro (16 GPU cores) and macOS Ventura 13.3.1
Hardware/GPU | OS | Tokens/sec | Source | Model | Notes |
---|---|---|---|---|---|
RTX 3060 (12GB) | Debian bookworm | 21 | | vicuna-v1-7b | 3644MiB GPU memory used |

/stats after /reset: encode: 72.2 tok/s, decode: 23.2 tok/s
/stats for 2nd and later messages: encode: 39.3 tok/s, decode: 21.3 tok/s
>>nvidia-smi --query-gpu=memory.used --format=csv
memory.used [MiB]
3644 MiB
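For anyone who wants to record memory usage alongside their /stats numbers, here is a minimal Python sketch that parses the CSV text printed by the nvidia-smi query above. The function name `parse_memory_used_mib` is made up for illustration, and the sample string is copied from this report; the sketch assumes the output keeps the same "header line, then one `<number> MiB` line per GPU" shape.

```python
import csv
import io
from typing import List


def parse_memory_used_mib(csv_text: str) -> List[int]:
    """Parse `nvidia-smi --query-gpu=memory.used --format=csv` output.

    Returns the used memory in MiB, one entry per GPU listed.
    (Hypothetical helper; assumes the header/row layout shown in this thread.)
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        raise ValueError("empty nvidia-smi output")
    header, data = rows[0], rows[1:]
    if not header[0].strip().startswith("memory.used"):
        raise ValueError(f"unexpected header: {header!r}")
    # Each data cell looks like "3644 MiB"; keep only the leading number.
    return [int(row[0].split()[0]) for row in data]


# Sample output copied from the report above.
sample = "memory.used [MiB]\n3644 MiB\n"
print(parse_memory_used_mib(sample))
```

In practice the text would come from running the command yourself (for example via `subprocess.run`), which is left out here so the sketch runs without a GPU.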
On my M1 Max Mac Studio with 64GB of RAM:
encode: 53.7 tok/s, decode: 18.6 tok/s
On my MBP 2020 13-inch[intel CPU, 32G Ram, RX6800 16G VRAM], Ventura 13.3.1
encode: 46.4 tok/s decode: 22.5 tok/s
Not sure if this is useful, or if this is the right thread to post it in, but I encountered this error on an old laptop with a very old discrete Nvidia GPU (GT 920M) and the 470.182.03 driver, which should include Vulkan:
MESA-INTEL: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
Use lib /mnt/run/code/llma/mlc-ai/dist/lib/vicuna-v1-7b_vulkan_float16.so
Initializing the chat module...
[20:30:33] /home/runner/work/utils/utils/tvm/src/runtime/vulkan/vulkan_buffer.cc:61:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
Check failed: (__e == VK_SUCCESS) is false: Vulkan Error, code=-2: VK_ERROR_OUT_OF_DEVICE_MEMORY
Stack trace:
[bt] (0) /mnt/run/code/mambaforge/bin/../lib/libtvm_runtime.so(tvm::runtime::Backtrace[abi:cxx11]()+0x27) [0x7f975d98ba37]
[bt] (1) /mnt/run/code/mambaforge/bin/../lib/libtvm_runtime.so(+0x3f375) [0x7f975d929375]
[bt] (2) /mnt/run/code/mambaforge/bin/../lib/libtvm_runtime.so(tvm::runtime::vulkan::VulkanBuffer::VulkanBuffer(tvm::runtime::vulkan::VulkanDevice const&, unsigned long, unsigned int, unsigned int)+0x220) [0x7f975da646b0]
[bt] (3) /mnt/run/code/mambaforge/bin/../lib/libtvm_runtime.so(tvm::runtime::vulkan::VulkanDeviceAPI::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)+0x4a) [0x7f975da7168a]
[bt] (4) /mnt/run/code/mambaforge/bin/../lib/libtvm_runtime.so(tvm::runtime::NDArray::Empty(tvm::runtime::ShapeTuple, DLDataType, DLDevice, tvm::runtime::Optional<tvm::runtime::String>)+0x1a7) [0x7f975d9a3037]
[bt] (5) /mnt/run/code/mambaforge/bin/../lib/libtvm_runtime.so(+0x121862) [0x7f975da0b862]
[bt] (6) /mnt/run/code/mambaforge/bin/../lib/libtvm_runtime.so(tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)>::AssignTypedLambda<void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)>(void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0x204) [0x7f975da0f7e4]
[bt] (7) /mnt/run/code/mambaforge/bin/../lib/libmlc_llm.so(+0x1bdea6) [0x7f975dce3ea6]
[bt] (8) /mnt/run/code/mambaforge/bin/../lib/libmlc_llm.so(mlc::llm::CreateChatModule(tvm::runtime::Module, tvm::runtime::String const&, tvm::runtime::String const&, DLDevice)+0x411) [0x7f975dce4ba1]
@zifken looks like VK_ERROR_OUT_OF_DEVICE_MEMORY indicates that it doesn't have enough memory. I looked it up and it seems that the GT 920M only has 2GB of memory, but the default model is 2.9G in size :/
I see, so only GPUs with more than 4GB of VRAM are supported because of the size of the model (it makes sense). I will try on another GPU model shortly. Thank you for the feedback.
@zifken there are some reports saying 4GB might work, but 6GB is recommended atm
It's confusing. Compared with the MBP 2020 13-inch report above (Intel CPU, 32G RAM, RX6800 16G VRAM; encode: 46.4 tok/s, decode: 22.5 tok/s), on my Win10 machine [AMD Ryzen 5 5600 6-Core Processor 3.50 GHz, 96G RAM, RTX 2080 Ti modified to 22G VRAM] the stats are below:
encode: 24.0 tok/s, decode: 24.5 tok/s
iPad Pro 11 A12Z
encode: 5.1 tok/s, decode: 4.1 tok/s
Linux RTX 3090
encode: 179.0 tok/s, decode: 25.7 tok/s
(prompted with: Implement binary search in Python please! Also try to use some type annotations!)
2021 MacBook Pro M1 Max [32 cores], 32 GB RAM, 1 TB SSD
macOS Monterey 12.5.1
encode: 69.4 tok/s, decode: 21.0 tok/s
Memory usage 3.64 GB
(Edit) Just saw that this score is higher than M1 Max 64 GB so I repeated the prompts and I'm still getting an average of > 20 tokens/second.
machine | OS | CPU | GPU | result | remarks |
---|---|---|---|---|---|
Yoga Slim 7 pro 14ARH7 | Fedora 37 | 6800HS (13.3GB) | 680M (2GB) | encode: 11.0 tok/s, decode: 3.8 tok/s | battery |
Yoga Slim 7 pro 14ARH7 | Fedora 38 | 6800HS (13.3GB) | 680M (2GB) | encode: 16.6 tok/s, decode: 6.2 tok/s | AC |
Yoga Slim 7 pro 14ARH7 | Windows 11 22H2 | 6800HS (13.3GB) | 680M (2GB) | encode: 6.7 tok/s, decode: 7.9 tok/s | battery |
Yoga Slim 7 pro 14ARH7 | Windows 11 22H2 | 6800HS (13.3GB) | 680M (2GB) | encode: 16.6 tok/s, decode: 10.2 tok/s | AC |
desktop | Windows 10 22H2 | 5900x (32GB) | 1660ti (6GB) | encode: 49.5 tok/s, decode: 23.9 tok/s | - |
On 14" MacBook Pro (M2 Pro with 10-Core CPU and 16-Core GPU with 16GB Unified Memory) with macOS Ventura 13.3.1
encode: 59.2 tok/s, decode: 22.5 tok/s
I am seeing encoding performance between 45 and 60 tok/s and decoding between 20 and 29 tok/s.
GPU | OS | /stats |
---|---|---|
Radeon RX 470 (4G) | AlmaLinux 9.1 | encode: 14.3 tok/s, decode: 9.4 tok/s |
Encoding performance fluctuates between 5-45, decoding between 6-9.
OS: macOS 13.3.1 (22E261)
Processor: 2.3 GHz Quad-Core Intel Core i7
Graphics: Intel Iris Plus Graphics 1536 MB
Memory: 32 GB 3733 MHz LPDDR4X
/stats: encode: 5.4 tok/s, decode: 2.6 tok/s
GPU | OS | /stats |
---|---|---|
A100 (40G) | Debian GNU/Linux 10 | encode: 189.1 tok/s, decode: 18.9 tok/s |
My prompt is: "create a poem about los angeles". I use CUDA as I think Vulkan is not available for the A100. I thought the A100 should run faster than the RTX 30x0 series. Is it possibly due to the CUDA driver? Thanks.
The latest update brought the decode speed on my iPhone 14 Plus down to 0.5~1.0 tok/s. Encode speed is about 22 tok/s.
Yesterday, before the update, it was generating about 7.5 tokens per second...
2021 16-inch Apple M1 Pro (32GB) | OS: Ventura 13.3.1
encode: 45.8 tok/s, decode: 19.3 tok/s
Tested on:
8 GB RAM, 128 GB Storage iPadOS Version 16.5
Encode | Decode |
---|---|
34.4 tok/s | 14.1 tok/s |
APU 5800H, OS: Windows 11, encode: 5.5 tok/s, decode: 8.5 tok/s
I think mine is running fully CPU based even though my GPU should be capable. Top was showing 900% and tokens were crawling out.
Log shows
Use lib /home/david/software/mlc-llm/dist/lib/vicuna-v1-7b_vulkan_float16.so
vulkaninfo shows
VkPhysicalDeviceProperties:
---------------------------
apiVersion = 4206816 (1.3.224)
driverVersion = 1 (0x0001)
vendorID = 0x10005
deviceID = 0x0000
deviceType = PHYSICAL_DEVICE_TYPE_CPU
deviceName = llvmpipe (LLVM 15.0.6, 256 bits)
pipelineCacheUUID = 76616c2d-2573-0000-0000-000000000000
GPU GeForce RTX 3070 w/ 8G CPU AMD Ryzen 5 5600
encode: 0.1 tok/s, decode: 0.1 tok/s
@davidbartonau can you try to upgrade your GPU driver? Looks like vulkaninfo failed to list your gpu
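A quick way to check for this situation before benchmarking is to scan the vulkaninfo text for the `deviceType` and `deviceName` fields shown in the report above. The sketch below is only an illustration: the helper names are hypothetical, and it assumes each device block contains one `deviceType` line and one `deviceName` line in the `key = value` format quoted in this thread.

```python
import re
from typing import List, Tuple


def list_vulkan_devices(vulkaninfo_text: str) -> List[Tuple[str, str]]:
    """Return (deviceName, deviceType) pairs found in vulkaninfo text output.

    Hypothetical helper; pairs names and types in the order they appear.
    """
    types = re.findall(r"deviceType\s*=\s*(\S+)", vulkaninfo_text)
    names = re.findall(r"deviceName\s*=\s*(.+)", vulkaninfo_text)
    return [(name.strip(), dev_type) for name, dev_type in zip(names, types)]


def only_cpu_devices(devices: List[Tuple[str, str]]) -> bool:
    """True if Vulkan exposes devices but every one is a CPU implementation."""
    return bool(devices) and all(
        dev_type == "PHYSICAL_DEVICE_TYPE_CPU" for _, dev_type in devices
    )


# Excerpt copied from the report above.
sample = """
deviceType = PHYSICAL_DEVICE_TYPE_CPU
deviceName = llvmpipe (LLVM 15.0.6, 256 bits)
"""
devices = list_vulkan_devices(sample)
if only_cpu_devices(devices):
    print("Vulkan only sees a CPU implementation:", devices[0][0])
```

If this prints the llvmpipe line, the tok/s numbers reflect software rendering on the CPU rather than the GPU.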
On Fedora, using a Zephyrus G14 2020 with AMD Ryzen™ 5 4600HS, 16GB RAM, and GeForce GTX 1650 Ti 4GB VRAM:
encode: 17.3 tok/s, decode: 15.6 tok/s
I'm extremely surprised by the outcome and the performance!
I installed the MLCChat app on an iPad Pro 11 with iPadOS 16.4.1, but it crashes after a few seconds of [System] Initialize. @colakang, how did you run it successfully on your iPad Pro 11 (A12Z)?
On Intel Arc A770, using linux drivers installed from https://www.intel.com/content/www/us/en/download/747008/intel-arc-graphics-driver-ubuntu.html, I got
USER: Who is Shohei Ohtani?
encode: 118.6 tok/s, decode: 3.2 tok/s
USER: Write a poem for Pittsburgh.
encode: 77.0 tok/s, decode: 3.3 tok/s
USER: Explain why Japanese economy is declining.
encode: 38.0 tok/s, decode: 3.1 tok/s
It is nice to see that it works on Intel dGPU out of the box, but the current decoding perf is a joke haha
@davidbartonau Perhaps you haven't installed Vulkan SDK? And on NV you need a proprietary driver for vulkan. The log indicates you are using a CPU implementation of a VK driver from mesa.
13-inch, 2020, Two Thunderbolt 3 ports
OS: 13.3 (22E252)
Processor: 1.4 GHz Quad-Core Intel Core i5
Graphics: Intel Iris Plus Graphics 645 1536 MB
Memory: 16 GB 2133 MHz LPDDR3
/stats: encode: 1.9 tok/s, decode: 2.9 tok/s
When will Android be supported? The 8 Gen 2 supports INT4 precision, so it might reach a decent speed.
Re: the iPad Pro 11 crash question above: I'm on OS 16.3.1. I just did a normal install and launched it.
Tested on:
Device: Lenovo ThinkPad L13 Yoga
CPU: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz 2.11 GHz
RAM: 16.0 GB
Graphics: Intel UHD Graphics (Comet Lake-U GT2) [V0] [Lenovo] 1GB
OS: Windows 10 Education 22H2
Model: \dist\lib\vicuna-v1-7b_vulkan_float16.dll
encode: 1.4 tok/s, decode: 2.2 tok/s
Quite slow, but it works!
In the following example output there is a problem with the German letter "ü". How could that be resolved?
USER: Write a poem of the City Berlin ASSISTANT: Berlin, the city of contrasts Where the past and present collide A city of culture and history But also a city of modernity
The grand architecture of the past St. Hedwig's Cathedral and the Reichstag Mix with the bright lights and neon signs Of the city's modern face
But the city's history is not all It's also a city of innovation and technology The new headquarters of Google in the heart of the city A hub for entrepreneurs and startups
But it's not just the city of business The city is also known for its cultural attractions The Berlin Wall, Museum Island, and the Deutsche Museum f├╝r Technik
And the city's people are just as diverse From artists and musicians to politicians and activists All coming together in the city's many neighborhoods
So come and explore Berlin's past and present And discover the city's unique blend of history and modernity It's a city that will leave you wanting more But it's also a city that will stay in your heart forever.
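The garbled "f├╝r" in the output above is what the UTF-8 bytes of "für" look like when they are displayed using the legacy CP437 code page. That the reporter's Windows console was using such an OEM code page is an assumption, not something confirmed in this thread, but the short Python check below reproduces the exact characters:

```python
# Reproduce the garbling: UTF-8 bytes shown through the CP437 code page.
original = "für"
utf8_bytes = original.encode("utf-8")      # 'ü' becomes the two bytes C3 BC
garbled = utf8_bytes.decode("cp437")       # what a CP437 console would display

# Reverse the mistake to confirm the round trip.
repaired = garbled.encode("cp437").decode("utf-8")

print(garbled == "f├╝r")   # the same characters as in the poem above
print(repaired == original)
```

If that assumption holds, the model output itself is fine and only the display is wrong. Switching the console to UTF-8 (for example with `chcp 65001` before launching the chat program) may resolve it, though this has not been verified for this setup.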
what about https://www.aliyun.com/product/ecs/eais ?
@joostshao We are working on a generic deployment solution for common devices. We have no extra bandwidth to test for any specific target (e.g. EAIS). But happy to see contributions from the community and answer specific technical questions.
Specs are RX6600XT (8.0GB), Ryzen 5 2600, and 47.9GB of RAM operating at 1866MHz. Over three calls of /stats and 5 requests made of Vicuna, I got the following encode and decode speeds:
1st: encode: 58.8 tok/s, decode: 32.1 tok/s
2nd: encode: 34.8 tok/s, decode: 28.3 tok/s
3rd: encode: 81.7 tok/s, decode: 31.5 tok/s
Legitimately very impressed! I would definitely love a more convenient way to launch it, though. The best I've managed is a .bat file that sits in the directory and launches Anaconda Prompt directly into that directory, with a reminder of the command I need to type to launch Vicuna.
Edit: apologies, I forgot to mention the OS, which is Windows 11.
Performance report
Operating system: openSUSE Tumbleweed
CPU: AMD Ryzen 3900x
RAM: 16GB
GPU: AMD Radeon 6750xt
It seems the prompt has a large impact on the results, so I propose you suggest some "standard prompts" to try to standardise.
USER: Why is speaking about donating to charity considered immoral? I find it quite strange that social norms are such that discussing about charities one donates to are considered in bad taste. To me, charity seems purely a consequentialist concept. In other words, all that should matter is the total amount of charity dollars disbursed. To the person receiving the charity, the level of "bragging" of the charity donor is irrelevant. In other words, isn't the requirement that one has to be extremely modest and even silent about one's own contributions a very high bar to set for people who are already giving to charity? In other words, if we want to incentivize more charitable giving, we should encourage this process as much as possible, and imposing such a cost seems counter-productive. What are the arguments for the fact that one has to be of the purest heart with the most noble intentions to donate money to charities? ASSISTANT: There are a few reasons why people might find it awkward or inappropriate to discuss charitable donations:
However, it is important to note that charitable donations are not just about being a "good person" but it also helps in creating a more equitable society by providing support to those in need. It is also a way of making a positive impact on the community and addressing societal issues.
It is also important to remember that discussing charitable donations can be a good way to encourage more giving. It is also a way to educate and raise awareness about the importance of giving back to the community.
In summary, discussing charitable donations might be considered inappropriate by some people, but it is an important part of the conversation about creating a more equitable society by providing support to those in need. USER: /stats encode: 130.6 tok/s, decode: 9.0 tok/s
USER: Write a poem of the City Berlin ASSISTANT: Berlin, city of contrasts Where the past and present collide A place of both pain and delight Built on a history that's prime
A city of culture and history Where the old meets the new The relics of a dark and painful past Mingle with the bustling crowds in the town
But don't be fooled by the facade For the people of this city are strong and bold Fighting for their rights and their name To be heard and recognized far and game
So come and visit this great city And take in all the sights and the city has to offer From the historical sites to the nightlife It's a city that's always worth the tout. USER: /stats encode: 61.5 tok/s, decode: 9.3 tok/s
USER: These are the lyrics to a song I am writing an essay about. Could you help me by writing an essay about the meaning of the song, and what the mental state of the person singing it might be? "Desert you, Ooh-ooh-ooh-ooh, Hurt you, We're no strangers to love, You know the rules and so do I, A full commitment's what I'm thinking of, You wouldn't get this from any other guy, I just wanna tell you how I'm feeling, Gotta make you understand, Never gonna give you up, Never gonna let you down, Never gonna run around and desert you, Never gonna make you cry, Never gonna say goodbye, Never gonna tell a lie and hurt you, We've known each other for so long, Your heart's been aching, but you're too shy to say it, Inside, we both know what's been going on, We know the game, and we're gonna play it, And if you ask me how I'm feeling, Don't tell me you're too blind to see, Never gonna give you up, Never gonna let you down, Never gonna run around and desert you, Never gonna make you cry, Never gonna say goodbye, Never gonna tell a lie and hurt you, Never gonna give you up, Never gonna let you down, Never gonna run around and desert you, Never gonna make you cry, Never gonna say goodbye, Never gonna tell a lie and hurt you, Ooh (Give you up), Ooh-ooh (Give you up), Ooh-ooh, Never gonna give, never gonna give (Give you up), Ooh-ooh, Never gonna give, never gonna give (Give you up), We've known each other for so long, Your heart's been aching, but you're too shy to say it, Inside, we both know what's been going on, We know the game, and we're gonna play it, I just wanna tell you how I'm feeling, Gotta make you understand, Never gonna give you up, Never gonna let you down, Never gonna run around and desert you, Never gonna make you cry, Never gonna say goodbye, Never gonna tell a lie and hurt you, Never gonna give you up, Never gonna let you down, Never gonna run around and desert you, Never gonna make you cry, Never gonna say goodbye, Never gonna tell a lie and hurt you, Never 
gonna give you up, Never gonna let you down" ASSISTANT: The song "Desert You" is a melancholic ballad about a love that has endured over a long period of time. The singer expresses a desire to be with this person forever and promises not to give them up or let them down. They also express a fear that the other person may be thinking of leaving or "deserting" them. The lyrics suggest that the singer and the listener have known each other for a long time and have shared many experiences together. The listener is described as someone who is "aching" inside and is too shy to say it. The singer wants to tell them how they are feeling and make them understand the depth of USER: /stats encode: 154.3 tok/s, decode: 8.9 tok/s
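Because the encode numbers above swing widely with the prompt while decode stays close to 9 tok/s, it can help to report an average over several /stats lines rather than a single one. Here is a minimal Python sketch of that idea. The regex and the `summarize` helper are made up for illustration and only assume the `encode: X tok/s, decode: Y tok/s` format seen throughout this thread.

```python
import re
from statistics import mean

# Matches lines such as "encode: 130.6 tok/s, decode: 9.0 tok/s"
# (the comma is optional, since some reports in this thread omit it).
STATS_RE = re.compile(
    r"encode:\s*([\d.]+)\s*tok/s,?\s*decode:\s*([\d.]+)\s*tok/s",
    re.IGNORECASE,
)


def parse_stats(text):
    """Return a list of (encode, decode) tok/s pairs found in the text."""
    return [(float(enc), float(dec)) for enc, dec in STATS_RE.findall(text)]


def summarize(text):
    """Average the encode/decode speeds over all /stats lines in the text."""
    runs = parse_stats(text)
    if not runs:
        raise ValueError("no /stats lines found")
    encodes, decodes = zip(*runs)
    return {
        "runs": len(runs),
        "encode_mean": mean(encodes),
        "decode_mean": mean(decodes),
        "decode_min": min(decodes),
        "decode_max": max(decodes),
    }


# The three /stats results from the report above.
report = """
USER: /stats encode: 130.6 tok/s, decode: 9.0 tok/s
USER: /stats encode: 61.5 tok/s, decode: 9.3 tok/s
USER: /stats encode: 154.3 tok/s, decode: 8.9 tok/s
"""
summary = summarize(report)
print(f"{summary['runs']} runs, "
      f"encode mean {summary['encode_mean']:.1f} tok/s, "
      f"decode mean {summary['decode_mean']:.1f} tok/s")
```

Averaging does not remove the prompt dependence, so agreeing on a shared set of prompts, as proposed above, would still make reports easier to compare.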
I thought I posted this a few days back. I rebooted and the driver was working:
GPU GeForce RTX 3070 w/ 8G CPU AMD Ryzen 5 5600 encode: 94.6 tok/s, decode: 23.3 tok/s
Is it odd that my encode stats are slower than @kn100's, yet my decode is a fair bit faster?
Hello everyone!
Created an account just to leave my performance report here.
Processor: AMD Ryzen 5 5600 6-Core Processor 3.50 GHz
Installed RAM: 16.0 GB
System type: 64-bit operating system, x64-based processor
GPU: Radeon RX 6700 XT 12GB VRAM
GPU Driver Version: AMD Adrenalin 23.4.3
Edition: Windows 11 Home
Version: 22H2
Installed on: 10/13/2022
OS build: 22621.1555
Experience: Windows Feature Experience Pack 1000.22640.1000.0
Coming from trying to use the Oobabooga UI and GPT4All, this has been amazingly fast! The former lags since it doesn't have a Windows-native AMD GPU mode (forcing me to use CPU mode, which takes 1-3 minutes on average to respond), and the latter, while having a faster CPU implementation, leads to some occasional system hitching while I'm using Chrome.
Since this one miraculously lets me use my AMD GPU right on Windows I can safely say that I've found my favorite way to interact with LLMs as of this writing. Kudos to all involved who made/make this project so awesome!
Now, I hope I'm not breaking some rule by posting the following but if it's not too much of a bother I'd appreciate some help with the following:
@cruzerngz Thanks for reporting the data points! I'm a bit surprised that Vicuna-7b actually runs on the 680M in the Yoga Slim 7 pro, given that it has only 2GB of GPU memory. Perhaps the results come from your CPU instead?
@flyman3046 Thanks for asking! We didn't optimize for server-class GPUs like A100 in this release, which means there's huge room to improve on them. We will get there pretty soon :-)
@masahi Glad to see it works at all on Intel dGPUs! The encoding performance looks really awesome, but I suspect there are some issues with decoding performance. Not sure what they are yet.
Thanks for reporting the numbers! This is super exciting for me as well to see data points that MLC LLM works with a legit AMD GPU! In terms of user experience, I believe any app that wraps the shared/static library that MLC-LLM produces could make it look much better than a simple bat file :-) We plan to release a tutorial on how to make good use of those libraries soon
Hey @iamcodecurious, thanks for sharing the datapoints! Both of your questions are super valuable, and please allow me to walk you through :-)
- How exactly do I get this running on my browser? I'm relatively new to much of this so I'm stuck at the first step for WebLLM. It's unclear to me what I'm supposed to download and where/how I run that pip3 install thing.
MLC-LLM uses TVM Unity as the underlying compiler. As you know, the advantage of a compiler is that it can be re-targeted to various backends, including but not limited to x86/ARM CPUs, AMD/NV/Apple GPUs, or phones. To get it running in the browser, we use TVM Unity to target a backend called WebGPU, which is different from the one (Vulkan) you are using right now on the command line.
You do not have to install anything for WebGPU (except the latest Chrome) if you don't compile/quantize the model yourself. To get it working, you may use our WebLLM website, which is a webpage running locally on your GPUs without uploading your data or using any server resources - in fact, this is simply a Jekyll static page that embeds WebGPU code.
- Is there a way to use other LLMs besides the Vicuna V1 7B? If so please show me.
Yes, the code is there, but we are not officially announcing it yet until we validate very carefully. Will keep the community posted for latest updates!
UPDATE (08/09/2023):
We have done a major performance overhaul in the past few months, and now I'm happy to share the latest results:
============================================================
Hi everyone,
We are looking to gather data points on running MLC-LLM on different hardware and platforms. Our goal is to create a comprehensive reference for new users. Please share your own experiences in this thread! Thank you for your help!
NOTE: for benchmarking, we highly recommend a device with at least 6GB of memory, because the model itself already takes 2.9G. For this reason, the iOS app is known to crash on a 4GB iPhone.
AMD GPUs
Macbook
Intel GPUs
NVIDIA GPUs
iOS
Android