Open · vladmandic opened this issue 4 years ago
We can keep it simple: a single model family with multiple variations, differing only in the checkpoints they are initialized with and in input resolution, which increases from one variation to the next. The models are the EfficientDet family, converted from the TF Hub saved_model to a graph_model using:
tensorflowjs_converter --signature_name=serving_default --input_format=tf_hub https://tfhub.dev/tensorflow/efficientdet/d0/1 efficientdet-d0
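For context, loading and running the converted graph model in the browser looks roughly like this (a minimal sketch; the local model path, the 512px input size, and the int32 cast are assumptions, not details from the original report):

```js
// minimal sketch: load a converted EfficientDet graph model and run one inference
import * as tf from '@tensorflow/tfjs';

async function detect(imageElement) {
  // path assumed to be wherever the tensorflowjs_converter output was placed
  const model = await tf.loadGraphModel('efficientdet-d0/model.json');
  const input = tf.tidy(() => {
    const pixels = tf.browser.fromPixels(imageElement);                    // [H, W, 3]
    const resized = tf.image.resizeBilinear(pixels.toFloat(), [512, 512]); // size is an assumption
    return resized.toInt().expandDims(0);                                  // [1, 512, 512, 3]
  });
  // graph models with control-flow ops need executeAsync instead of execute/predict
  const result = await model.executeAsync(input);
  input.dispose();
  return result;
}
```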
I wanted to get more data by running tf.profile(), but that only works for the smallest D0 model and fails with out-of-memory for all the others; I can't even get the D1 variation to run.
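For reference, the profiling attempt amounts to roughly this (a minimal sketch; `model` and `input` are assumed to be prepared as in the loading sketch above, and the code runs inside an async function):

```js
// minimal sketch: wrap one inference in tf.profile() to capture peak memory
const info = await tf.profile(async () => {
  const result = await model.executeAsync(input);
  tf.dispose(result); // drop the outputs so only the peak during inference is measured
});
console.log('peak bytes:', info.peakBytes, 'new tensors:', info.newTensors);
```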
Instead, here is a snapshot from Task Manager showing GPU memory usage during each run, with WEBGL_DELETE_TEXTURE_THRESHOLD=0 set for the most aggressive memory deallocation (see the flag-setting sketch after the results):
EfficientDet-D0, model size 19,378,390 bytes: success
EfficientDet-D1, model size 25,084,265 bytes: success
EfficientDet-D2, model size 30,825,917 bytes: success
EfficientDet-D3, model size 48,050,408 bytes: success
EfficientDet-D4, model size 83,665,253 bytes: error with webgl out-of-memory, happens at the end of inference run
EfficientDet-D5, model size 122,135,609 bytes: error with webgl out-of-memory, happens almost immediately
All models (up to D7) run correctly and without excessive memory usage in Node.js.
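For completeness, the flag-setting referenced above is roughly this (a minimal sketch; setting the flag before backend initialization is an assumption about ordering, not something stated in the report):

```js
// minimal sketch: request the most aggressive texture deallocation from the webgl backend
tf.env().set('WEBGL_DELETE_TEXTURE_THRESHOLD', 0); // release unused GPU textures immediately
await tf.setBackend('webgl');
await tf.ready();
```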
@vladmandic Thank you for the detailed report. @annxingyuan can you help take a look at what causes the GPU OOM during the run? Thanks.
@pyu10055 @annxingyuan any updates?
Hi @vladmandic - apologies for the delay. I've uploaded a test build of the WebGL backend here: https://storage.googleapis.com/learnjs-data/temp/tf-backend-webgl.es2017.memfix.js
Would you mind testing this out to see whether it fixes the getPackedSampler2D error you pasted above when WEBGL_DELETE_TEXTURE_THRESHOLD is set to 0?
@annxingyuan I cannot get your tf-backend-webgl.es2017.memfix.js to work with the generic tfjs-core from tfjs 2.7.0; I'm getting the error:
The kernel 'undefined' for backend 'webgl' is already registered
and later it fails on the first tensor operation as not implemented.
Can you point me to your branch and I'll just do a full rebuild myself? It's easier than trying to mix & match.
Sure - here you go: https://github.com/tensorflow/tfjs/pull/4240
The build works, but unfortunately it doesn't improve webgl memory consumption; it's actually slightly higher in the first phase and the same in the second phase (where the maximum peak occurs).
In both cases, deallocation triggered by WEBGL_DELETE_TEXTURE_THRESHOLD=0 works fine and webgl memory consumption returns to baseline; the excessive usage happens only during the inference itself.
If anything, the #4240 branch improves final deallocation a bit, but it's hard to tell.
Tested with efficientdet-d2 and an input picture of size 800px.
X-axis ticks are in seconds; the y-axis spans 0 to 4GB (the high baseline on my system is due to dual 4K monitors, so the idle system already consumes ~1.2GB of GPU memory).
Using tfjs 2.7.0: [GPU memory usage graph]
Using tfjs from the #4240 branch: [GPU memory usage graph]
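For reference, the measurement loop behind graphs like these is roughly the following (a minimal sketch; the run count, timing, and preprocessing details are assumptions, while the efficientdet-d2 model and 800px input are as described above):

```js
// minimal sketch: repeated inference runs while watching tfjs memory counters
async function measure(model, image) {
  for (let i = 0; i < 5; i++) {
    // fromPixels yields int32, which converted TF Hub detection models generally accept
    const input = tf.tidy(() => tf.browser.fromPixels(image).expandDims(0));
    const t0 = performance.now();
    const result = await model.executeAsync(input);
    console.log(`run ${i}: ${(performance.now() - t0).toFixed(0)} ms`, tf.memory());
    tf.dispose([input, result]);
  }
}
```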
A few more tests. It seems that this proposed fix does solve one problem: if a model executes within 4GB, deallocation will now work and subsequent executions will continue to work.
But it doesn't solve the core issue: why is there such enormous GPU memory usage with webgl to start with, compared to any other backend? I cannot get anything even remotely complex to execute within a 4GB GPU, so deallocation at the end doesn't help.
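One way to watch this from inside tfjs rather than Task Manager (a minimal sketch; the polling interval is arbitrary, and numBytesInGPU is only reported while the webgl backend is active):

```js
// minimal sketch: log tfjs memory counters while an inference is in flight
async function runWithMemoryTrace(model, input) {
  const timer = setInterval(() => {
    const m = tf.memory();
    console.log(`tensors=${m.numTensors} bytes=${m.numBytes} gpuBytes=${m.numBytesInGPU}`);
  }, 250);
  try {
    return await model.executeAsync(input);
  } finally {
    clearInterval(timer);
  }
}
```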
Also, not sure if it's feasible given it's lossy compression, and not sure how tfjs works with such textures, but perhaps it's worth taking a look at
@pyu10055 @annxingyuan @rthadur Sorry to bug you, but is there any update on this issue? No progress for over two months, and it's pretty much a blocker for one of my projects since complex models simply cannot be used at all.
Just to confirm, the issue is pretty much the same in tfjs 3.1.0 as it was with tfjs 2.7.0 when I reported it; only the message changed from
Error: Failed to compile fragment shader. (webgl_util.js:82)
to
Error: Failed to link vertex and fragment shaders. (webgl_util.js:117)
@rthadur @pyu10055 This issue (and a similar one under #4129) has been open since October and is sitting idle without assignment?
@rthadur @jinjingforever @annxingyuan One more ping regarding an issue that has been open since October 2020 without any progress.
Sorry... I currently don't have the bandwidth to tackle this issue. Not sure if anybody else has time to take this? @rthadur @mattsoulanille @pyu10055
Hi @vladmandic,
Apologies for the delayed response. We're revisiting our older issues and checking whether they have been resolved, so may I ask whether you are still looking for a solution or whether your issue has been resolved?
If the issue still persists with the latest version of TFJS, please let us know with an error log and a code snippet so we can replicate the issue on our end.
Could you please confirm whether this issue is resolved for you? Please feel free to close the issue if it is resolved. Thank you!
Any improvements in this area would be very welcome, but I guess it's one of those "it is what it is" situations...
For more complex object detection models such as faster_rcnn_inception_resnet_v2_atrous_oidv4 from the TF Model Zoo (model stats: numBytes 255,915,844, numTensors 17,653), I cannot get a valid run no matter what on my notebook with a 4GB GPU using the webgl backend.

Note that everything below works out-of-the-box in Node.js using tfjs-node or tfjs-node-gpu; the issue is specific to tfjs with the webgl backend. With tfjs-node, memory usage stays at 550MB during execution, nowhere close to the 4+GB used with WebGL.
Why is the webgl backend using so much more memory?
anyhow...
If I don't set WEBGL_DELETE_TEXTURE_THRESHOLD, I run into webgl out-of-memory (confirmed by looking at GPU memory usage in the Windows 10 Task Manager) and loss of context, resulting in the standard error:

That's 4GB exhausted in a single inference, while non-webgl backends have no issues with less than 1GB.
But if I do set WEBGL_DELETE_TEXTURE_THRESHOLD to 0, or to any number below the maximum available GPU memory, I run into what looks like an access violation due to access of a deallocated shader/texture:
It looks like WEBGL_DELETE_TEXTURE_THRESHOLD is all-or-nothing when it comes to releasing objects; there is no concept of tracking what is still referenced and what is not.
And if I try to reduce memory usage by forcing f16 textures with WEBGL_FORCE_F16_TEXTURES, I get a shape mismatch due to clipped values.
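For reference, forcing f16 textures is just a flag flip (a minimal sketch, run inside an async context before the model loads); the trade-off is that float16 has a much smaller range and precision than float32 (maximum finite value around 65504), which is consistent with the clipped values mentioned above:

```js
// minimal sketch: halve texture memory by forcing 16-bit float textures
// float16 cannot represent values beyond ~65504, so large intermediates lose information
tf.env().set('WEBGL_FORCE_F16_TEXTURES', true);
await tf.setBackend('webgl');
```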
A 4GB GPU should be more than enough to handle a 255MB model that executes within 550MB anywhere else but WebGL (no matter how different WebGL is, more than 8x the memory usage is not acceptable).
And to confirm the results in tfjs-node, tf.profile() returns a peak of 550MB.

Environment: TFJS 2.7.0 on Windows 10 build 19042 with Chrome 86
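For reference, the tfjs-node confirmation amounts to roughly this (a minimal sketch; the local model path and the dummy input shape/dtype are assumptions):

```js
// minimal sketch: measure peak memory for one inference in tfjs-node
const tf = require('@tensorflow/tfjs-node');

async function main() {
  const model = await tf.loadGraphModel(
    'file://models/faster_rcnn_inception_resnet_v2_atrous_oidv4/model.json');
  const input = tf.zeros([1, 800, 800, 3], 'int32'); // dummy input; real shape/dtype depend on the model signature
  const info = await tf.profile(async () => {
    const result = await model.executeAsync(input);
    tf.dispose(result);
  });
  console.log('peak bytes:', info.peakBytes); // reported at ~550MB in this environment
  input.dispose();
}

main();
```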