davidjaw opened 11 months ago
We need to understand where the regression is coming from, but it sounds a bit like a torchvision problem, doesn't it?
Also, I wonder if this is a CUDA 11.8 vs CUDA 12.1 regression (2.0.1 shipped with 11.8 by default, but 2.1 ships with 12.1).
Hello,
I've created a minimal toy example to demonstrate the issue in detail. You can find it here: https://gist.github.com/davidjaw/40bcbcf44cb3db01fd9178e193edb0de
This example relies on the ultralytics library. For context, the code runs as expected with PyTorch 2.0.1 and torchvision 0.15.2+cu118, but OOMs with PyTorch 2.1.1. I believe this setup aligns with the requirements mentioned in the original issue.
Please take a look at the gist, and let me know if you need any more information or if there's anything else I can do to assist in resolving this issue.
Thank you!
I just want to chime in and mention that I have the same problem. A very large memory allocation is attempted on both the GPU and the CPU. I observe the problem in the following environment:
Collecting environment information...
PyTorch version: 2.1.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
But everything works with:
Collecting environment information...
PyTorch version: 2.1.0.dev20230714+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
Below is a little snippet which leads to the OOM error with 2.1.0+cu118:
import torchvision
import torch

inp = torch.rand((1, 256, 48, 64))                 # feature map [N, C, H, W]
bbox = torch.tensor([[0, 0, 0, 128, 96]]).float()  # one RoI: (batch_idx, x1, y1, x2, y2)
output_size = (48, 64)
scale = 12 / 384
aligned = True
torch.use_deterministic_algorithms(True)
out = torchvision.ops.roi_align(inp.cuda(), bbox.cuda(), output_size, scale, aligned=aligned)
Can someone else replicate this?
Oh you know what, it's probably because of use_deterministic_algorithms. We added a deterministic implementation, but it is very memory hungry.
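Rough back-of-the-envelope for the scale of the problem (the numbers here are hypothetical, Mask R-CNN-like; the intermediate shape [K, C, PH, PW, IY, IX] comes from the traceback comments further down in this thread):

# Hypothetical sizes: K proposals, C channels, PH x PW output bins, IY x IX sampling points per bin.
K, C = 1000, 256   # RPN proposals, FPN channels (assumed)
PH, PW = 14, 14    # mask-head output size (assumed)
IY, IX = 2, 2      # sampling grid per bin; with sampling_ratio=-1 this grows with the RoI size
bytes_fp32 = K * C * PH * PW * IY * IX * 4
print(f"{bytes_fp32 / 2**30:.2f} GiB")  # ~0.75 GiB per gathered corner tensor

_bilinear_interpolate gathers four such corner tensors (v1..v4), autograd presumably keeps them alive for backward, and large RoIs inflate IY/IX further, which would explain how a single 400x400 input can reach tens of GiB.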
Hey, I just came across this issue. Is there any update?
I understand the appeal of a deterministic implementation, but the caveat "very memory hungry" is an understatement :D
When I call the problematic _bilinear_interpolate -> masked_index part manually, it allocates ~30 GB of VRAM for a single input of size 400x400. Essentially, this breaks any Mask R-CNN model when using torch.use_deterministic_algorithms(True).
Has this actually been tested or run in a benchmark? If so, how? I fail to see how this is the intended behavior unless I'm missing something fundamental 😅 thx for any help.
The implementation doesn't OOM if we torch.compile it. So I think I will fix it by making torch.compile on it mandatory.
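For anyone wanting to try this locally before a fix lands, here is a minimal sketch of the workaround idea (assuming torch.compile handles the op directly; this is not necessarily equivalent to the eventual patch):

import torch
import torchvision

torch.use_deterministic_algorithms(True)
# Wrap the op in torch.compile; the idea is that Inductor can presumably fuse the
# masked gathers instead of materializing the full intermediate in eager mode.
roi_align_compiled = torch.compile(torchvision.ops.roi_align)

inp = torch.rand((1, 256, 48, 64), device="cuda")
rois = torch.tensor([[0.0, 0.0, 0.0, 128.0, 96.0]], device="cuda")  # (batch_idx, x1, y1, x2, y2)
out = roi_align_compiled(inp, rois, (48, 64), 12 / 384, aligned=True)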
Jesus, how is that possible? Sounds great.
Thank you for the quick response and the fix @ezyang 🚀
I applied your patch manually in my system and can confirm that it does eliminate the OOM issue! A Mask R-CNN ResNet-50 FPN now consumes ~5500 MB for batch_size 2 on COCO and about ~38000 MB for batch_size 16.
I ran some quick tests using the torchvision reference implementation and can further confirm that we now have deterministic training (see below). In addition, I append some timings in case this helps moving forward.
torch 2.1.2+cu121 and torchvision 0.16.2+cu121, maskrcnn_resnet50_fpn, coco, batch_size 2, hflip (+ internal *ShortestEdgeResize*(800,1333)).
For reproducible determinism I set:
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
torch.use_deterministic_algorithms(True)
os.environ['CUBLAS_WORKSPACE_CONFIG'] = ":4096:8"
# seed for main process/thread (torch, random, numpy)
seed=617971023
# seed for sampler and dataloader generators (num_workers=4)
seed_data=0
# nn.Module run 1
Epoch: [0] [ 0/58633] eta: 80 days, 11:30:44 lr: 0.000000 loss: 6.0988 (6.0988) loss_classifier: 4.5028 (4.5028) loss_box_reg: 0.0175 (0.0175) loss_mask: 0.7927 (0.7927) loss_objectness: 0.6928 (0.6928) loss_rpn_box_reg: 0.0930 (0.0930) time: 118.5927 data: 0.2771 max mem: 4346
Epoch: [0] [20/58633] eta: 12 days, 17:24:42 lr: 0.000002 loss: 6.0132 (6.0607) loss_classifier: 4.4406 (4.4362) loss_box_reg: 0.0319 (0.0412) loss_mask: 0.7437 (0.7650) loss_objectness: 0.6933 (0.6934) loss_rpn_box_reg: 0.0742 (0.1249) time: 13.7666 data: 0.0025 max mem: 5533
Epoch: [0] [40/58633] eta: 11 days, 12:51:42 lr: 0.000004 loss: 5.5885 (5.7992) loss_classifier: 3.9741 (4.1683) loss_box_reg: 0.0536 (0.0510) loss_mask: 0.7486 (0.7632) loss_objectness: 0.6906 (0.6920) loss_rpn_box_reg: 0.0687 (0.1248) time: 15.1755 data: 0.0028 max mem: 5533
Epoch: [0] [60/58633] eta: 11 days, 21:56:55 lr: 0.000006 loss: 3.4103 (4.9821) loss_classifier: 1.5982 (3.3080) loss_box_reg: 0.0917 (0.0753) loss_mask: 0.8951 (0.8050) loss_objectness: 0.6612 (0.6780) loss_rpn_box_reg: 0.0843 (0.1159) time: 18.7317 data: 0.0029 max mem: 5533
# nn.Module run 2
Epoch: [0] [ 0/58633] eta: 79 days, 04:24:00 lr: 0.000000 loss: 6.0988 (6.0988) loss_classifier: 4.5028 (4.5028) loss_box_reg: 0.0175 (0.0175) loss_mask: 0.7927 (0.7927) loss_objectness: 0.6928 (0.6928) loss_rpn_box_reg: 0.0930 (0.0930) time: 116.6824 data: 0.2564 max mem: 4346
Epoch: [0] [20/58633] eta: 12 days, 14:39:22 lr: 0.000002 loss: 6.0132 (6.0607) loss_classifier: 4.4406 (4.4362) loss_box_reg: 0.0319 (0.0412) loss_mask: 0.7437 (0.7650) loss_objectness: 0.6933 (0.6934) loss_rpn_box_reg: 0.0742 (0.1249) time: 13.6844 data: 0.0023 max mem: 5400
Epoch: [0] [40/58633] eta: 11 days, 11:22:22 lr: 0.000004 loss: 5.5885 (5.7992) loss_classifier: 3.9741 (4.1683) loss_box_reg: 0.0536 (0.0510) loss_mask: 0.7486 (0.7632) loss_objectness: 0.6906 (0.6920) loss_rpn_box_reg: 0.0687 (0.1248) time: 15.1657 data: 0.0026 max mem: 5400
Epoch: [0] [60/58633] eta: 11 days, 21:05:58 lr: 0.000006 loss: 3.4103 (4.9821) loss_classifier: 1.5982 (3.3080) loss_box_reg: 0.0917 (0.0753) loss_mask: 0.8951 (0.8050) loss_objectness: 0.6612 (0.6780) loss_rpn_box_reg: 0.0843 (0.1159) time: 18.7601 data: 0.0027 max mem: 5400
# DDP (world_size 1) run 1
Epoch: [0] [ 0/58633] eta: 70 days, 18:22:51 lr: 0.000000 loss: 6.0988 (6.0988) loss_classifier: 4.5028 (4.5028) loss_box_reg: 0.0175 (0.0175) loss_mask: 0.7927 (0.7927) loss_objectness: 0.6928 (0.6928) loss_rpn_box_reg: 0.0930 (0.0930) time: 104.2787 data: 0.4824 max mem: 4515
Epoch: [0] [20/58633] eta: 12 days, 02:47:31 lr: 0.000002 loss: 6.0132 (6.0607) loss_classifier: 4.4406 (4.4362) loss_box_reg: 0.0319 (0.0412) loss_mask: 0.7437 (0.7650) loss_objectness: 0.6933 (0.6934) loss_rpn_box_reg: 0.0742 (0.1249) time: 13.5395 data: 0.0021 max mem: 5572
Epoch: [0] [40/58633] eta: 11 days, 04:49:32 lr: 0.000004 loss: 5.5885 (5.7992) loss_classifier: 3.9741 (4.1683) loss_box_reg: 0.0536 (0.0510) loss_mask: 0.7486 (0.7632) loss_objectness: 0.6906 (0.6920) loss_rpn_box_reg: 0.0687 (0.1248) time: 15.1061 data: 0.0027 max mem: 5572
Epoch: [0] [60/58633] eta: 11 days, 16:31:04 lr: 0.000006 loss: 3.4103 (4.9821) loss_classifier: 1.5982 (3.3080) loss_box_reg: 0.0917 (0.0753) loss_mask: 0.8951 (0.8050) loss_objectness: 0.6612 (0.6780) loss_rpn_box_reg: 0.0843 (0.1159) time: 18.7259 data: 0.0028 max mem: 5572
# DDP (world_size 1) run 2
Epoch: [0] [ 0/58633] eta: 71 days, 13:13:18 lr: 0.000000 loss: 6.0988 (6.0988) loss_classifier: 4.5028 (4.5028) loss_box_reg: 0.0175 (0.0175) loss_mask: 0.7927 (0.7927) loss_objectness: 0.6928 (0.6928) loss_rpn_box_reg: 0.0930 (0.0930) time: 105.4355 data: 0.4228 max mem: 4515
Epoch: [0] [20/58633] eta: 12 days, 03:27:48 lr: 0.000002 loss: 6.0132 (6.0607) loss_classifier: 4.4406 (4.4362) loss_box_reg: 0.0319 (0.0412) loss_mask: 0.7437 (0.7650) loss_objectness: 0.6933 (0.6934) loss_rpn_box_reg: 0.0742 (0.1249) time: 13.5249 data: 0.0024 max mem: 5566
Epoch: [0] [40/58633] eta: 11 days, 05:10:02 lr: 0.000004 loss: 5.5885 (5.7992) loss_classifier: 3.9741 (4.1683) loss_box_reg: 0.0536 (0.0510) loss_mask: 0.7486 (0.7632) loss_objectness: 0.6906 (0.6920) loss_rpn_box_reg: 0.0687 (0.1248) time: 15.1059 data: 0.0027 max mem: 5566
Epoch: [0] [60/58633] eta: 11 days, 16:41:03 lr: 0.000006 loss: 3.4103 (4.9821) loss_classifier: 1.5982 (3.3080) loss_box_reg: 0.0917 (0.0753) loss_mask: 0.8951 (0.8050) loss_objectness: 0.6612 (0.6780) loss_rpn_box_reg: 0.0843 (0.1159) time: 18.7140 data: 0.0026 max mem: 5566
To anyone who randomly stumbles across this, please NOTE:
The following measurements are NOT definitive. They are only a quick check of a not-yet-merged fix that I applied manually! So unless you have read the whole conversation for context, please don't draw any strong conclusions from this or, worse, use it to claim anything about PyTorch/Torchvision speed/memory, thx.
The following values are for DDP models with batch_size 2 per GPU (world_size) and show the avg. time per batch and max_mem after 20 batches (measured with the MetricLogger from the reference implementation).
world_size | deterministic | avg. time/batch | max_mem |
---|---|---|---|
1 | False | 0.1548 | 2929 |
1 | True | 13.0091 (~84x) | 5377 (~1.83x) |
4 | False | 0.1909 | 3094 |
4 | True | 24.5909 (~139x) | 5645 (~1.82x) |
8 | False | 0.2002 | 3094 |
8 | True | 29.8613 (~149x) | 5547 (~1.79x) |
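If someone wants a quicker sanity check without the full COCO reference script, a minimal, hypothetical micro-benchmark of just the deterministic roi_align call could look like the snippet below (feature-map and RoI sizes are made up, and it is not equivalent to the MetricLogger numbers above):

import time
import torch
import torchvision

torch.use_deterministic_algorithms(True)

feat = torch.rand((2, 256, 200, 272), device="cuda")                  # FPN-sized feature map (assumed)
xy = torch.rand((100, 2), device="cuda") * 200
wh = torch.rand((100, 2), device="cuda") * 50
boxes = torch.cat([xy, xy + wh], dim=1)                               # (x1, y1, x2, y2)
rois = torch.cat([torch.zeros(100, 1, device="cuda"), boxes], dim=1)  # prepend batch index 0

torch.cuda.reset_peak_memory_stats()
start = time.perf_counter()
out = torchvision.ops.roi_align(feat, rois, (14, 14), spatial_scale=0.25, aligned=True)
torch.cuda.synchronize()
print(f"time: {time.perf_counter() - start:.3f}s, "
      f"max_mem: {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB")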
I get the following deprecation warnings with torch 2.1.2+cu121, torchvision 0.16.2+cu121:
UserWarning: 'has_cuda' is deprecated, please use 'torch.backends.cuda.is_built()'
UserWarning: 'has_cudnn' is deprecated, please use 'torch.backends.cudnn.is_available()'
UserWarning: 'has_mps' is deprecated, please use 'torch.backends.mps.is_built()'
UserWarning: 'has_mkldnn' is deprecated, please use 'torch.backends.mkldnn.is_available()'
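For reference, the replacements the warnings suggest (taken verbatim from the messages above; unrelated to the RoI Align issue itself):

import torch

print(torch.backends.cuda.is_built())        # instead of the deprecated torch.has_cuda
print(torch.backends.cudnn.is_available())   # instead of the deprecated torch.has_cudnn
print(torch.backends.mps.is_built())         # instead of the deprecated torch.has_mps
print(torch.backends.mkldnn.is_available())  # instead of the deprecated torch.has_mkldnn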
By the way, I think Inductor can potentially do a lot better codegen on this to bring down the time/memory overhead; it just needs some concerted elbow grease.
I'd love to help but have to admit that this part of the code base is a little over my head 😅 However, if I can support you with some isolated testing (that does not require in-depth knowledge) let me know!
Quick update: I ran some more tests to see if newer torch/torchvision versions would improve things, but it appears that I've been lucky with torch 2.1.2+cu121 and torchvision 0.16.2+cu121 and this needs more testing 😅.
With torch 2.2.2+cu121 and torchvision 0.17.2+cu121 it runs OOM again, and with torch 2.3.0+cu121 and torchvision 0.18.0+cu121 I get an assert error, see below.
Both are the pre-built versions from PyPI. DDP models on 1 GPU (but same errors with nn.Module). I replaced my env path with ... for brevity. Hope this helps to narrow things down.
torch 2.2.2+cu121 and torchvision 0.17.2+cu121:
skipping cudagraphs due to deterministic index put. Found from :
File ".../site-packages/torchvision/ops/roi_align.py", line 185, in _roi_align
val = _bilinear_interpolate(input, roi_batch_ind, y, x, ymask, xmask) # [K, C, PH, PW, IY, IX]
File ".../site-packages/torchvision/ops/roi_align.py", line 78, in _bilinear_interpolate
v1 = masked_index(y_low, x_low)
File ".../site-packages/torchvision/ops/roi_align.py", line 71, in masked_index
return input[
...
(the same "skipping cudagraphs" message appears 3 more times, not shown here)
...
[rank0]:[2024-05-23 13:24:02,534] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (8)
[rank0]:[2024-05-23 13:24:02,534] torch._dynamo.convert_frame: [WARNING] function: '_roi_align' (.../site-packages/torchvision/ops/roi_align.py:114)
[rank0]:[2024-05-23 13:24:02,534] torch._dynamo.convert_frame: [WARNING] last reason: ___check_global_state()
[rank0]:[2024-05-23 13:24:02,534] torch._dynamo.convert_frame: [WARNING] To log all recompilation reasons, use TORCH_LOGS="recompiles".
[rank0]:[2024-05-23 13:24:02,534] torch._dynamo.convert_frame: [WARNING] To diagnose recompilation issues, see https://pytorch.org/docs/master/compile/troubleshooting.html.
...
File ".../site-packages/torchvision/ops/roi_align.py", line 185, in _roi_align
val = _bilinear_interpolate(input, roi_batch_ind, y, x, ymask, xmask) # [K, C, PH, PW, IY, IX]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../site-packages/torchvision/ops/roi_align.py", line 32, in _bilinear_interpolate
def _bilinear_interpolate(
File ".../site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File ".../site-packages/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File ".../site-packages/torch/_functorch/aot_autograd.py", line 901, in forward
return compiled_fn(full_args)
^^^^^^^^^^^^^^^^^^^^^^
File ".../site-packages/torch/_functorch/_aot_autograd/utils.py", line 81, in g
return f(*args)
^^^^^^^^
File ".../site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 94, in runtime_wrapper
all_outs = call_func_at_runtime_with_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../site-packages/torch/_functorch/_aot_autograd/utils.py", line 105, in call_func_at_runtime_with_args
out = normalize_as_list(f(args))
^^^^^^^
File ".../site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 118, in rng_functionalization_wrapper
return compiled_fw(args)
^^^^^^^^^^^^^^^^^
File ".../site-packages/torch/_inductor/codecache.py", line 864, in __call__
return self.get_current_callable()(inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../site-packages/torch/_inductor/compile_fx.py", line 611, in run
return model(new_inputs)
^^^^^^^^^^^^^^^^^
File ".../site-packages/torch/_inductor/codecache.py", line 892, in _run_from_cache
return compiled_graph.compiled_artifact(inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/torchinductor_theodoridis/gf/cgf64g3izdp33vfvomwklyg4wg6lvlmuxkvmlnfriqaqqb2j6wtc.py", line 175, in call
buf0 = empty_strided((s12, 1, s4, s7, s10, s11), (s10*s11*s4*s7, s10*s11*s12*s4*s7, s10*s11*s7, s10*s11, s11, 1), device='cuda', dtype=torch.float32)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 78.57 GiB. GPU 0 has a total capacity of 47.53 GiB of which 44.87 GiB is free. Including non-PyTorch memory, this process has 2.65 GiB memory in use. Of the allocated memory 1.23 GiB is allocated by PyTorch, and 580.03 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
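Side note on the "hit config.cache_size_limit (8)" warning above: the log itself points to TORCH_LOGS="recompiles" for diagnosis; a hypothetical stop-gap while debugging would be to raise the limit, which only hides the symptom rather than fixing the recompiles:

import torch._dynamo

# Hypothetical stop-gap only: raise dynamo's recompile budget so _roi_align is not
# dropped back to eager after 8 recompiles; it does not address why it recompiles.
torch._dynamo.config.cache_size_limit = 64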
torch 2.3.0+cu121 and torchvision 0.18.0+cu121:
[rank0]: Traceback (most recent call last):
[rank0]: File "./train.py", line 384, in <module>
[rank0]: main(cfg=cfg, distributed=distributed, gpu_id=gpu_id)
[rank0]: File ".../site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
[rank0]: return f(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^
[rank0]: File "./train.py", line 286, in main
[rank0]: train_one_epoch(model=model,
[rank0]: File "./src/engine/engine.py", line 96, in train_one_epoch
[rank0]: loss.backward()
[rank0]: File ".../site-packages/torch/_tensor.py", line 525, in backward
[rank0]: torch.autograd.backward(
[rank0]: File ".../site-packages/torch/autograd/__init__.py", line 267, in backward
[rank0]: _engine_run_backward(
[rank0]: File ".../site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
[rank0]: return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File ".../site-packages/torch/autograd/function.py", line 301, in apply
[rank0]: return user_fn(self, *args)
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File ".../site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 882, in backward
[rank0]: out = call_compiled_backward()
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File ".../site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 831, in call_compiled_backward
[rank0]: out = call_func_at_runtime_with_args(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File ".../site-packages/torch/_functorch/_aot_autograd/utils.py", line 113, in call_func_at_runtime_with_args
[rank0]: out = normalize_as_list(f(args))
[rank0]: ^^^^^^^
[rank0]: File ".../site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
[rank0]: return fn(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File ".../site-packages/torch/_dynamo/external_utils.py", line 36, in inner
[rank0]: return fn(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File ".../site-packages/torch/_inductor/codecache.py", line 906, in __call__
[rank0]: return self.get_current_callable()(inputs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File ".../site-packages/torch/_inductor/compile_fx.py", line 784, in run
[rank0]: return model(new_inputs)
[rank0]: ^^^^^^^^^^^^^^^^^
[rank0]: File ".../site-packages/torch/_inductor/codecache.py", line 934, in _run_from_cache
[rank0]: return compiled_graph.compiled_artifact(inputs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/tmp/torchinductor_theodoridis/xl/cxlqkkc7ap73hke3wfa5g76sk6nbmzebnv3d5v5jyb64iie2vne5.py", line 182, in call
[rank0]: assert_size_stride(unsqueeze_86, (s4, 1, s6, 1, s8, 1), (14, 0, 2, 0, 1, 0))
[rank0]: AssertionError: expected size 12==12, stride 28==14 at dim=0
Maybe related to mask_roi_pool output_size=14? Or this warning I get with torch.use_deterministic_algorithms(False), as discussed here?
.../site-packages/torch/autograd/graph.py:744: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
.../site-packages/torch/nn/modules/conv.py:456: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
return F.conv2d(input, weight, bias, self.stride,
Just for clarity on your setup, did you manually patch in the change to the prebuilt binaries of torchvision to test them?
Yes, I have three separate conda envs with the mentioned torch and torchvision versions (installed from pypi with pip), and manually patched the torchvision/ops/roi_align.py files in their site-packages to match the file of #8436. This naive approach resulted in the success and errors mentioned above. Let me know if you need more info or if this was too naive and is better tested in a different way. As mentioned, I'm not very familiar with torch.compile/dynamo/inductor.
Reopening for torch version incompatibility
Bah, I don't have a ready-to-go Mask R-CNN setup that I can use to easily test this.
@JohannesTheo do you have a suggested way of reproducing your problems? Alternatively, if you are able to do runs with TORCH_TRACE and upload them here, that would also be greatly helpful.
Hey @ezyang, I will put something together over the weekend.
🐛 Describe the bug
Description
I am encountering an out-of-memory (OOM) error when using the roi_align function with PyTorch 2.1.1 and torchvision 0.16.1. The issue does not occur with PyTorch 2.0.1 and torchvision 0.15.2, and downgrading back to those versions makes the function work properly again. The error happens regardless of the GPU used (tested on an NVIDIA A2000 and an RTX 4090). I am seeking help understanding why this OOM error occurs in the newer versions of PyTorch and torchvision, and whether this is a bug or a change in how roi_align manages memory.
Background
Function
The object_roi_align function crops feature maps based on YOLO's object detection labels and uses RoI Align to extract per-object features. It accepts feature maps, YOLO detection labels, and several optional parameters for noise and class constraints (a rough sketch of the call pattern follows the error sections below).
Error messages (A2000)
Error message (RTX 4090)
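For context, a rough sketch of the call pattern described above; the function name comes from the gist, but the body and the assumed YOLO label layout (img_idx, cx, cy, w, h) are illustrative only, not the actual gist code:

import torch
import torchvision

def object_roi_align_sketch(feature_map, yolo_labels, output_size=(7, 7), spatial_scale=1.0):
    """Hypothetical sketch: convert normalized YOLO boxes into
    (batch_idx, x1, y1, x2, y2) pixel boxes and pool per-object features."""
    n, c, h, w = feature_map.shape
    img_idx = yolo_labels[:, 0:1]                                   # assumed: first column is the image index
    cx, cy, bw, bh = yolo_labels[:, 1], yolo_labels[:, 2], yolo_labels[:, 3], yolo_labels[:, 4]
    boxes = torch.stack([(cx - bw / 2) * w, (cy - bh / 2) * h,
                         (cx + bw / 2) * w, (cy + bh / 2) * h], dim=1)
    rois = torch.cat([img_idx, boxes], dim=1)
    return torchvision.ops.roi_align(feature_map, rois, output_size, spatial_scale, aligned=True)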
Versions
Versions (RTX 4090)
Versions (A2000)
cc @ezyang @gchanan @zou3519 @kadeng @ptrblck