Closed: yihu-dev closed this issue 4 months ago
Hey @yihu-dev, how much time did it take for you to run inference on 5k tiles with your A100? Thanks
Hi, it takes less than 4 minutes to do the tile encoding and around 0.1 s to extract the slide-level feature.
Hi,
Great work!
When I was extracting the slide-level embedding, it worked well with small coords, but a CUDA error was triggered if the coord values are large, say, 28000.
     40 out = rearrange(out, 'b l (r h) d -> b l h d r', r=ratio)
---> 41 out = torch.diag_embed(out, offset=0, dim1=4, dim2=5)
     42 out = rearrange(out, 'b l h d r1 r2 -> b (r2 h) (l r1) d', r1=ratio, r2=ratio)
     44 lse = rearrange(lse, 'b (r h) l -> b l h r', r=ratio)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
The features were extracted at 10x magnification for a quick test; the feature shape is [1, 5308, 1536] and the coords shape is [1, 5308, 2], both in CUDA float16 type. The coords are x, y patch locations as mentioned in issue #2.
For coordinates, it's basically X-Y, for example, (256, 256), (256, 512), (256, 768), ...
I use a single A100 GPU with 40 GB memory. I'm wondering if the large coordinates cause the CUDA error.
Thanks
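A side note on why casting large coordinate values to float16 is fragile (assuming the coords tensor really is cast to float16 as described above): float16 has a 10-bit mantissa, so integers above 2048 are no longer all exactly representable, and around 28000 the gap between representable values is 16. This alone would not explain a device-side assert, but it shows why distinct nearby patch coordinates can collapse onto the same value. A quick check:

```python
import torch

# float16 has a 10-bit mantissa: in the range [16384, 32768) the gap
# between adjacent representable values is 16, so distinct integer
# coordinates near 28000 collapse onto the same float16 value.
coords = torch.tensor([28000.0, 28001.0, 28007.0])
print(coords.half().float())  # tensor([28000., 28000., 28000.])
```

This is one reason autocast, which keeps inputs in float32 until an op that runs in float16, tends to be safer than casting everything manually.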
Same error when using CUDA and torch.float16 (which looks to be required to support FlashAttention) at this line:
File "/workspace/code/gigapath/slide_encoder.py", line 204, in forward
x = torch.cat((cls_tokens, x), dim=1)
But to me it looks to be independent of the number of patches, as it passes for [1, 707, 1536] but not for [1, 491, 1536].
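Because device-side asserts are reported asynchronously, the line shown in the stack trace (here the torch.cat) may not be the kernel that actually failed. As the error message itself suggests, rerunning with synchronous kernel launches can localize the real culprit (the script name below is a placeholder for your own inference script):

```shell
# Force synchronous CUDA kernel launches so the Python stack trace
# points at the kernel that actually triggered the assert.
CUDA_LAUNCH_BLOCKING=1 python run_inference.py
```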
Hi, thank you for your interest in our work!
I just tried the following and it works well for me.
import torch
import gigapath.slide_encoder as slide_encoder
# load from HuggingFace
model = slide_encoder.create_model("hf_hub:prov-gigapath/prov-gigapath", "gigapath_slide_enc12l768d", 1536)
print("param #", sum(p.numel() for p in model.parameters()))
tile_embed = torch.randn(1, 5308, 1536).cuda()
coords = 28000 * torch.ones(1, 5308, 2).cuda()
model = model.cuda()
with torch.no_grad():
    with torch.cuda.amp.autocast(dtype=torch.float16):
        output = model(tile_embed, coords)
print(output)
Did you incorporate
with torch.cuda.amp.autocast(dtype=torch.float16):
when doing inference? It would be nice if you could share your inputs and code. Looking forward to hearing from you!
@HanwenXuTHU, thanks for the feedback. with torch.cuda.amp.autocast(dtype=torch.float16):
worked for me. On a different note, what motivated having different lengths in the latent space between the patch-level (1536) and slide-level (768) features?
Good question! The 1536 dimension is the standard ViT-Giant setting, while 768 is a ViT-Base setting. We chose ViT-G -> ViT-B as a first design choice.
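Presumably the slide encoder then maps the ViT-Giant-width tile embeddings (1536) into its own ViT-Base width (768) with a learned projection; a minimal sketch of that shape change, with a plain nn.Linear standing in for whatever the actual model uses:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: project 1536-dim (ViT-Giant width) tile embeddings
# down to a 768-dim (ViT-Base width) slide-encoder space.
proj = nn.Linear(1536, 768)
tile_embed = torch.randn(1, 5308, 1536)
slide_input = proj(tile_embed)
print(slide_input.shape)  # torch.Size([1, 5308, 768])
```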
@HanwenXuTHU, it worked, thank you! I had not used that line; I had just cast both the model and the data to float16.
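For reference, the manual-casting alternative mentioned here amounts to something like the following (a plain nn.Linear stands in for the slide encoder; no forward pass is shown, only the dtype effect of .half()):

```python
import torch
import torch.nn as nn

# Casting both the model and the data to float16 directly, instead of
# wrapping inference in autocast. Note this also forces every
# intermediate tensor (including coords) into float16.
model = nn.Linear(1536, 768).half()
x = torch.randn(1, 4, 1536).half()
print(next(model.parameters()).dtype)  # torch.float16
print(x.dtype)                         # torch.float16
```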