Open td-anne opened 1 year ago
In fact I think I may know what has happened. First, I have set the input image rescaling to at most 800 for the longest side (1333 overflows my GPU RAM when images need to be padded out to 1333x1333). Second, my image augmentation (using `albumentations.BBoxSafeRandomCrop`) may, rarely, produce one-pixel-wide images. If these are rescaled to produce 800x1 images, then there aren't more than 800 values in `lvl_mask`. Does this sound plausible?
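A rough back-of-the-envelope check of that hypothesis, as a sketch only: it assumes four feature levels with strides 8, 16, 32 and 64 and a feature-map size of roughly ceil(side / stride) per axis. Both the stride values and the helper name `positions_per_level` are assumptions for illustration, not taken from the DETA code.

```python
import math


def positions_per_level(height, width, strides=(8, 16, 32, 64)):
    """Approximate number of feature-map positions at each level.

    Assumes each level's feature map is about ceil(side / stride) along
    each axis; the strides are an assumed default, not read from DETA.
    """
    return [math.ceil(height / s) * math.ceil(width / s) for s in strides]


# A degenerate 800x1 image versus a square 800x800 image.
thin = positions_per_level(800, 1)
square = positions_per_level(800, 800)

print(thin, sum(thin))      # [100, 50, 25, 13] 188
print(square, sum(square))  # [10000, 2500, 625, 169] 13294
```

Under these assumptions an 800x1 image yields well under 1000 positions even when all levels are summed, while a normal 800x800 image has far more than 1000 at the finest level alone.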
Yes, if you have fewer classes, it makes sense to have fewer predictions. It should be fine to change the class-agnostic topk. We tried a couple values and did not find too much of a difference.
Your 800x1 images could also be a problem. Though there could be more proposals since we have multi-level features.
You can also try out checkpointing to avoid GPU OOM.
The 800x1 images are, obviously, not of any use, so I don't care what values get returned as long as it doesn't crash. The checkpointing is interesting, though: could the model cope with 1920 by 1080 images? Or does that require changing the structure somewhat? My raw inputs are all 1920 by 1080 and I'm looking for broken wires, which might disappear when downscaled. For the moment I'm more interested in accuracy than speed.
I see, that makes sense for high resolution. We typically use larger images during pre-training, so I don't think 1920x1080 should be a problem.
I am running DETA on a data set with only one real class (and one N/A class; in particular various tensors are n by 2). In some long runs, the run fails with
RuntimeError: selected index k out of range
at the line below: https://github.com/jozhang97/DETA/blob/985fa0b7afbbd86db6f907ff3a855828947ff631/models/deformable_transformer.py#L188
If I understand correctly, this should only fail if the number `k` requested from `topk`, in this case `pre_nms_topk`, which is 1000, is too large; specifically, I believe this can only happen if the length of the `lvl_mask` is less than 1000. (Perhaps my data augmentation has produced an unreasonably tiny image? I thought they were all rescaled.) I don't really understand where we are in the code when this occurs, but would it be harmful to trim the `k` supplied to `topk` down to the available length?
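To make the question concrete, here is a minimal sketch of the trimming I have in mind. It is a pure-Python stand-in for a top-k selection, not the DETA code: `clamped_topk` is a hypothetical helper, and in the real model the same `min(...)` would be applied to the `k` passed to the tensor `topk` call.

```python
def clamped_topk(scores, k):
    """Return (values, indices) of the k largest scores.

    k is trimmed to len(scores), so asking for more elements than exist
    returns everything instead of raising an out-of-range error.
    """
    k = min(k, len(scores))  # the proposed trim
    # Indices sorted by descending score, then cut to the trimmed k.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [scores[i] for i in order], order


# Only three candidates are available, but 1000 are requested.
values, indices = clamped_topk([0.1, 0.9, 0.5], 1000)
print(values, indices)
```

With the trim in place, a degenerate input just yields fewer candidates for the later NMS step rather than crashing the run.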