Closed: chufengt closed this issue 2 years ago
Hi @chufengt,
Thanks for your interest in LSeg!
Hope this helps.
Another quick question.
In `test_lseg.py`:
scales = (
[0.75, 1.0, 1.25, 1.5, 1.75, 2.0, 2.25]
if "citys" in args.dataset
else [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
)
Could you give some references for these selected scales? I'm not very familiar with semantic segmentation but I found different scales used in HRNet: https://github.com/HRNet/HRNet-Semantic-Segmentation
> Performance on the Cityscapes dataset. The models are trained and tested with input sizes of 512x1024 and 1024x2048 respectively. If multi-scale testing is used, we adopt scales: 0.5,0.75,1.0,1.25,1.5,1.75.

> Performance on the ADE20K dataset. The models are trained and tested with an input size of 520x520. If multi-scale testing is used, we adopt scales: 0.5,0.75,1.0,1.25,1.5,1.75,2.0 (the same as EncNet, DANet etc.).
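For context on what those scale lists do: multi-scale testing resizes the input to each scale, runs the model, resizes the logits back to the original resolution, and averages. A minimal sketch (not the repo's code; nearest-neighbour resize stands in for the bilinear interpolation real pipelines use, and `model` is any callable returning per-pixel class logits):

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize for a (C, H, W) array (stand-in for bilinear)."""
    c, h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[:, rows[:, None], cols[None, :]]

def multi_scale_predict(model, image, scales):
    """Average class logits over several input scales, with each scale's
    logits resized back to the original resolution before averaging."""
    c, h, w = image.shape
    acc = None
    for s in scales:
        scaled = resize_nearest(image, int(h * s), int(w * s))
        logits = model(scaled)                 # (num_classes, s*h, s*w)
        logits = resize_nearest(logits, h, w)  # back to original size
        acc = logits if acc is None else acc + logits
    return acc / len(scales)
```

Flip augmentation is usually averaged in the same way; the exact scale list is a tuning choice per dataset, which is why HRNet and this repo differ.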
We didn't conduct experiments on Cityscapes. For semantic segmentation, we strictly follow the settings of DPT: https://github.com/isl-org/DPT. Please see that repo for more details. Hope this helps!
Hi, @Boyiliee,
Thanks for your reply.
It seems that DPT did not release its training code or the detailed settings for semantic segmentation.
Thanks again.
Hi, @Boyiliee,
Thanks for your reply. It really helps. I have some extra questions:
- For the training time mentioned above, I noticed you said '1-2 days for ade20k' in Training configuration #7. Was that measured with vit_b32 or vit_l16? I'm not sure whether the ~90h training time for vit_l16 on ade20k is reasonable. The config is the same as `train.sh`.
- Does this code support multi-node (e.g., 8*2 GPUs) training?
- When I tried to train LSeg on Cityscapes, I got an 'out of cuda memory' error with a crop size of 768 (line 31 in `lseg_module.py`), but 480 is OK. The backbone is vit_l16 and I use 8 * 32G V100. Is that reasonable?
- For Cityscapes, I got mIoU ≈ 60% with the vit_l16 backbone; other configs are the same as `train.sh`. This seems much lower than the SoTA results on semantic segmentation. Can you give me some suggestions on how to improve the results?

Thanks again.
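A back-of-envelope check on why a 768 crop runs out of memory where 480 fits: a ViT's token count grows quadratically with crop size, and self-attention activation memory grows roughly quadratically with token count. A rough sketch (ignoring constants, the class token, and non-attention activations):

```python
def vit_tokens(crop_size: int, patch: int = 16) -> int:
    """Number of patch tokens a ViT sees for a square crop."""
    return (crop_size // patch) ** 2

for crop in (480, 768):
    n = vit_tokens(crop)
    # self-attention activations scale roughly with n**2
    print(f"crop {crop}: {n} tokens, attention cost ~ {n * n:,}")
```

For vit_l16 (patch 16), 768 gives 2304 tokens vs 900 at 480, so attention memory is roughly 6.5x larger, which is consistent with 768 overflowing a 32G card that 480 fits on.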
Hi!
Thanks for your work, it's really impressive. But I would suggest putting the 4th point about adding label files in the README, and also raising an error or warning when args.dataset is not ade20k, since the dataset choice is hardcoded in the LSegModule class. This could save a few hours for anyone who hopes to use your codebase on other datasets.
Thanks again!
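For concreteness, the suggested guard could look something like this (a sketch only; the set name, function, and message are illustrative, not the repo's actual API):

```python
SUPPORTED_DATASETS = {"ade20k"}  # datasets with label files wired up (assumption)

def check_dataset(name: str) -> str:
    """Fail fast instead of silently training with hard-coded ade20k labels."""
    key = name.lower()
    if key not in SUPPORTED_DATASETS:
        raise ValueError(
            f"Dataset '{name}' has no label file registered; "
            f"supported: {sorted(SUPPORTED_DATASETS)}. "
            "Add a label file and register the dataset before training."
        )
    return key
```

Calling this at the top of the module's `__init__` would turn a silent mislabeling into an immediate, explained error.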
Hi,
Thanks for open-sourcing such great work. I have some questions when using this code:
- Does the `test_lseg.py` script support multi-GPU inference? When using a single GPU, it takes about 2~3 hours for inference on ade20k.
- I evaluated `demo_e200.ckpt` on ade20k and got (pixAcc: 0.8078, mIoU: 0.3207); is that correct? It seems lower than the values in the paper.
- I trained LSeg (configs the same as `train.sh`, backbone vit_l16_384) with 8 * V100 but found it needs ~90 hours for 240 epochs. Is that reasonable (it seems much longer than you said in #7)?
- Regarding `get_labels()` in `lseg_module.py`: have you evaluated the mIoU on cityscapes?

Thanks in advance.