zwbx / SHIFT-Continuous_Test_Time_Adaptation

Results reproduction #4

Closed dmn-sjk closed 1 year ago

dmn-sjk commented 1 year ago

Hi, I'm having issues reproducing the results presented in the README. This is what I'm getting in the console output at the end of the experiments:

- Source model:

+---------------+-------+-------+
| Class         | IoU   | Acc   |
+---------------+-------+-------+
| building      | 83.52 | 93.14 |
| fence         | 0.0   | nan   |
| pedestrian    | 31.4  | 38.04 |
| pole          | 23.16 | 29.04 |
| road line     | 66.14 | 69.65 |
| road          | 84.33 | 87.28 |
| sidewalk      | 55.27 | 58.87 |
| vegetation    | 28.04 | 64.15 |
| vehicle       | 75.55 | 91.11 |
| wall          | 0.0   | nan   |
| traffic sign  | 28.68 | 31.66 |
| sky           | 0.0   | 0.0   |
| traffic light | 0.0   | 0.0   |
| terrain       | 1.97  | 4.96  |
+---------------+-------+-------+

Summary:

+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 34.15 | 47.33 | 84.45 |
+--------+-------+-------+-------+


- TENT:

+---------------+-------+-------+
| Class         | IoU   | Acc   |
+---------------+-------+-------+
| building      | 81.89 | 98.58 |
| fence         | 0.0   | nan   |
| pedestrian    | 29.38 | 32.81 |
| pole          | 15.35 | 16.33 |
| road line     | 67.69 | 70.1  |
| road          | 94.38 | 98.88 |
| sidewalk      | 65.69 | 70.01 |
| vegetation    | 19.99 | 21.22 |
| vehicle       | 80.53 | 86.84 |
| wall          | 0.0   | nan   |
| traffic sign  | 12.71 | 12.87 |
| sky           | 0.0   | 0.0   |
| traffic light | 0.0   | 0.0   |
| terrain       | 2.32  | 2.4   |
+---------------+-------+-------+

Summary:

+--------+-------+------+-------+
| Scope  | mIoU  | mAcc | aAcc  |
+--------+-------+------+-------+
| global | 33.57 | 42.5 | 88.59 |
+--------+-------+------+-------+


- CoTTA:

+---------------+-------+-------+
| Class         | IoU   | Acc   |
+---------------+-------+-------+
| building      | 82.9  | 93.07 |
| fence         | 0.0   | nan   |
| pedestrian    | 34.17 | 39.79 |
| pole          | 27.62 | 31.93 |
| road line     | 70.57 | 73.32 |
| road          | 82.77 | 85.04 |
| sidewalk      | 49.3  | 51.98 |
| vegetation    | 28.24 | 69.21 |
| vehicle       | 77.5  | 92.33 |
| wall          | 0.0   | nan   |
| traffic sign  | 30.73 | 32.2  |
| sky           | 0.0   | 0.0   |
| traffic light | 0.0   | 0.0   |
| terrain       | 1.47  | 4.1   |
+---------------+-------+-------+

Summary:

+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 34.66 | 47.75 | 83.82 |
+--------+-------+-------+-------+


I have observed that the results are also appended to the file 'Test_on_videos_1x_val/<source_model | tent | cotta>/evaluation.txt'. In this file, the newly appended results are the same as the existing ones, except for insignificant deviations in 'cotta', probably due to its stochastic component. **Should I assume that everything is correct and ignore the results from the console output?**

I run the experiments with the following command:
`export CUDA_VISIBLE_DEVICES=0 ; python tools/<test | tent | cotta>.py local_configs/shift_val_800x500.py work_dirs/deeplabv3_r50_shift_800x500/iter_40000.pth`

The environment was set up according to the README. Packages:

Name                Version       Build           Channel
_libgcc_mutex       0.1           main
_openmp_mutex       5.1           1_gnu
attr                0.3.1         pypi_0          pypi
backcall            0.2.0         pypi_0          pypi
ca-certificates     2023.05.30    h06a4308_0
certifi             2023.5.7      pypi_0          pypi
charset-normalizer  3.1.0         pypi_0          pypi
click               8.1.3         pypi_0          pypi
colorama            0.4.6         pypi_0          pypi
decorator           5.1.0         pypi_0          pypi
idna                3.4           pypi_0          pypi
ipython             7.28.0        pypi_0          pypi
jedi                0.18.0        pypi_0          pypi
kornia              0.5.11        pypi_0          pypi
ld_impl_linux-64    2.38          h1181459_1
libffi              3.4.4         h6a678d5_0
libgcc-ng           11.2.0        h1234567_1
libgomp             11.2.0        h1234567_1
libstdcxx-ng        11.2.0        h1234567_1
markdown-it-py      2.2.0         pypi_0          pypi
matplotlib-inline   0.1.3         pypi_0          pypi
mmcv-full           1.2.7         pypi_0          pypi
mmsegmentation      0.11.0        dev_0
ncurses             6.4           h6a678d5_0
opencv-python       4.5.1.48      pypi_0          pypi
openssl             3.0.9         h7f8727e_0
packaging           21.0          pypi_0          pypi
parso               0.8.2         pypi_0          pypi
pexpect             4.8.0         pypi_0          pypi
pickleshare         0.7.5         pypi_0          pypi
pip                 23.1.2        py38h06a4308_0
prompt-toolkit      3.0.20        pypi_0          pypi
ptyprocess          0.7.0         pypi_0          pypi
pygments            2.10.0        pypi_0          pypi
python              3.8.16        h955ad1f_4
pyyaml              5.4.1         pypi_0          pypi
readline            8.2           h5eee18b_0
requests            2.31.0        pypi_0          pypi
rich                13.2.0        pypi_0          pypi
setuptools          67.8.0        py38h06a4308_0
six                 1.16.0        pypi_0          pypi
sqlite              3.41.2        h5eee18b_0
terminaltables      3.1.10        pypi_0          pypi
timm                0.3.2         pypi_0          pypi
tk                  8.6.12        h1ccaba5_0
torch               1.7.1+cu110   pypi_0          pypi
torchvision         0.8.2+cu110   pypi_0          pypi
traitlets           5.1.0         pypi_0          pypi
wcwidth             0.2.5         pypi_0          pypi
wheel               0.38.4        py38h06a4308_0
xz                  5.4.2         h5eee18b_0
yapf                0.31.0        pypi_0          pypi
zlib                1.2.13        h5eee18b_0

zwbx commented 1 year ago

Hi, thanks for participating in this competition. Regarding the problem that the results in 'Test_on_videos_1x_val/<source_model | tent | cotta>/evaluation.txt' are the same as the ones output in the terminal: this is possibly because the seq is not set correctly in mmseg/datasets/shift.py, so only one sequence is tested. Can you check it?
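
For example, a quick sanity check along these lines (just a sketch; the path and the exact seq.csv layout are assumptions, adjust to your local setup):

    # Hypothetical check: count how many sequences the val split lists.
    # Assumes seq.csv has a header row and one sequence per following row.
    import csv

    with open('data/shift/continuous/videos/1x/val/front/seq.csv') as f:
        rows = list(csv.reader(f))

    print(f'{len(rows) - 1} sequences listed')  # should be well above 1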



dmn-sjk commented 1 year ago

Thanks for the response. The seq setting in mmseg/datasets/shift.py:

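    # pick the sequence list depending on the split (train vs. continuous val)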
if 'train' in img_dir:
    seq = 'data/shift/discrete/images/train/front/seq.csv'
else:
    seq = 'data/shift/continuous/videos/1x/val/front/seq.csv'

To clarify, because we might have misunderstood each other: I assumed that the results printed in the output at the end of TTA testing are the final results for the whole experiment. Now that I'm looking at it, it seems those are just the results for the last sequence. Nevertheless, since I thought those were the final results, I expected them to look like the ones in the README. They are clearly different, so it seemed to me that there was a problem. I think I understand now, but I have one final question to be certain: should I refer to the results in Test_on_videos_1x_val/<source_model | tent | cotta>/evaluation.txt if I want to check the average results for the entire experiment?

zwbx commented 1 year ago

Yes, Test_on_videos_1x_val/<source_model | tent | cotta>/evaluation.txt contains the average results over all sequences from the clear/daytime condition, which is what is used for leaderboard ranking. Note that the evaluation results we show on GitHub are for all sequences regardless of start time.
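
If you want to double-check the mean over the per-sequence rows yourself, something like the following sketch works (the path and the '| global | ... |' row format are assumptions; adjust to the actual layout of evaluation.txt):

    # Hypothetical helper: average the per-sequence 'global' mIoU entries
    # appended to evaluation.txt. Assumes summary rows look like
    # '| global | 34.15 | 47.33 | 84.45 |'.
    import re

    mious = []
    with open('Test_on_videos_1x_val/tent/evaluation.txt') as f:
        for line in f:
            match = re.search(r'global\s*\|\s*([\d.]+)', line)
            if match:
                mious.append(float(match.group(1)))

    if mious:
        print(f'{len(mious)} entries, mean mIoU = {sum(mious) / len(mious):.2f}')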

