syguan96 / DynaBOA

[T-PAMI 2022] Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation
225 stars 19 forks

Stuck while testing in dataloader #20

Closed by ChawDoe 2 years ago

ChawDoe commented 2 years ago
sh run.sh 
100%|#####################################################################################################| 2018/2018 [00:00<00:00, 14689.63it/s]
alphapose-results Total Images: 0 , in fact: 2007
---> seed has been set
---> model and optimizer have been set
LEN: 2007
Adapt:   0%|                                                                                                  | 1/2007 [00:05<3:11:31,  5.73s/it]
Adapt:   0%|                                                                                               | 1/2007 [03:50<128:29:03, 230.58s/it]
Traceback (most recent call last):
  File "dynaboa_internet.py", line 184, in <module>
    adaptor.excute()
  File "dynaboa_internet.py", line 76, in excute
    for step, batch in tqdm(enumerate(self.dataloader), total=len(self.dataloader), desc='Adapt'):
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1068, in _next_data
    idx, data = self._get_data()
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1034, in _get_data
    success, data = self._try_get_data()
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 872, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/multiprocessing/queues.py", line 104, in get
    if not self._poll(timeout):
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/multiprocessing/connection.py", line 414, in _poll
    r = wait([self], timeout)
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/multiprocessing/connection.py", line 911, in wait
    ready = selector.select(timeout)
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/selectors.py", line 376, in select
    fd_event_list = self._poll.poll(timeout)
KeyboardInterrupt
^C
syguan96 commented 2 years ago

Hi, could you describe the problem in more detail?

ChawDoe commented 2 years ago

The program runs on the first 2 frames of the video (see the attached screenshot). It then seems to get stuck, without any output, while loading the third frame.

100%|#####################################################################################################| 2018/2018 [00:00<00:00, 13865.83it/s]
alphapose-results Total Images: 0 , in fact: 2007
---> seed has been set
---> model and optimizer have been set
LEN: 2007
Adapt:   0%|                                                                                                            | 0/2007 [00:00<?, ?it/s]0
Adapt:   0%|                                                                                                  | 1/2007 [00:06<3:38:29,  6.54s/it]1
Adapt:   0%|                                                                                                  | 2/2007 [00:11<3:00:03,  5.39s/it]

Adapt:   0%|                                                                                                 | 2/2007 [00:38<10:50:53, 19.48s/it]
Traceback (most recent call last):
  File "dynaboa_internet.py", line 185, in <module>
    adaptor.excute()
  File "dynaboa_internet.py", line 76, in excute
    for step, batch in tqdm(enumerate(self.dataloader), total=len(self.dataloader), desc='Adapt'):
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1068, in _next_data
    idx, data = self._get_data()
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1034, in _get_data
    success, data = self._try_get_data()
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 872, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/multiprocessing/queues.py", line 104, in get
    if not self._poll(timeout):
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/multiprocessing/connection.py", line 414, in _poll
    r = wait([self], timeout)
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/multiprocessing/connection.py", line 911, in wait
    ready = selector.select(timeout)
  File "/mnt/zhoudeyu/conda/gl_conda2/envs/alphapose/lib/python3.6/selectors.py", line 376, in select
    fd_event_list = self._poll.poll(timeout)
KeyboardInterrupt
ChawDoe commented 2 years ago

Could you share a Docker environment?

syguan96 commented 2 years ago

I noticed something strange: `alphapose-results Total Images: 0 , in fact: 2007`. Could you please check the processed file?

ChawDoe commented 2 years ago

Yep. The output json looks like this: [{"image_id": "000001.png", "category_id": 1, "keypoints": [998.3497924804688, 511.17315673828125, 0.9436832070350647, 1060.5211181640625, 433.45904541015625, 0.9422322511672974, 920.6356811523438, 433.45904541015625, 0.9330096244812012, 1153.7779541015625, 433.45904541015625, 0.9037595987319946, 811.8359375, 449.0018615722656, 0.9082304239273071, 1324.7490234375, 806.4867553710938, 0.8380578756332397, 671.9505615234375, 868.6580200195312, 0.8347607254981995, 1464.6343994140625, 1086.2574462890625, 0.7451505064964294, 578.6936645507812, 1101.80029296875, 0.5830087661743164, 1309.2061767578125, 977.457763671875, 0.23413985967636108, 625.3220825195312, 837.5723876953125, 0.14807194471359253, 1239.263427734375, 1544.7706298828125, 0.1191876158118248, 866.23583984375, 1544.7706298828125, 0.1023932546377182, 1386.9202880859375, 930.8292846679688, 0.007646484766155481, 734.121826171875, 930.8292846679688, 0.008473267778754234, 1464.6343994140625, 1039.6290283203125, 0.026635972782969475, 734.121826171875, 961.9149169921875, 0.014611240476369858], "score": 2.5866856575012207, "idx": [0.0]}, {"image_id": "000002.png", "category_id": 1, "keypoints": [997.8942260742188, 507.96673583984375, 0.943446934223175, 1062.3236083984375, 427.4300842285156, 0.9428519010543823, 917.3576049804688, 427.4300842285156, 0.930939793586731, 1158.967529296875, 427.4300842285156, 0.9124693274497986, 804.6062622070312, 443.53741455078125, 0.8960825204849243, 1336.148193359375, 814.0059814453125, 0.8499772548675537, 675.7476806640625, 862.3279418945312, 0.838010311126709, 1465.0068359375, 1087.83056640625, 0.7524687051773071, 579.1036987304688, 1087.83056640625, 0.587560772895813, 1320.040771484375, 958.971923828125, 0.2198331207036972, 659.6403198242188, 846.2206420898438, 0.1381719559431076, 1247.557861328125, 1579.1041259765625, 0.22668635845184326, 860.98193359375, 1579.1041259765625, 0.20760270953178406, 1376.41650390625, 1579.1041259765625, 
0.011060213670134544, 740.177001953125, 942.8646240234375, 0.009259197860956192, 1400.5775146484375, 926.7572631835938, 0.012580275535583496, 627.4256591796875, 1039.508544921875, 0.015579336322844028], "score": 2.541726589202881, "idx": [0.0]}, {"image_id": "000003.png", "category_id": 1, "keypoints": [996.3786010742188, 517.7439575195312, 0.9468082189559937, 1063.0037841796875, 437.7937316894531, 0.9399899244308472, 916.4283447265625, 437.7937316894531, 0.9449321627616882, 1156.279052734375, 437.7937316894531, 0.8784972429275513, 809.8280639648438, 451.1187744140625, 0.8947305679321289, 1329.50439453125, 797.5697021484375, 0.7847341299057007, 663.252685546875, 864.1948852539062, 0.758129358291626, 1476.079833984375, 1077.3953857421875, 0.6569521427154541, 569.9774780273438, 1077.3953857421875, 0.3141101896762848, 1642.642822265625, 1084.0579833984375, 0.06668268889188766, 623.277587890625, 784.24462890625, 0.14743362367153168, 1242.8917236328125, 1403.8587646484375, 0.020084384828805923, 843.1406860351562, 1403.8587646484375, 0.01595699042081833, 1382.8045654296875, 970.795166015625, 0.00978416483849287, 1129.62890625, 984.1201782226562, 0.006134824827313423, 1489.4049072265625, 984.1201782226562, 0.015890389680862427, 390.0894775390625, 1084.0579833984375, 0.016179770231246948], "score": 2.5139236450195312, "idx": [0.0]}, {"image_id": "000004.png", "category_id": 1, "keypoints": [996.4568481445312, 516.9083862304688, 0.948387086391449, 1063.2647705078125, 436.73883056640625, 0.9430480003356934, 916.287353515625, 436.73883056640625, 0.9467915296554565, 1156.7958984375, 436.73883056640625, 0.8826930522918701, 809.3945922851562, 450.10040283203125, 0.8900815844535828, 1330.49658203125, 797.5017700195312, 0.7718112468719482, 662.4171142578125, 850.9481201171875, 0.750632643699646, 1464.112548828125, 1078.0950927734375, 0.6389603018760681, 582.24755859375, 1078.0950927734375, 0.3383636474609375, 1250.3270263671875, 931.11767578125, 0.07017475366592407, 
622.3323364257812, 784.1401977539062, 0.13094675540924072, 1216.923095703125, 1405.4541015625, 0.01809529773890972, 842.798583984375, 1405.4541015625, 0.014874284155666828, 1383.9429931640625, 971.2024536132812, 0.00977005623281002, 1130.07275390625, 984.5640258789062, 0.007842790335416794, 1490.835693359375, 997.9255981445312, 0.013686121441423893, 388.5045166015625, 1084.7760009765625, 0.02015593834221363], "score": 2.5267653465270996, "idx": [0.0]},

The directory layout is shown in the attached screenshot; the images folder contains every frame from the video.

ChawDoe commented 2 years ago

I modified the preprocessing code to fit my video. I can't run your original code because it expects the JSON file name to start with 'seq', whereas mine starts with 'alphapose'.


ChawDoe commented 2 years ago

The total image count is 0 because I passed the wrong directory to the print function, so just ignore it. I can run with a few frames, but the program gets stuck with more frames (see the attached screenshot).

syguan96 commented 2 years ago

How about setting `num_workers=0`? https://github.com/syguan96/DynaBOA/blob/5b28dd48141af61e6288d84652e9aa65fbe2531a/base_adaptor.py#L150
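For context, the traceback above is blocked in `self._data_queue.get(timeout=timeout)`, i.e. the main process is waiting on a worker process. With `num_workers=0` the data is loaded in the main process, so there is no worker queue to wait on. Below is a minimal sketch of how the DataLoader arguments might be assembled. `dataloader_kwargs` is a hypothetical helper, not part of DynaBOA; `shuffle=False` and the 60-second timeout are illustrative assumptions.

```python
def dataloader_kwargs(num_workers=0, batch_size=1, timeout_s=60):
    """Build keyword arguments for torch.utils.data.DataLoader.

    Hypothetical helper for illustration; not part of the DynaBOA code base.
    """
    if num_workers < 0:
        raise ValueError("num_workers must be >= 0")
    kwargs = {
        "batch_size": batch_size,
        "shuffle": False,  # assumption: video frames are consumed in order
        "num_workers": num_workers,
    }
    if num_workers > 0:
        # With worker processes, a positive timeout makes a stalled worker
        # raise an error instead of blocking forever. It is only set here
        # because single-process loading expects the default timeout of 0.
        kwargs["timeout"] = timeout_s
    return kwargs


# Intended use (requires PyTorch and a dataset object):
#   from torch.utils.data import DataLoader
#   loader = DataLoader(dataset, **dataloader_kwargs(num_workers=0))
print(dataloader_kwargs(num_workers=0))
```

If worker processes are needed later for speed, the timeout at least turns a silent hang into an error that can be reported.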

ChawDoe commented 2 years ago

Yes!! I set num_workers=1 and it can run with more frames!

---> seed has been set
---> model and optimizer have been set
LEN: 9585
Adapt:   0%|                                                                                                            | 0/9585 [00:00<?, ?it/s]0
Adapt:   0%|                                                                                                 | 1/9585 [00:06<17:03:35,  6.41s/it]1
Adapt:   0%|                                                                                                 | 2/9585 [00:10<14:09:20,  5.32s/it]2
Adapt:   0%|                                                                                                 | 3/9585 [00:15<13:30:18,  5.07s/it]3
Adapt:   0%|                                                                                                 | 4/9585 [00:20<13:06:22,  4.92s/it]4
Adapt:   0%|                                                                                                 | 5/9585 [00:23<11:19:14,  4.25s/it]5
Adapt:   0%|                                                                                                 | 6/9585 [00:27<11:30:23,  4.32s/it]6
Adapt:   0%|                                                                                                 | 7/9585 [00:33<12:44:30,  4.79s/it]7
Adapt:   0%|                                                                                                 | 8/9585 [00:39<13:24:16,  5.04s/it]8
Adapt:   0%|                                                                                                 | 9/9585 [00:44<13:47:34,  5.19s/it]9
Adapt:   0%|1                                                                                               | 10/9585 [00:47<12:06:46,  4.55s/it]10
Adapt:   0%|1                                                                                               | 11/9585 [00:53<12:33:54,  4.72s/it]11
Adapt:   0%|1                                                                                               | 12/9585 [01:01<15:18:41,  5.76s/it]12
Adapt:   0%|1                                                                                               | 13/9585 [01:05<14:30:16,  5.46s/it]13
Adapt:   0%|1                                                                                               | 14/9585 [01:08<12:18:31,  4.63s/it]14
Adapt:   0%|1                                                                                               | 15/9585 [01:11<11:15:02,  4.23s/it]15
Adapt:   0%|1                                                                                               | 16/9585 [01:16<11:16:14,  4.24s/it]16
Adapt:   0%|1                                                                                               | 17/9585 [01:21<11:49:32,  4.45s/it]17
Adapt:   0%|1                                                                                               | 18/9585 [01:25<11:22:43,  4.28s/it]18
Adapt:   0%|1                                                                                               | 19/9585 [01:30<12:24:31,  4.67s/it]19
Adapt:   0%|2                                                                                               | 20/9585 [01:33<11:10:31,  4.21s/it]20
Adapt:   0%|2                                                                                               | 21/9585 [01:38<11:34:51,  4.36s/it]21
Adapt:   0%|2                                                                                               | 22/9585 [01:42<11:35:01,  4.36s/it]22
Adapt:   0%|2                                                                                               | 23/9585 [01:46<11:24:28,  4.29s/it]23
Adapt:   0%|2                                                                                               | 24/9585 [01:51<11:16:23,  4.24s/it]24
Adapt:   0%|2                                                                                               | 25/9585 [01:53<10:06:36,  3.81s/it]25
Adapt:   0%|2                                                                                                | 26/9585 [01:56<9:03:14,  3.41s/it]

But it is still very slow. Do you have any method to speed it up?

ChawDoe commented 2 years ago

Okay. The default batch size is 1... I will try a larger batch. Thanks for your patient reply.

ChawDoe commented 2 years ago

> Okay. The default batch size is 1... I will try a larger batch. Thanks for your patient reply.

When I try a larger batch, another error occurs.

Traceback (most recent call last):
  File "dynaboa_internet.py", line 185, in <module>
    adaptor.excute()
  File "dynaboa_internet.py", line 87, in excute
    self.inference(batch, self.model)
  File "dynaboa_internet.py", line 170, in inference
    self.save_results(pred_vertices, pred_cam, image, batch['imgname'], batch['bbox'], prefix='Pred')
  File "/mnt/zhoudeyu/project/save_video/dengyuanzhang/dynaboa/base_adaptor.py", line 453, in save_results
    bbox = bbox.cpu().numpy()
AttributeError: 'numpy.ndarray' object has no attribute 'cpu'
syguan96 commented 2 years ago

> Okay. The default batch size is 1... I will try a larger batch. Thanks for your patient reply.

Since we focus on the streaming scenario, the default batch size is 1. If you increase the batch size, the short-term temporal loss should be removed or modified to handle that case.
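Separately from the loss, the `AttributeError` in the traceback above says that `bbox` was already a `numpy.ndarray` when `save_results` called `.cpu()` on it. A minimal sketch of a conversion that accepts both cases is shown below. `to_host_array` is a hypothetical helper, not part of DynaBOA, and `FakeTensor` is only a stand-in so the example runs without PyTorch.

```python
def to_host_array(x):
    """Return x as a CPU-side array, whether it is a tensor or already an array.

    Duck-typed on purpose: torch.Tensor provides detach(), cpu() and numpy(),
    while numpy.ndarray has none of them and is returned unchanged.
    """
    for method in ("detach", "cpu", "numpy"):
        if hasattr(x, method):
            x = getattr(x, method)()
    return x


class FakeTensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""

    def __init__(self, values):
        self.values = values

    def detach(self):
        return self

    def cpu(self):
        return self

    def numpy(self):
        return list(self.values)


print(to_host_array(FakeTensor([1.0, 2.0])))  # goes through detach/cpu/numpy
print(to_host_array([3.0, 4.0]))              # has none of them, returned as-is
```

Under this assumption, the failing line in `save_results` could become `bbox = to_host_array(bbox)`. This only addresses the type error; it does not make the temporal loss valid for larger batches.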

ykgod7 commented 2 years ago

Can you please tell me how you fixed the original code?

When I run `bash run_on_internet.sh`, I get `ValueError: need at least one array to concatenate`.

syguan96 commented 2 years ago

Hi @ykgod7. Could you please give me the complete error information?

ykgod7 commented 2 years ago

This is what I got

---> seed has been set
---> model and optimizer have been set
0
Traceback (most recent call last):
  File "dynaboa_internet.py", line 182, in <module>
    adaptor = Adaptor(options)
  File "/home/training/James/3d_folder/DynaBOA/base_adaptor.py", line 63, in __init__
    self.set_dataloader()
  File "/home/training/James/3d_folder/DynaBOA/base_adaptor.py", line 149, in set_dataloader
    dataset = Internet_dataset()
  File "/home/training/James/3d_folder/DynaBOA/boa_dataset/internet_data.py", line 42, in __init__
    self.imgnames = np.concatenate(self.imgnames, 0)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: need at least one array to concatenate

syguan96 commented 2 years ago

Make sure the path is correct.

ykgod7 commented 2 years ago


Sorry, I'm new to deep learning. Can you please tell me which file I have to correct?

syguan96 commented 2 years ago

Never mind. You can set an ipdb breakpoint before this line: https://github.com/syguan96/DynaBOA/blob/622e4e27bb31e084e7657fee5eebf788d0b102b3/boa_dataset/internet_data.py#L30

Then check the value of data['imgname']: are there images at the recorded paths?

syguan96 commented 2 years ago

Insert `import ipdb; ipdb.set_trace()` before line 30.

ykgod7 commented 2 years ago


Oh... the datanames variable is empty. The line is `datanames = glob.glob(osp.join(config.InternetData_ROOT, 'seq*.npz'))`. Where can I get these `seq*.npz` files? I don't have them in my InternetData images folder.

syguan96 commented 2 years ago

You should run the processing script (see the attached screenshot). Also, make sure the `seq*.npz` files are placed in `InternetData_ROOT`.
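A quick way to confirm this before starting adaptation is to run the same glob and fail early with a readable message, rather than hitting `np.concatenate` on an empty list. The sketch below is a minimal example under that assumption; `find_sequence_files` is a hypothetical helper, not part of DynaBOA, and the root directory is whatever `config.InternetData_ROOT` points to.

```python
import glob
import os.path as osp


def find_sequence_files(root, pattern="seq*.npz"):
    """Return the sorted preprocessed files under root, or fail with a clear message.

    Hypothetical helper for illustration; the default pattern mirrors the glob
    quoted earlier in this thread.
    """
    files = sorted(glob.glob(osp.join(root, pattern)))
    if not files:
        raise FileNotFoundError(
            "No files matching {!r} in {!r}; run the preprocessing script "
            "and check the data root.".format(pattern, root)
        )
    return files


# Intended use, assuming the project's config module is importable:
#   import config
#   print(find_sequence_files(config.InternetData_ROOT))
```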

ykgod7 commented 2 years ago

Thank you, I got the .npz!

Traceback (most recent call last):
  File "dynaboa_internet.py", line 183, in <module>
    adaptor.excute()
  File "dynaboa_internet.py", line 76, in excute
    for step, batch in tqdm(enumerate(self.dataloader), total=len(self.dataloader), desc='Adapt'):
  File "/home/training/anaconda3/envs/DynaBOA-env/lib/python3.6/site-packages/tqdm/std.py", line 1166, in __iter__
    for obj in iterable:
  File "/home/training/anaconda3/envs/DynaBOA-env/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/training/anaconda3/envs/DynaBOA-env/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/home/training/anaconda3/envs/DynaBOA-env/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/training/anaconda3/envs/DynaBOA-env/lib/python3.6/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/training/anaconda3/envs/DynaBOA-env/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/training/anaconda3/envs/DynaBOA-env/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/training/anaconda3/envs/DynaBOA-env/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/training/James/3d_folder/DynaBOA/boa_dataset/internet_data.py", line 51, in __getitem__
    image = self.read_image(imgname)
  File "/home/training/James/3d_folder/DynaBOA/boa_dataset/internet_data.py", line 68, in read_image
    img = cv2.imread(imgname)[:,:,::-1].copy().astype(np.float32)
TypeError: 'NoneType' object is not subscriptable

I'm really sorry, but can you help me with this?

syguan96 commented 2 years ago

In the same way, set an ipdb breakpoint before line 68 of internet_data.py to check `imgname`.
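The `TypeError` arises because `cv2.imread` returns `None` instead of raising when it cannot read a file, so the slicing on the next step fails. As an alternative to stepping through with a debugger, the recorded paths can be checked in one pass. This is a minimal sketch; `missing_images` is a hypothetical helper, not part of DynaBOA, and the list of paths is assumed to come from the `imgname` entries of the preprocessed files.

```python
import os.path as osp


def missing_images(imgnames):
    """Return the recorded image paths that do not exist on disk.

    Hypothetical helper for illustration. Entries are converted with str()
    so that NumPy string scalars are accepted as well as plain strings.
    """
    return [str(p) for p in imgnames if not osp.isfile(str(p))]


# Intended use, assuming imgnames holds the paths stored during preprocessing:
#   bad = missing_images(imgnames)
#   print(len(bad), "missing; first few:", bad[:5])
```

An empty result means every recorded path exists; a file can still be unreadable or corrupt, which this check does not detect.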