Open hdjsjyl opened 5 years ago
@hdjsjyl I can't run this code; I think my CityPersons dataset may not be right. Could you give some advice or a link to download this dataset? Thank you very much!
@hdjsjyl I downloaded images from https://www.cityscapes-dataset.com/file-handling/?packageID=28, but it's only 2.2MB. Is that right?
Hi, I have a similar problem and would appreciate some advice. I trained a model using train_city with one GPU and a learning rate of 2e-4, but I only got ~12% MR. Should I decrease the learning rate in this scenario to be able to reproduce the result?
I think it might be because of the number of GPUs and the batch size. I used one GPU with batch size 2, and the MR is 13%.
@liuwei16 @frezaeix @msha096 Hi, could you give me links to download the CityPersons dataset? I have tried many times, but it always goes wrong. I tried to download images from leftImg8bit_trainvaltest.zip (11GB) and gtBbox_cityPersons_trainval.zip (2.2MB), and annotations from shanshanzhang/source/annotations, but nothing helped. When I run the code, one of the errors is: ('get_batch_gt:', AttributeError("'NoneType' object has no attribute 'shape'",))
@Invokar, sorry about the late reply. The error means your cv2.imread() did not read the image correctly. So first, check the paths of the images; second, check your dataloader. You can print the image path and use cv2.imshow('image', image) to verify that you read the image correctly.
@hdjsjyl Thank you for your reply. I did that, and I found that the image path is not in my downloaded dataset, which really confuses me. One of the paths is data/cityperson/images/train/hamburg/hamburg_000000_104857_leftImg8bit.png, and it is not in my dataset. If the code is right, the only problem I can think of is the dataset.
@Invokar I think you have the correct dataset: 5000 images, 11GB. Maybe you need to check the path where you put your dataset.
@hdjsjyl Thank you very much, I made a simple mistake: I named the folder citypersons instead of cityperson. Oh my god..
I have run the original code and got 12 MR. I also implemented the code in PyTorch and got 13 MR. Both use batch size 8 (4 × 2 GPUs). @hdjsjyl, if you find a way to reproduce the result, please inform me, thanks!
Hi, what GPU are you using? I am using a 1080Ti, and it can only handle two images. Does your GPU handle 4 images? Is there any trick to increase the batch size per GPU?
An NVIDIA P40 has 24GB of memory, and in PyTorch a batch of 4 images only takes about 9-10GB. I don't like Keras, by the way; it hides too many details.
Is it faster in PyTorch? Can you give some advice on re-implementing it in PyTorch, please? Thanks!
Here is my code in PyTorch: https://github.com/lw396285v/CSP-pedestrian-detection-in-pytorch
The code may contain some mistakes; currently it is just me working on it. I hope other people can give some advice to help reproduce the result.
Thanks for sharing!
Actually, one thing confuses me a lot. Usually in object detection, we freeze the BatchNorm layers and the first block of ResNet when pretrained weights are used. But in CSP, only the convolutional layers in the first block are frozen, which is quite strange.
@lw396285v Because CSP uses a large batch size, updating BN can improve performance, as shown in SNIPER: Efficient Multi-Scale Training.
Thanks for the reply, but I think you are mistaken; I suggest you read the SNIPER source code. I have read the paper. Actually, SNIPER freezes conv0, bn0, and layer1; the other BatchNorm layers are not frozen. In common practice, conv0, layer1, and all the BatchNorms are fixed due to the small batch size.
The weird thing in this code is that it fixes all the layers in layer1 except the BatchNorms in layer1; it fixes conv0, but bn0 is trainable.
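For reference, the freezing scheme being discussed can be written down as a small policy function. This is a sketch under my reading of the thread (every BatchNorm trainable, conv weights in the stem and stage 1 frozen); the parameter names follow torchvision's ResNet convention and are only illustrative:

```python
def requires_grad(param_name):
    """CSP-style freezing as described in this thread: every BatchNorm
    parameter stays trainable, while convolution weights in the stem
    (conv1) and stage 1 (layer1) are frozen."""
    is_bn = param_name.startswith("bn1") or ".bn" in param_name
    in_early_stage = param_name.startswith(("conv1", "layer1"))
    if is_bn:
        return True               # all BN gammas/betas trainable
    return not in_early_stage     # freeze only early conv weights

# In PyTorch this policy would be applied as:
#   for name, p in model.named_parameters():
#       p.requires_grad = requires_grad(name)
print(requires_grad("conv1.weight"))           # False: stem conv frozen
print(requires_grad("bn1.weight"))             # True: stem BN trainable
print(requires_grad("layer1.0.conv1.weight"))  # False
print(requires_grad("layer1.0.bn1.weight"))    # True
print(requires_grad("layer2.0.conv1.weight"))  # True
```

Note that frozen BN layers should usually also be put in eval() mode so their running statistics stop updating; requires_grad alone only freezes gamma/beta.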
I got similar results when I used 4 × 2 GPUs. Another friend told me that when he runs the code with 2 × 4 GPUs, he can get results similar to what the author reported.
Have you compared the final results under different settings?
Does anyone know why the training is unstable? The results seem to vary a lot between epochs and between trials.
@Invokar Can you send me the CityPersons dataset? I cannot download it: 17326060548@163.com. Thank you.
See my earlier reply. The dataset is leftImg8bit_trainvaltest.zip (11GB).
@Invokar I know what you said, but I have registered and have not received the confirmation email, so I can't log in anymore and can't download it. Also, I see that the dataset he uses is 2079 and 500 images. Why doesn't he use the whole dataset?
@Invokar And he has provided the train_h50 cache file; do I just run train_city.py? Thank you.
@hdjsjyl @Invokar @frezaeix @msha096 Can you give me some advice? I can't solve the following problem and would be very grateful for suggestions. When I execute train_city.py, the terminal output looks like this. Is it insufficient memory? How do I solve it, and how do I adjust the batch size? name: GeForce MX130 major: 5 minor: 0 memoryClockRate(GHz): 1.189
2019-07-18 17:08:02: I tensorflow/core/common_runtime/bfc_allocator.cc:680] [per-chunk-size breakdown omitted]
2019-07-18 17:08:02.017157: I tensorflow/core/common_runtime/bfc_allocator.cc:684] Sum Total of in-use chunks: 1.28GiB
2019-07-18 17:08:02.017164: I tensorflow/core/common_runtime/bfc_allocator.cc:686] Stats: Limit: 1370292224 InUse: 1370291712 MaxInUse: 1370292224 NumAllocs: 4565 MaxAllocSize: 181840128
2019-07-18 17:08:02.017263: W tensorflow/core/framework/op_kernel.cc:1198] Resource exhausted: OOM when allocating tensor with shape[1,1,256,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
I had the same problem. Have you solved it? How?
Your GPU is far too small and weak; forget about using it. You can use http://colab.research.google.com, set C.gpu_ids to '0', and maybe reduce the input image resolution (C.size_train).
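If you do try a small GPU, shrinking C.size_train is the main lever. A hedged sketch, assuming size_train is a (height, width) tuple as in the repo's config and that the network's overall stride is 16:

```python
def scale_resolution(size_train, factor=0.5, stride=16):
    """Scale a (height, width) training crop down by `factor`, snapping
    each side to a multiple of the network stride so downsampled
    feature-map shapes stay valid."""
    snap = lambda v: max(stride, int(v * factor) // stride * stride)
    h, w = size_train
    return (snap(h), snap(w))

print(scale_resolution((640, 1280)))  # (320, 640)
```

Halving the resolution roughly quarters the activation memory, at the cost of detection quality on small pedestrians.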
The optimal performance of my pedestrian detector (STNet, to be released) on CityPersons is 8% MR, and it only needs one GPU for training~
@Stinky-Tofu I saw your method gets very promising results. I am looking forward to your release.
Hello, I want to know how to submit test-set results to the CityPersons benchmark? Thank you!
2 GPUs × 4 images/GPU is not equivalent to 4 GPUs × 2 images/GPU: BatchNorm statistics are computed per GPU, so the per-GPU batch size changes the BN behavior even when the total batch size is the same.
@hdjsjyl @qilong-zhang @frezaeix @msha096 @lw396285v I have worked on this. Maybe these will help: https://arxiv.org/abs/2002.09053, https://github.com/WangWenhao0716/Adapted-Center-and-Scale-Prediction
Could you please share the .mat annotation file of the CityPersons dataset with me? 775980299@qq.com
@liuwei16 Thanks for sharing your excellent work. But when I use two GPUs with a batch size of 4 images and a learning rate of 2e-4 or 4e-4, I cannot reproduce the paper's results. Can you provide some advice? Any advice is appreciated, thanks.
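A common heuristic when the total batch size changes (e.g. a single GPU instead of 4 × 2 or 2 × 4) is the linear scaling rule from Goyal et al., "Accurate, Large Minibatch SGD". It is only a starting point, not this repo's official recipe:

```python
def scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling rule: keep lr / (total batch size) constant."""
    return base_lr * new_batch / base_batch

# Reference point from this thread: lr 2e-4 at a total batch of 8.
print(scaled_lr(2e-4, 8, 4))  # 1e-4 for a total batch of 4
print(scaled_lr(2e-4, 8, 2))  # 5e-5 for a total batch of 2
```

As noted above, this rule cannot fully compensate for the per-GPU BatchNorm batch, so results at different GPU counts may still differ.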