karpathy / neuraltalk2

Efficient Image Captioning code in Torch, runs on GPU

problem in prepro.py: Can't broadcast (4, 256, 256) -> (1, 3, 256, 256) #186

Open Zergratino opened 7 years ago

Zergratino commented 7 years ago

While preparing my own dataset, running prepro.py raises an error: Can't broadcast (4, 256, 256) -> (1, 3, 256, 256). prepro.py appears to build a dataset with shape (N,3,256,256), but somehow my images cannot be stored in that shape. My images are PNGs; I don't know whether that can cause this error. Has anyone else run into this?

My input JSON file is shown below.

[{"captions": ["a man with white tee is walking in front of a desk", "a man with white tee is walking by a chair", "a man is walking in front of a desk", "a man is walking in front of a desk", "a man with white tee is walking in a shop", "a man is walking in a shop"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/1503455811(1).png", "id": "1503455811(1).png"}, {"captions": ["a man with white tee is walking to stairs", "a man is walking to stairs", "a man with white tee is walking by a desk", "a man is walking by a desk", "a man with white tee is walking in a shop", "a man is walking in a shop"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/1503455837(1).png", "id": "1503455837(1).png"}, {"captions": ["a man in red tee is walking towards the machine", "a man is walking towards the machine", "a man in red tee is walking by a desk", "a man is walking by a desk", "a man in red tee is walking in shop", "a man is walking in shop"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/1503455887(1).png", "id": "1503455887(1).png"}, {"captions": ["a man in strips tee is walking by the machine", "a man is walking by the machine", "a man in strips tee is walking to the desk", "a man is walking to the desk", "a man in strips tee is walking in shop", "a man is walking in shop"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/1503455940(1).png", "id": "1503455940(1).png"}, {"captions": ["a man in red tee is walking upstairs", " a man is walking upstairs", "a man in red tee is holding a broom", "a man is holding a broom", "a man in red tee holding a broom is walking upstairs", "a man holding a broom is walking upstairs"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/1503455976(1).png", "id": "1503455976(1).png"}, {"captions": ["a man in strips tee taking a bag is walking into the shop", "a man taking a bag is walking into the shop", "a man in strips tee taking a bag is walking by a desk", "a man taking a bag is walking by a desk"], "file_path": 
"/home/ubuntu/neuraltalk2/test/walk/1503456012(1).png", "id": "1503456012(1).png"}, {"captions": ["a man in strips tee taking a bag is walking towards stairs", "a man taking a bag is walking towards stairs", "a man in strips tee taking a bag is wandering", "a man in taking a bag is wandering"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/1503456041(1).png", "id": "1503456041(1).png"}, {"captions": ["a man in white tee and a man in red tee are walking to different directions", "two men are walking towards different directions", "man in white tee is walking to wall and man in red tee is walking to stairs", "a man is walking to wall and a man is walking to stairs"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out2.png", "id": "out2.png"}, {"captions": ["a man in white tee is leaving the store", "a man is leaving the store", "a man in white tee is walking out of the store", "a man is walking out of the store"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out6.png", "id": "out6.png"}, {"captions": ["man in red tee is sitting on a chair and one women in skirt and one man are walking out of the store", "a man in red tee is sitting on chair and two people are walking out of the store", "three people are in the store"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out9.png", "id": "out9.png"}, {"captions": ["a man in red tee is sitting on chair and three people are standing around him", "a man is sitting on chair and three people are standing around him", "a man in red tee is sitting on chair and one women is holding a cellphone"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out10 (2).png", "id": "out10 (2).png"}, {"captions": ["a man in red tee is walking towards a chair", "a man is walking towards a chair", "a man in red tee is walking to a desk", "a man is walking to a desk"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out10.png", "id": "out10.png"}, {"captions": ["a man in white tee is walking to desk", "a man in white is walking to desk", 
"a man in white tee is walking by chairs", "a man is walking by chairs"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out11.png", "id": "out11.png"}, {"captions": ["a man in white tee is walking towards chairs", "a man is walking towards chairs", "a man in white tee is walking by a desk", "a man is walking by desk"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out13.png", "id": "out13.png"}, {"captions": ["two men in white tee are standing by a desk", "two men are standing by a desk", "two men in white tee are standing in front of chairs", "two men are standing in front of chairs"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out17.png", "id": "out17.png"}, {"captions": ["a woman in yellow tee is walking to chairs", "a woman is walking to chairs", "a woman in yellow tee is walking downstairs", "a woman is walking downstairs"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out18.png", "id": "out18.png"}, {"captions": ["a woman with a backbag is talking with a woman holding clothes", "a woman with a backbag is talking with a woman in white tee", "two women are talking with each other", "two women are standing in front of desk"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out19.png", "id": "out19.png"}, {"captions": ["a man in white tee is walking towards the desk", "a man is walking towards the desk", "a man in white tee is walking"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out25.png", "id": "out25.png"}, {"captions": ["a man in black tee is walking out of the store", "a man is walking out of the store", "a man in black tee is walking by chairs", "a man in black is walking in front of desk"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out30.png", "id": "out30.png"}, {"captions": ["a man in strips tee is walking out of the store", "a man is walking out of the store"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out32.png", "id": "out32.png"}, {"captions": ["a man in white tee is walking out of the store", "a man is walking 
out of the store", "a man in white tee is walking by a chair"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out35.png", "id": "out35.png"}, {"captions": ["a man in black tee is walking to stairs", "a man in black is holding a bag", "a man holding a bag is walking to the stairs"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out43.png", "id": "out43.png"}, {"captions": ["a man in red tee is standing in front of the machine", "a man in red tee is looking", "a man holding a bag is standing in front of the machine"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out44.png", "id": "out44.png"}, {"captions": ["a man in white tee and a man in black tee is talking", "two man are talking on stairs"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out48 (2).png", "id": "out48 (2).png"}, {"captions": ["a man in red tee is standing by the desk and a man in white tee is walking out of store", "a man in red tee is standing by chairs and a man is walking out of the store"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out48.png", "id": "out48.png"}, {"captions": ["a man in black tee is walking to chairs", "a man in black is walking downstairs", "a man in black tee is walking to desk"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out50.png", "id": "out50.png"}, {"captions": ["a man in black tee is standing in front of desk", "a man in black is looking at desk", "a man in black is wandering"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out52.png", "id": "out52.png"}, {"captions": ["a man in white tee is standing by desk", "a man in white tee is standing on stairs", "a man in white tee is holding a cellphone"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out59.png", "id": "out59.png"}, {"captions": ["two men are talking", "two men are standing in front of desk", "a man in black tee and a man in white tee is talking"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out60.png", "id": "out60.png"}, {"captions": ["a man in black tee is standing in 
front of stairs", "a man in black tee is standing by chair", "a man in black tee is wandering"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out65.png", "id": "out65.png"}, {"captions": ["a man in red tee is walking by the machine", "a man in red tee is walking in front of the machine"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out66.png", "id": "out66.png"}, {"captions": ["a woman in white tee is looking at the machine", "a woman in white tee is standing in front of the machine", "a woman holding an umbrella is looking at the machine", "a woman holding an umbrella is standing in front of the machine"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out69.png", "id": "out69.png"}, {"captions": ["a man in black tee is walking of store and a woman with a backbag is standing in front of stairs", "a man in black tee is walking and a woman in white tee is standing"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out70.png", "id": "out70.png"}, {"captions": ["a man in red tee is standing by desk", "a man in red is standing in front of chair"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out77.png", "id": "out77.png"}, {"captions": ["a man in black tee is walking to stairs and a man is holding a cellphone", "a man in black is walking by chairs", "a man in black tee is holding a cellphone"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out81.png", "id": "out81.png"}, {"captions": ["two men are walking", "a man in red tee is holding a bag", "a man in blue tee is walking to stairs"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out82 (2).png", "id": "out82 (2).png"}, {"captions": ["one man in red tee is sitting on chair and one man in white tee is walking to stairs", "a man in red tee is holding a cellphone and one man in white tee is walking to stairs"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out82.png", "id": "out82.png"}, {"captions": ["a man in red tee is standing by a desk and a man in black tee is walking to stairs", "a man in 
red tee is standing by a chair and a man in black tee is walking to stairs"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out111.png", "id": "out111.png"}, {"captions": ["two men in red tee are sitting on chairs and a man in white tee is walking to stairs", "two men in red tee are talking and a man in white tee is walking to stairs", "two men in red tee sitting on chairs are talking and a man in white tee is walking to stairs"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out83.png", "id": "out83.png"}, {"captions": ["a man in black tee is walking to stairs", "a man in black tee is walking by chairs", "a man in black tee is walking by a desk"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out84.png", "id": "out84.png"}, {"captions": ["a man in black tee is walking out of the store", "a man in black tee is walking in front of the desk"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out92.png", "id": "out92.png"}, {"captions": ["a man in black tee is walking towards the stairs", "a man in black tee is walking to chairs", "a man in black tee is walking in front of chairs"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out99.png", "id": "out99.png"}, {"captions": ["a man in white is going downstairs", "a man in black is standing in the corner"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out126.png", "id": "out126.png"}, {"captions": ["a man in black is browsing the goods", "a man in black has a bag"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out134.png", "id": "out134.png"}, {"captions": ["a man in white is walking through", "a man in red is playing the phone"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out160.png", "id": "out160.png"}, {"captions": ["a man in white is walking through", "a man in white shirts and black pants"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out161.png", "id": "out161.png"}, {"captions": ["a man in black has a hat", "a man in black with his hands in the pockets", "a man in white with his 
back on the camera"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out162.png", "id": "out162.png"}, {"captions": ["a man in blue is talking to someone", "a man in white seems to answer the question", "a little girl in red"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out259.png", "id": "out259.png"}, {"captions": ["a man in red is walking through", "a man in white with someone in his hand"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out302.png", "id": "out302.png"}, {"captions": ["a man in black is walking through", "a man in white is looking back to someone"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out346.png", "id": "out346.png"}, {"captions": ["a man in red is sitting on the stool", "a stooping woman in black"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out259.png", "id": "out259.png"}, {"captions": ["a man in red is sitting on the stool", "a man in black is looking at the lens"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out366.png", "id": "out366.png"}, {"captions": ["a man in red is sitting on the stool", "a man in white is walkingthrough", "a man is in the corner"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out371.png", "id": "out371.png"}, {"captions": ["two men in red is walking through", "a man in white has something in his hand", "a man is sitting in the corner"], "file_path": "/home/ubuntu/neuraltalk2/test/walk/out415.png", "id": "out415.png"}]

TeisD commented 6 years ago

I had a similar problem with my own dataset.

prepro.py assumes that images loaded with scipy's imread have shape (n, m) (greyscale) or (n, m, 3) (RGB), but the shape can also be (n, m, 4) for RGBA. Since you were using PNGs, they likely had an alpha channel, which would account for the 4 in the error message.
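To illustrate the mismatch, here is a minimal sketch that uses a plain NumPy array as a stand-in for the HDF5 dataset. It assumes prepro.py moves the channel axis to the front before storing each image, which is what the (4, 256, 256) in the error suggests. NumPy raises its own ValueError with different wording from the h5py message in the issue title, but the cause is the same: four channels do not fit a three-channel slot.

```python
import numpy as np

# Stand-in for the dataset created with shape (N, 3, 256, 256).
N = 2
dset = np.zeros((N, 3, 256, 256), dtype=np.uint8)

# Stand-in for a resized RGBA image: height x width x 4 channels.
rgba = np.zeros((256, 256, 4), dtype=np.uint8)

# Channels-first layout, giving shape (4, 256, 256).
chw = rgba.transpose(2, 0, 1)

try:
    dset[0] = chw  # the slot is (3, 256, 256), so this cannot broadcast
    failed = False
except ValueError as err:
    failed = True
    print("assignment failed:", err)
```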

I was only using JPEGs, but imread loaded some of them as RGBA for some reason. I fixed it by forcing it to load all images as RGB:

diff --git a/prepro.py b/prepro.py
index aad11c3..3646a5d 100644
--- a/prepro.py
+++ b/prepro.py
@@ -182,7 +182,7 @@ def main(params):
   dset = f.create_dataset("images", (N,3,256,256), dtype='uint8') # space for resized images
   for i,img in enumerate(imgs):
     # load the image
-    I = imread(os.path.join(params['images_root'], img['file_path']))
+    I = imread(os.path.join(params['images_root'], img['file_path']), mode='RGB')
     try:
         Ir = imresize(I, (256,256))
     except:

It works for me, but I didn't open a pull request because I'm not sure this is the best fix. The original prepro.py has special handling for greyscale images, so it would probably be better to add similar handling for RGBA images.
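One way such handling could look is sketched below. The helper name to_rgb is hypothetical and not part of prepro.py; it works on NumPy arrays of shape (n, m), (n, m, 3) or (n, m, 4). It simply discards the alpha channel rather than compositing against a background, which is an assumption that may not suit images with meaningful transparency.

```python
import numpy as np

def to_rgb(img):
    """Return an (n, m, 3) array from a greyscale, RGB, or RGBA image array.

    Hypothetical helper for illustration; not taken from prepro.py.
    """
    if img.ndim == 2:
        # Greyscale: repeat the single channel three times.
        return np.stack([img, img, img], axis=2)
    if img.ndim == 3 and img.shape[2] == 4:
        # RGBA: keep the first three channels and drop alpha.
        return img[:, :, :3]
    if img.ndim == 3 and img.shape[2] == 3:
        # Already RGB: nothing to do.
        return img
    raise ValueError("unexpected image shape: %r" % (img.shape,))

# Example with synthetic arrays standing in for loaded images.
grey = np.zeros((256, 256), dtype=np.uint8)
rgba = np.zeros((256, 256, 4), dtype=np.uint8)
print(to_rgb(grey).shape, to_rgb(rgba).shape)
```

Calling a helper like this right after loading each image would leave the rest of the script working with three channels regardless of the input file type.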