lil-lab / nlvr

Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.
http://lic.nlp.cornell.edu/nlvr/

The images have 4 channels #2

Closed AashishV closed 7 years ago

AashishV commented 7 years ago

The given images have 4 colour channels, and the last channel looks like this:

[Screenshot attached (2017-07-14): view of the fourth channel]

Is there any particular reason for this?

alsuhr-c commented 7 years ago

What software are you using to view 4 channels? Is this CMYK? You should be able to interpret the images as 3-channel (RGB).

AashishV commented 7 years ago
```python
from PIL import Image
import numpy as np

img_filename = '../../../Dataset/nlvr/train/images/1/train-1196-0-0.png'

img = Image.open(img_filename)
print(np.array(img).shape)
```

This gives an output of (100, 400, 4).

So, I am using the snippet below instead:

```python
from PIL import Image
import numpy as np

img_filename = '../../../Dataset/nlvr/train/images/1/train-1196-0-0.png'

img = Image.open(img_filename).convert('RGB')
print(np.array(img).shape)
```

This gives me an output of (100, 400, 3), but I have to divide the values by 255 to make them lie between 0 and 1.
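Putting the conversion and the scaling together might look like the sketch below. A synthetic in-memory RGBA image with the same dimensions stands in for the dataset file, so the snippet runs without the NLVR data:

```python
from PIL import Image
import numpy as np

# Synthetic stand-in for a dataset image: 400x100 RGBA (4 channels),
# matching the (100, 400, 4) shape reported above.
rgba = Image.new('RGBA', (400, 100), (120, 60, 30, 255))

# Drop the alpha channel by converting to RGB, then scale to [0, 1].
arr = np.asarray(rgba.convert('RGB'), dtype=np.float32) / 255.0

print(arr.shape)  # (100, 400, 3)
```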

alsuhr-c commented 7 years ago

I'm not very familiar with PIL. In our code we use scipy's imread (which calls PIL and returns four channels; we throw away the last one). The last channel's value is always 255 for us, so I would guess it is some kind of alpha value. I'd suggest ignoring this channel, since its values are the same in every example, and dividing by 255. You can also investigate whether PIL can do all of this for you by default. Does this answer your question?
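The approach described above (load four channels, discard the constant alpha channel, divide by 255) could be sketched as follows. A synthetic 4-channel array stands in for the loaded image, since the exact loading code isn't shown in this thread:

```python
import numpy as np

# Stand-in for an NLVR image loaded with 4 channels (H=100, W=400),
# with the fourth (alpha) channel uniformly 255, as observed above.
rgba = np.zeros((100, 400, 4), dtype=np.uint8)
rgba[:, :, :3] = 128   # arbitrary RGB values
rgba[:, :, 3] = 255    # constant alpha

# Discard the alpha channel and scale to [0, 1].
rgb = rgba[:, :, :3].astype(np.float32) / 255.0

print(rgb.shape)  # (100, 400, 3)
```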

AashishV commented 7 years ago

Yes, this does answer my question. Thank you for the quick reply.