microsoft / arcticseals

A deep learning project in cooperation with the NOAA Marine Mammal Lab to detect & classify arctic seals in aerial imagery to understand how they’re adapting to a changing world.
MIT License

Thermal imagery normalization #4

Closed pbaer closed 6 years ago

pbaer commented 6 years ago

Our thermal data is in a raw 16-bit PNG format. We need to both a) figure out exactly what the raw values correspond to so that we can normalize the data for training (i.e. 0 degrees Celsius should map to the same 16-bit pixel value in any image) and b) convert it to an 8-bit representation for human visual inspection.

Marcel-Simon commented 6 years ago

It seems the 16-bit images all look gray because the pixel values are all fairly high while having very small variance.

In: np.unique(img, return_counts=True)
Out: (array([51241, 51361, 51381, 51401, 51421, 51441, 51461, 51481, 51501,
        51521, 51541, 51561, 51581, 51601, 51621, 51641, 51661, 51681,
        51701, 51721, 51741, 51761, 51781, 51801, 51821, 51841, 51861,
        51881, 51901, 51921, 51941, 51961, 51981, 52001, 52021, 52041,
        52061, 52101, 52181]),
 array([    1,     2,    13,     6,    31,    50,   204,   313,   880,
         1086,  3079,  6116, 11211, 12168, 27619, 25012, 45961, 33892,
        50997, 30580, 34756, 14925, 13214,  5228,  4758,  1819,  1709,
          762,   615,   323,   163,    80,    40,    35,    10,     8,
           10,     2,     2], dtype=int64))
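
To put those numbers in perspective, here is a small sketch that uses only the smallest and largest values from the output above. The assumption that a viewer reduces 16-bit data to 8-bit by dropping the low byte is mine and may not match every image viewer.

```python
# Smallest and largest raw values from the np.unique output above.
lo, hi = 51241, 52181

# Fraction of the full 16-bit range (0..65535) that this image occupies.
span_fraction = (hi - lo) / 65535

# Assumption: the viewer maps 16-bit to 8-bit by dropping the low byte.
lo8, hi8 = lo >> 8, hi >> 8

print(f"{span_fraction:.2%} of the 16-bit range")
print(f"8-bit display values: {lo8}..{hi8}")
```

Under that assumption the whole image collapses into four neighboring bright gray levels, which would explain the uniform gray appearance.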

However, the image reading itself seems fine. We just need proper normalization. For example, try this:

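import numpy as np
import PIL.Image
import matplotlib.pyplot as plt

img = np.array(PIL.Image.open(image_path)).astype(np.float64)
mi = np.percentile(img, 1)   # 1st percentile as the lower bound
ma = np.percentile(img, 99)  # 99th percentile as the upper bound
plt.imshow((img - mi)/(ma-mi), vmin=0, vmax=1)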

It is also important to set vmin=0 and vmax=1 when using pyplot. Otherwise the scaling will be off, because after normalization the array contains negative values and values greater than 1. Also note that integer data would be converted to float automatically during the division, since mi and ma are floats.

Also be careful when converting to 8-bit, because out-of-range values overflow silently and wrap around:

In: np.array([255,256,257]).astype(np.uint8)
Out: array([255,   0,   1], dtype=uint8)
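
One way to avoid the wraparound is to clip before casting. The following is a minimal sketch, not the project's code: the helper name `to_uint8` and its percentile defaults are assumptions for illustration, and the result is meant for viewing only.

```python
import numpy as np

def to_uint8(img, low_pct=1, high_pct=99):
    """Percentile-stretch an array to 0..255 and cast to uint8 without wraparound."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:
        # Constant image: avoid dividing by zero.
        return np.zeros(img.shape, dtype=np.uint8)
    scaled = (img - lo) / (hi - lo)
    # Values outside the percentile bounds saturate instead of wrapping.
    scaled = np.clip(scaled, 0.0, 1.0)
    return np.round(scaled * 255).astype(np.uint8)

# A few raw values taken from the np.unique output above.
demo = np.array([51241, 51601, 51701, 51801, 52181], dtype=np.uint16)
out = to_uint8(demo)
print(out.dtype, out.min(), out.max())
```

Because the values are clipped to [0, 1] before scaling, the darkest pixels land on 0 and the brightest on 255 rather than wrapping.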

jomalsan commented 6 years ago

The normalization above will work for viewing the images, but I would not train on data normalized this way. Because the normalization takes the entire image into account, the pixels that contain a seal will end up with different values depending on whether the image is primarily ice (low values) or liquid water (high values).
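
The effect can be illustrated with a small sketch. All pixel values and the bounds `GLOBAL_LO` and `GLOBAL_HI` below are made up for illustration; they are not measurements from the dataset, and this is not necessarily the approach that ended up in the repository.

```python
import numpy as np

# Hypothetical fixed bounds, e.g. derived once from the whole dataset.
GLOBAL_LO = 50000.0
GLOBAL_HI = 54000.0

def normalize_fixed(img, lo=GLOBAL_LO, hi=GLOBAL_HI):
    """Map raw values to [0, 1] using the same bounds for every image."""
    img = np.asarray(img, dtype=np.float64)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)

def normalize_per_image(img):
    """Map raw values to [0, 1] using this image's own min and max."""
    img = np.asarray(img, dtype=np.float64)
    return (img - img.min()) / (img.max() - img.min())

seal = 52000  # made-up raw value for a seal pixel
ice_scene = np.array([51200, 51300, 51400, seal])    # mostly cold background
water_scene = np.array([51800, 51900, 52100, seal])  # mostly warm background

fixed_ice = normalize_fixed(ice_scene)[-1]
fixed_water = normalize_fixed(water_scene)[-1]
per_ice = normalize_per_image(ice_scene)[-1]
per_water = normalize_per_image(water_scene)[-1]

print("fixed bounds:", fixed_ice, fixed_water)
print("per image:   ", per_ice, per_water)
```

With fixed bounds the seal pixel gets the same normalized value in both scenes, while per-image scaling gives it a different value in each.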

Marcel-Simon commented 6 years ago

It's true that the distribution might be very different locally. It would be helpful to get some statistics as this issue certainly influences the architecture of recognition models. Let's make a possible action item out of this:

jomalsan commented 6 years ago

I am working on putting together a Jupyter notebook that explains what we currently know, and then building a Python script for other people to include.

jomalsan commented 6 years ago

Thanks to Eric for his help on this project. The code is checked into "src/ir-normalization" and the normalized images have been added as a tar file to blob storage as "ArcticSealsTrain1807221152_N.tar". For the new normalized 8-bit thermal images, we have changed "16BIT" to "8BIT_N" in the file names to keep it straight.
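
For anyone mapping between the two sets of files, the renaming convention can be sketched as below. The helper name and the example file name are hypothetical; only the "16BIT" to "8BIT_N" substitution comes from the comment above.

```python
from pathlib import Path

def normalized_name(path):
    """Return the file name of the normalized 8-bit image for a 16-bit image path."""
    # Only the file name is changed; directory components are ignored here.
    return Path(path).name.replace("16BIT", "8BIT_N")

# Hypothetical file name, for illustration only.
example = normalized_name("example_THERM-16BIT.PNG")
print(example)
```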

pbaer commented 6 years ago

I'll generate a new training/test set with the full labeled IR imagery soon. Marcel, I could prepare VOTT annotations for animals only (as last time), or for both animals and anomalies, i.e. one or two classes. Thoughts on which is preferable?



From: Jon Malsan notifications@github.com Sent: Tuesday, July 24, 2018 10:08 PM To: Microsoft/arcticseals Cc: Peter Baer; Author Subject: Re: [Microsoft/arcticseals] Thermal imagery normalization (#4)

Thanks to Eric for his help on this project. The code is checked into "src/ir-normalization" and the normalized images have been added as a tar file to blob storage as "ArcticSealsTrain1807221152_N.tar". For the new normalized 8-bit thermal images, we have changed "16BIT" to "8BIT_N" in the file names to keep it straight.


Marcel-Simon commented 6 years ago

I think we only need the ones with animals inside. Could you also copy both the 16-bit IR (or our 8-bit normalized version) and the color version for each image? I'd run Neel's code on it to get the combined images.


From: Peter Baer notifications@github.com Sent: Wednesday, July 25, 2018 9:09:58 AM To: Microsoft/arcticseals Cc: Marcel Simon; Comment Subject: Re: [Microsoft/arcticseals] Thermal imagery normalization (#4)

I'll generate a new training/test set with the full labeled IR imagery soon. Marcel, I could prepare VOTT annotations for animals only (as last time), or for both animals and anomalies, ie one or two classes. Thoughts on which is preferable?


pbaer commented 6 years ago

Will do.

From: Marcel-Simon notifications@github.com Sent: Wednesday, July 25, 2018 9:14 AM To: Microsoft/arcticseals arcticseals@noreply.github.com Cc: Peter Baer pbaer@microsoft.com; Author author@noreply.github.com Subject: Re: [Microsoft/arcticseals] Thermal imagery normalization (#4)

I think we only need the ones with animals inside. Could you also copy both the 16bit IR (or our 8bit normalized version) and the color version for each image? I'd run Neels code on it to get the combined images.

