Closed harshithdwivedi closed 4 years ago
I tried using the dcraw_make_mem_image method outlined in the docs, but it returns a black image file instead.
import pathlib

import cv2
import rawpy

for filepath in pathlib.Path("images").glob('**/*'):
    path = r"{}".format(filepath.resolve())
    with rawpy.imread(path) as raw:
        rgb = raw.dcraw_make_mem_image()
        cv2.imwrite("{}.jpg".format(filepath.stem), rgb)
I intentionally did not expose this function in rawpy as the scope of rawpy should only be on raw image processing. The thumbnail is stored in the EXIF data, and EXIF is a common format for many image formats and for that reason there are libraries which specifically deal with EXIF data. You should be able to use https://pypi.org/project/ExifRead/ for extracting the thumbnail.
ExifRead doesn't support this. There's an issue that has been open since 2012 about the exact same thing with no progress: https://github.com/ianare/exif-py/issues/8
@letmaik I believe that any operation concerning raw files should be a part of rawpy. Moreover, since the relevant method I'm talking about already exists in LibRaw, I don't think it would be a challenge to do. What do you think?
I just tried it on a CR2 file and exifread did find the thumbnail. If there are multiple thumbnails for one image format (JPEG) then it will stop at the first one, but libraw would do the same. Are you sure exifread doesn't work for you? Can you upload a sample CR2 file?
Thanks for the quick revert. This is the image I was having issues with: https://drive.google.com/file/d/1HcwnQPexLkolKALWiJXKGHzsKVu_KEI8/view?usp=sharing
ExifRead can tell me that the thumbnail exists, but it fails to retrieve it for me. Thanks again, really appreciate it!
It works for me:
import exifread

with open("JT2A4776 (2019_09_17 07_15_58 UTC).CR2", 'rb') as f:
    tags = exifread.process_file(f)
with open("JT2A4776 (2019_09_17 07_15_58 UTC).jpg", 'wb') as f:
    f.write(tags['JPEGThumbnail'])
This seems to work, but as compared to dcraw, there's a significant loss in the quality of the extracted thumbnail. For instance, here's a size comparison of the image extracted by dcraw vs the code above:
Both the code snippet above and the dcraw CLI take the same amount of time to generate this thumbnail.
Thanks again, really appreciate your help; especially on a weekend!
Interesting, so there seem to be multiple embedded JPEGs. I used exiftool -b -PreviewImage ... to extract the bigger JPEG. This is not actually a thumbnail as it has the same resolution as the raw image. My main issue with exposing this feature from libraw is that it doesn't seem to solve the problem fully, meaning it doesn't return all the embedded JPEGs, just one of them, maybe the biggest? And I can guarantee you that once I add this feature someone will ask for a feature to extract the smaller JPEG (the actual thumbnail), which I then couldn't do. I still believe this belongs in an EXIF library as the problem can be solved there in a much more principled way and without duplication of effort.
Some more details about your image dumped from exiftool:
Preview Image Start : 70472
Preview Image Length : 2117606
Preview Image : (Binary data 2117606 bytes, use -b option to extract)
...
Thumbnail Image Valid Area : 0 159 7 112
Thumbnail Offset : 54880
Thumbnail Length : 15586
Thumbnail Image : (Binary data 15586 bytes, use -b option to extract)
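The offsets and lengths in that dump are enough to pull either JPEG out with plain file I/O. A minimal sketch, assuming the offsets are absolute positions in the file (this seems to hold for this CR2, but is not guaranteed for every raw format); the helper name carve_jpeg is made up for illustration:

```python
JPEG_SOI = b"\xff\xd8"  # every JPEG stream starts with this two-byte marker


def carve_jpeg(raw_path, offset, length):
    """Return `length` bytes starting at `offset`, checked to look like a JPEG."""
    with open(raw_path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    if len(data) != length:
        raise ValueError("file is shorter than offset + length")
    if not data.startswith(JPEG_SOI):
        raise ValueError("bytes at this offset do not start with a JPEG marker")
    return data
```

With the values above, carve_jpeg(path, 70472, 2117606) would be the full-size preview and carve_jpeg(path, 54880, 15586) the small thumbnail.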
So looks like those two types of images are a thing. And libraw's unpack_thumb is returning the preview image, whereas it should return the thumbnail image. And then there should probably be another function unpack_preview to get the other one.
Hm, interesting.
> And I can guarantee you that once I add this feature someone will ask for a feature to extract the smaller JPEG (the actual thumbnail), which I then couldn't do.
Would it be possible to just mirror what libraw does?
I looked at the C++ API of LibRaw, and it mentions that the unpack_thumb() function (which essentially mirrors dcraw -e) extracts the preview thumbnail.
https://www.libraw.org/docs/API-CXX.html#unpack_thumb
So it seems that when the libraw developers talk about thumbnails, they mean the preview thumbnails, which makes sense since the JPEGThumbnail we extracted is only 160x140 and is barely usable for any practical purpose.
Let me know what you think.
Hi @letmaik any thoughts?
There is a difference between thumbnail image and preview image as I mentioned above. libraw seems to extract the preview image but has named the function "thumb". libraw does not support extracting the thumbnail image. If this should go into rawpy then I'd like to use the correct semantics. But to do that I first would like to make sure that libraw is actually always consistently extracting the preview image. If there's no preview image in a raw file but only a thumbnail image then libraw shouldn't extract anything. If it does extract the thumbnail in that case then life gets harder... Could you find this out, maybe by posting in the libraw forums?
I still think this should be implemented in exifread (or similar), but I recognize the benefit of adding it to rawpy for now as it solves the problem with little effort and it looks like exifread development is not very active.
I just tried running libraw's mem_image executable on a raw image captured on my Samsung S10+ (which doesn't have a thumbnail).
Upon running mem_image.exe -e "20200115_200829.dng", I don't get any thumb file, only a ppm file, whereas when I run this command on the CR2 file I linked above, I get both a jpeg thumb file and a ppm file.
However, I have posted this question on the forums and I'll let you know as soon as I hear back from someone there. https://www.libraw.org/forum/19#new (the post is still being reviewed at the time of writing this comment)
> but I recognize the benefit of adding it to rawpy for now as it solves the problem with little effort and it looks like exifread development is not very active.
That'd be great, thanks a ton! I had some thoughts about the Python API. If the JPEG preview is missing, is it possible for rawpy to throw an error instead of silently failing? That way, I can fall back to converting the raw to a jpeg/png instead.
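The fallback described here could look roughly like the sketch below. It is not rawpy's API: NoEmbeddedPreviewError, jpeg_for_browser, and the two callables are hypothetical names standing in for whatever extraction and full-conversion routines end up being available.

```python
class NoEmbeddedPreviewError(Exception):
    """Hypothetical error: the raw file carries no usable embedded JPEG."""


def jpeg_for_browser(path, extract_preview, full_convert):
    """Prefer the embedded JPEG; do a full raw conversion only if needed.

    extract_preview(path) -> bytes, raising NoEmbeddedPreviewError when absent.
    full_convert(path)    -> bytes, the slow demosaic-and-encode route.
    Returns (image_bytes, source), where source is "embedded" or "converted".
    """
    try:
        return extract_preview(path), "embedded"
    except NoEmbeddedPreviewError:
        return full_convert(path), "converted"
```

Raising a dedicated exception, rather than returning an empty result, keeps the slow path explicit and makes it hard to write a blank file by accident.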
Thanks again, really appreciate your help!
P.S. I also stumbled upon a new method in the libraw API: dcraw_make_mem_thumb. It's said to return the JPEG thumbnail 'as is'. Maybe it returns the thumbnail instead of the preview that is returned by unpack_thumb?
I don't have my machine set up to build libraw, so I can't run all the methods to test what they return, sorry about that!
To use dcraw_make_mem_thumb you have to call unpack_thumb first; it's the same data, just a flat representation of it. In the case of bitmap data it actually doesn't contain the full "file" (headers are missing), but for JPEG it's a regular JPEG file so you can just save it to disk. It's all a bit weird. I would probably only support JPEG at first.
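To illustrate the difference between the two cases: a JPEG buffer can be written out untouched, while bare bitmap pixels need a header before any viewer will open them. A minimal sketch, assuming the bitmap is 8-bit interleaved RGB and using a binary PPM header as the simplest container; thumb_to_file_bytes and its parameters are illustrative names, not part of libraw or rawpy:

```python
def thumb_to_file_bytes(data, is_jpeg, width=0, height=0):
    """Turn an in-memory thumbnail buffer into bytes that can be saved as a file."""
    if is_jpeg:
        # Already a complete JPEG file: nothing to add.
        return data
    # Bare pixels: assume 8-bit interleaved RGB and prepend a binary PPM header.
    expected = width * height * 3
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes of RGB data, got {len(data)}")
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + data
```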
I see. Regular jpeg support for now would be more than enough.
Hi @letmaik there's an update on the question that I posted on LibRaw forums. https://www.libraw.org/node/2544
To quote Alex,
unpack_thumb() provides largest available thumbnail/preview/whatever-it-called.
I too would like the ability to return the largest JPEG. I implemented this library and I'm converting .NEF and .CR2 in about 23-30 seconds. I would imagine directly accessing the JPEG would speed this up tremendously. We are converting thousands of files from various camera RAW formats to something a browser can display. Amazing library btw! So far it's the only one that could successfully convert a .RAF (Fuji compressed raw file) with no changes.
Added in #98.
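For anyone landing here later: as far as I can tell from the rawpy README, the feature is exposed as extract_thumb() on the object returned by rawpy.imread. The names used below (extract_thumb, ThumbFormat, LibRawNoThumbnailError, LibRawUnsupportedThumbnailError) come from that README rather than from this thread, so check them against your installed version. A minimal sketch; embedded_jpeg is a made-up helper name:

```python
def embedded_jpeg(raw_path):
    """Return the embedded JPEG as bytes, or None if there is none to use."""
    import rawpy  # third-party; imported lazily so this module loads without it

    with rawpy.imread(raw_path) as raw:
        try:
            thumb = raw.extract_thumb()
        except (rawpy.LibRawNoThumbnailError,
                rawpy.LibRawUnsupportedThumbnailError):
            return None  # caller can fall back to a full conversion
    if thumb.format == rawpy.ThumbFormat.JPEG:
        return thumb.data  # already a complete JPEG file, save as-is
    return None  # bitmap thumbnails would need encoding first
```

When this returns None, the caller can fall back to raw.postprocess() and encode the result, which is the slow path described in the original question.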
Hi, I was wondering if there's a way to run the unpack_thumb() method to extract the embedded thumbnail in the source raw file?
dcraw allows us to do this and the libraw C++ wrapper API has support for this as well.
Source: https://www.libraw.org/docs/Samples-LibRaw.html and https://www.libraw.org/docs/Samples-LibRaw.html#code
Currently, the entire raw is converted into a png file which takes a lot of time (around 1 minute per raw image) so I'd prefer to extract the embedded jpeg from the raw (if it exists) and fall back to the regular conversion if it doesn't exist.
For your reference, this is how it's done using dcraw: