Closed: kaufManu closed this issue 2 years ago.
That's an interesting problem... I never thought about that use case when I changed the ExifTool base class to add the encoding parameter: https://github.com/sylikc/pyexiftool/blob/master/exiftool/exiftool.py#L739
One of the biggest problems up to that point was mismatched tag encodings; binary data never came up. Off the top of my head, you could use the -w flag to write the data to a file. But if you need it piped, I'd have to look at the code. It would probably take a fundamental change to go back to working with bytes... or an option to output bytes.
I guess I could end up writing a separate method that works only in bytes, like the v0.4.x way... but with the synchronization flags.
I made the switch from bytes to strings to fix this internationalization issue: https://github.com/sylikc/pyexiftool/issues/29 ... Would -w work, or did you want to pipe the output as bytes?
Thanks for the quick reply. Dumping it to a file would be fine for me, if that also supports batched processing?
I tested it out, but I don't know exactly how to specify the command in the get_tags function. On the command line, this works:
exiftool.exe -TAG_NAME -b -w cmd.dat image.dng
which produces a file imagecmd.dat with the expected content. I've tried:
et.get_tags([".\\image.dng"], [TAG_NAME], ['-b', '-w', 'py.dat'])
This raises an exception:
File "site-packages\exiftool\helper.py", line 347, in get_tags
ret = self.execute_json(*exec_params)
File "site-packages\exiftool\exiftool.py", line 1030, in execute_json
result = self.execute("-j", *params) # stdout
File "site-packages\exiftool\helper.py", line 119, in execute
raise ExifToolExecuteError(self._last_status, self._last_stdout, self._last_stderr, params)
exiftool.exceptions.ExifToolExecuteError: execute returned a non-zero exit status: 1
But it does create the file imagepy.dat. However, the content of that file is still not just the binary data; it's again the dictionary with the additional SourceFile etc. tags. I've also tried specifying the parameters as ['-b -w py.dat'], but then the file is not created in the first place.
Regarding batch processing: yes, -w actually has some special features for it. Search PH's ExifTool documentation for -textOut and read how exiftool uses the -w flag.
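The file names seen in this thread (cmd.dat producing imagecmd.dat, py.dat producing imagepy.dat) match exiftool's documented -w naming rule for the plain extension form: the source extension, including its '.', is replaced by EXT, and a '.' is prepended only if EXT contains none. A minimal sketch of that rule, assuming no %-format codes are used; `w_output_path` is a hypothetical helper, not part of pyexiftool:

```python
import os


def w_output_path(source: str, ext: str) -> str:
    """Predict the file that `exiftool -w EXT` writes for `source` (hypothetical helper).

    Covers only the plain EXT form, not %-format strings: the source extension
    (including its '.') is replaced by EXT, and a '.' is added to the start of
    EXT if EXT does not already contain one. The output sits next to the source.
    """
    if "%" in ext:
        raise ValueError("%-format codes are not handled by this sketch")
    if "." not in ext:
        ext = "." + ext
    stem, _old_ext = os.path.splitext(source)
    return stem + ext


# The command-line example from this thread:
print(w_output_path("image.dng", "cmd.dat"))  # imagecmd.dat
print(w_output_path("image.dng", "dat"))      # image.dat
```

This also explains why the output names look "glued together": with "cmd.dat" there is already a '.', so nothing separates the stem from the new suffix.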
Regarding the get_tags attempt: this was a robustness change in v0.5.x. It raises an error because ExifToolHelper.get_tags always expects JSON output, so you can't really use get_tags with -w.
Although... I would have expected a different error to be thrown. See the "Note" box about -w behavior in the ExifTool.execute_json documentation.
As per that note, the proper way to use -w is the execute() method (which is also available in ExifToolHelper). It's a little more manual, but it runs just like it does on the command line:
exiftool.exe -TAG_NAME -b -w cmd.dat image.dng
becomes
et.execute(*[f"-{TAG_NAME}", "-b", "-w", "cmd.dat", "image.dng"])
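In the execute() call above, each Python string becomes one exiftool argument. That is presumably also why ['-b -w py.dat'] did not create the file: the whole string reached exiftool as a single argument. A minimal sketch of translating a working command line into that argument list, using `shlex.split`; `cli_to_execute_args` is a hypothetical helper, and since POSIX-style splitting treats backslashes as escapes, Windows paths should use forward slashes or be quoted:

```python
import os
import shlex
from typing import List


def cli_to_execute_args(command_line: str) -> List[str]:
    """Split an exiftool command line into one list element per argument (hypothetical helper).

    The leading executable name, if present, is dropped because execute()
    only takes the arguments. "-b -w py.dat" becomes three elements, not one.
    """
    args = shlex.split(command_line)  # POSIX-style splitting: backslashes are escapes
    if args and os.path.basename(args[0]).lower() in ("exiftool", "exiftool.exe"):
        args = args[1:]
    return args


args = cli_to_execute_args("exiftool.exe -TAG_NAME -b -w cmd.dat image.dng")
print(args)  # ['-TAG_NAME', '-b', '-w', 'cmd.dat', 'image.dng']
# With a running helper: et.execute(*args)
```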
Awesome, the execute method does exactly what I want. For reference, and for other readers: I'm processing multiple images simply via
et.execute(*[f"-{TAG_NAME}", "-b", "-w!", "cmd.dat", "image1.dng", "image2.dng"])
I've added the ! to -w to overwrite existing output files.
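The batched -w! call above can be wrapped so that it also reports where each file's data ended up. A sketch under stated assumptions: `extract_tag_to_files` is a hypothetical helper, `et` is anything with an `execute(*params)` method (such as an ExifToolHelper), and output paths are predicted only for an EXT starting with '.', where exiftool simply replaces the source extension:

```python
import os
from typing import Dict, Iterable


def extract_tag_to_files(et, tag_name: str, files: Iterable[str], ext: str = ".dat") -> Dict[str, str]:
    """Dump one binary tag per source file with -w! and return {source: output path}.

    Hypothetical helper; it issues the same argument list as the batched
    command in this thread ("-w!" overwrites existing output files).
    """
    files = list(files)
    if not ext.startswith("."):
        raise ValueError("this sketch only predicts paths for EXT values starting with '.'")
    et.execute(f"-{tag_name}", "-b", "-w!", ext, *files)
    # exiftool replaces the source extension with EXT, next to each source file.
    return {src: os.path.splitext(src)[0] + ext for src in files}


# Usage with a real helper (not run here):
# with ExifToolHelper() as et:
#     outputs = extract_tag_to_files(et, TAG_NAME, ["image1.dng", "image2.dng"])
# for src, path in outputs.items():
#     with open(path, "rb") as fh:
#         data = fh.read()
```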
Thank you for your quick help and your work for providing this package and keeping it so well maintained!
You're welcome!
I'll think about adding a method like execute_bytes to do the piped output... it's certainly an interesting problem that I didn't consider when making the encoding string change to fix internationalization issues...
Yeah, that would definitely be helpful to avoid going through the disk to get to the data.
Let me know if I should test anything in the future.
I'll think about it a bit more. I will have a chance to think about the design later next week...
Probably will have something for you to test with if I end up implementing it (leaning towards it)
I really didn't consider that binary use case before. Binary maker notes data has always looked like junk to me lol
So I was just doing some testing, and I found that I can in fact use get_tags to get a binary tag... I just get a string like 'MakerNotes:PreviewImage': 'base64: ......'
```python
from exiftool import ExifToolHelper

with ExifToolHelper() as et:
    print(et.get_tags("image.jpg", "MakerNotes:PreviewImage", params="-b"))
```
Is that what you're getting? You'd then be able to decode that directly.
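Decoding such a value is a matter of stripping the "base64:" prefix and running the remainder through Python's `base64` module. A minimal sketch, assuming the value has the 'base64:...' form shown above; `decode_exiftool_binary` is a hypothetical helper, and values without the prefix are passed through unchanged:

```python
import base64


def decode_exiftool_binary(value):
    """Return raw bytes for a "base64:..." value from exiftool's JSON output.

    Hypothetical helper: anything without the prefix is returned as-is.
    """
    prefix = "base64:"
    if isinstance(value, str) and value.startswith(prefix):
        return base64.b64decode(value[len(prefix):])
    return value


print(decode_exiftool_binary("base64:AP8="))  # b'\x00\xff'

# With a running helper (not run here):
# tags = et.get_tags("image.jpg", "MakerNotes:PreviewImage", params="-b")
# preview = decode_exiftool_binary(tags[0]["MakerNotes:PreviewImage"])
```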
Yes, this at least gives direct access to the value of that tag, but since it's a string I don't know how to interpret it. I think the problem is that the execute function automatically decodes the bytes to a string (say, with UTF-8) instead of just returning the raw bytes. Simply encoding that string back to bytes does not work, because the original bytes of the tag were never meant to be interpreted as UTF-8 in the first place.
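The re-encoding problem described above is easy to reproduce in isolation. This sketch assumes the decode uses `errors="replace"` (a strict decode would instead raise `UnicodeDecodeError`); it does not claim that this is exactly what pyexiftool does internally. Invalid bytes collapse into U+FFFD, so the original values cannot be recovered:

```python
def utf8_round_trip(raw: bytes) -> bytes:
    """Decode bytes as UTF-8 (replacing invalid sequences), then encode back."""
    return raw.decode("utf-8", errors="replace").encode("utf-8")


raw = bytes([0x00, 0x9F, 0xFF, 0x41])  # arbitrary binary payload, not valid UTF-8
text = raw.decode("utf-8", errors="replace")

print(text == "\x00\ufffd\ufffdA")  # True: 0x9F and 0xFF each became U+FFFD
print(utf8_round_trip(raw) == raw)  # False: the original bytes are lost
```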
So, the string that gets returned is base64-encoded: that comes from how exiftool represents binary values in its JSON output (the "base64:" prefix marks them).
I looked into the code... just adding an execute_bytes method isn't enough to fix this. The fundamental changes made in commit 137c0e2b957dc499b3df41d7eee1dc5355957978, which moved all the calls from a bytes implementation to a string implementation, actually make it difficult to revert or to support both at once.
Between the Popen() encoding parameter and the encoding/decoding inside execute(), I'm not sure how I would support both bytes and strings in the same class.
Ah, after some investigation, I might be able to change this after all... the Popen encoding is actually not used for the I/O to the process.
The only text communication with the process is the stdin write; the reads are raw, unbuffered reads. I might need to create a branch and test these changes before making them live.
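Because the reads are raw, supporting both modes comes down to deciding at the end whether to decode. A sketch of that idea, not pyexiftool's actual code: the bytes are accumulated until a marker appears and only optionally decoded. The marker `{ready}` follows exiftool's -stay_open convention, while `read_until_marker`, `read_output`, the `raw_bytes` flag name and the chunk size are illustrative assumptions:

```python
import io


def read_until_marker(stream, marker: bytes = b"{ready}", chunk_size: int = 4096) -> bytes:
    """Collect raw bytes from `stream` until `marker` appears; return what came before it.

    The whole buffer is searched each time, so a marker split across two
    chunks is still found. Nothing is decoded, so binary output stays intact.
    """
    buffer = b""
    while True:
        idx = buffer.find(marker)
        if idx != -1:
            return buffer[:idx]
        chunk = stream.read(chunk_size)
        if not chunk:
            raise EOFError("stream ended before the marker was seen")
        buffer += chunk


def read_output(stream, raw_bytes: bool = False, encoding: str = "utf-8"):
    """Return the raw bytes, or decode them only when text was requested."""
    data = read_until_marker(stream)
    return data if raw_bytes else data.decode(encoding)


print(read_output(io.BytesIO(b"\x00\xff\x10{ready}\n"), raw_bytes=True))  # b'\x00\xff\x10'
```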
Would you be able to share a file with me and possibly a code snippet so I can test this against some useful binary data?
I need to double check whether I can share an example image - I'll be back!
Ok, well, it's not necessary anymore. I wrote a test using a custom tag, and it looks like I can read/write a binary tag without an issue. https://github.com/sylikc/pyexiftool/pull/48/commits/60d793f78277816fd73051e388dc9f456ca5ad45
I will do a bit more testing before merging... Need to write a few more tests, but this should address your issue.
If you get a chance @kaufManu check out the PR and test to see if it works for your use case. I've been a bit busy recently, but I'll merge this in after more rigorous testing.
@sylikc I finally got around to testing the PR, apologies for my late reply! I've tested it like this:
data = et.execute("-b", f"-{TAG_NAME}", example_dng, raw_bytes=True)
and it seems to work like a charm. Thank you for the change, it makes my life quite a bit easier :) ! Can this execute function also handle batched input (i.e. obtaining the same tag for multiple DNGs at the same time)? Not a big deal if not, I'm just wondering.
Regarding batched input: it appears you can do it on the command line... but the output comes out a mess (it's just concatenated together). I can try adding a function to ExifToolAlpha that may do it... I would query the tag once to get the number of bytes, then split the output afterwards. Let me think about it.
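The "query the length first, then split" idea can be sketched with the placeholder exiftool prints when -b is omitted, e.g. "(Binary data 100 bytes, use -b option to extract)" as quoted in the original question. This assumes that placeholder format, that the concatenated -b output contains the values in file order with nothing in between, and that every file has the tag; `binary_length` and `split_concatenated` are hypothetical helpers:

```python
import re
from typing import Dict, List

# Placeholder printed for binary values without -b, e.g.
# "(Binary data 100 bytes, use -b option to extract)"
_BINARY_PLACEHOLDER = re.compile(r"\(Binary data (\d+) bytes")


def binary_length(placeholder: str) -> int:
    """Extract the byte count from exiftool's binary-data placeholder."""
    match = _BINARY_PLACEHOLDER.search(placeholder)
    if match is None:
        raise ValueError(f"not a binary-data placeholder: {placeholder!r}")
    return int(match.group(1))


def split_concatenated(blob: bytes, files: List[str], lengths: List[int]) -> Dict[str, bytes]:
    """Cut a concatenated -b output back into one value per file, in order."""
    if len(files) != len(lengths):
        raise ValueError("need exactly one length per file")
    if sum(lengths) != len(blob):
        raise ValueError("lengths do not add up to the size of the output")
    result, offset = {}, 0
    for name, n in zip(files, lengths):
        result[name] = blob[offset:offset + n]
        offset += n
    return result


print(binary_length("(Binary data 100 bytes, use -b option to extract)"))  # 100
```

The fragility is the point made in the thread: a single missing tag or unexpected extra output shifts every later value, which is why per-file results are easier to trust.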
@kaufManu I think based on the 4/11 comment, using something similar ...
```python
import base64
from exiftool import ExifToolHelper

def base64_recurse(d):
    # Replace every "base64:..." string value with its decoded bytes, in place,
    # descending into nested dicts (e.g. the groups produced by -g)
    for k, v in d.items():
        if isinstance(v, dict):
            base64_recurse(v)
        elif isinstance(v, str) and v.startswith("base64:"):
            d[k] = base64.b64decode(v[7:])

with ExifToolHelper(common_args=['-n', '-g']) as et:
    t = et.get_tags("*.jpg", "ThumbnailImage", params="-b")
for x in t:
    base64_recurse(x)
print(t)
```
might be a better way. The concatenated mess would be really hard to parse, especially if you try to figure out which tag came from which file... or mix in other tags. I tried running exiftool -config files\my_makernotes.config -j -MyMakerNotes -ImageSize -b *.jpg in a test case, and it got really messy really fast, since it's not easy to tell which tag came from which file, and so on.
Note: I used the "-g" option in common_args just to get nested dicts, to show that base64_recurse works across them. It's optional. I've verified this works to get the binary data, though it's slightly less efficient than the raw binary execute(), because exiftool has to encode the data in base64 and pass more data through the pipe... but for your use case, where the data is small, it might be worthwhile to do it this way.
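base64_recurse above only descends into dicts, so a "base64:..." string inside a list-valued tag (exiftool's JSON output uses arrays for list tags) would be left encoded. A non-mutating variant that also walks lists, offered as a sketch; `decode_base64_values` is a hypothetical name:

```python
import base64


def decode_base64_values(obj):
    """Return a copy of `obj` with every "base64:..." string decoded to bytes.

    Walks dicts and lists recursively and leaves the input untouched.
    """
    if isinstance(obj, dict):
        return {k: decode_base64_values(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [decode_base64_values(v) for v in obj]
    if isinstance(obj, str) and obj.startswith("base64:"):
        return base64.b64decode(obj[len("base64:"):])
    return obj


# get_tags returns a list of dicts, one per file, so it can be passed directly:
# decoded = decode_base64_values(et.get_tags("*.jpg", "ThumbnailImage", params="-b"))
```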
I see - thanks for the additional information! The batched version is not that important, so the current solution works just fine for me! Feel free to close this issue whenever you like.
Thanks again for your work!
Fixed with v0.5.4
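For the batched case, one workable pattern with the fixed version is a loop with one execute() call per file, reusing the raw_bytes=True call tested earlier in the thread. This keeps every value separate instead of concatenated. It is a sketch that assumes pyexiftool >= 0.5.4; `read_binary_tag` is a hypothetical helper. Because pyexiftool keeps a single exiftool process running, the per-call overhead should be modest, though that is not measured here:

```python
from typing import Dict, Iterable


def read_binary_tag(et, tag_name: str, files: Iterable[str]) -> Dict[str, bytes]:
    """Read one binary tag from several files, one execute() call per file.

    Hypothetical helper; `et` is an ExifTool/ExifToolHelper instance whose
    execute() accepts raw_bytes=True (pyexiftool >= 0.5.4).
    """
    results = {}
    for path in files:
        # Same argument order as the tested call: "-b", "-TAG", file
        results[path] = et.execute("-b", f"-{tag_name}", path, raw_bytes=True)
    return results


# Usage (not run here):
# from exiftool import ExifToolHelper
# with ExifToolHelper() as et:
#     values = read_binary_tag(et, TAG_NAME, ["image1.dng", "image2.dng"])
```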
I need to read a tag that stores 100 bytes of binary data in a custom format that I have to parse myself. Without the -b option I'm getting the string (Binary data 100 bytes, use -b option to extract). I then use [...]. This returns a dictionary with the name of the tag and the value of that tag as a string. However, I'd like to get the raw binary data so that I can parse it according to some external specifications. Is that possible? I think that on this line the binary data would be available, but it is automatically decoded to a string again. Additionally, this line does not "just" return the value of the tag I'm interested in, but more information like SourceFile or the name of the tag again. Long story short: how can I get the raw binary data stored in a tag?
Edit: I forgot to mention that if I do this on the command line [...], the file data.dat contains the binary data that I would expect.