james-see / iptcinfo3

iptcinfo, finally working for Python 3. Install with pip3 install iptcinfo3

IPTC info not detected #5

Closed guy881 closed 4 years ago

guy881 commented 5 years ago

Hey, I have a problem extracting IPTC data: for some pictures, the returned dictionary contains strange data. Have a look.

info = IPTCInfo(image_file)
data = info.getData()

image

I am sure the file contains IPTC data, as you can see here in an online IPTC reader:

image

Also there is a warning in the console: iptcinfo3: WARNING - File not a JPEG, trying blindScan

Here I'm attaching the problematic picture: karski_juliusz_002.jpg

guy881 commented 5 years ago

I guess this fork is no longer maintained, so in the end I used https://github.com/guinslym/pyexifinfo

james-see commented 5 years ago

It works fine. Here is the method to get it to work:

    import iptcinfo3

    info = iptcinfo3.IPTCInfo('yourimage.jpg')
    for k, v in info._data.items():
        print(k, v)
james-see commented 5 years ago

@guy881 I will update the readme with examples for this if you think it would be helpful. Thanks for reporting this issue. I am closing since I confirmed this works fine to view the IPTCInfo dictionary from this image that you had a question about.

guy881 commented 5 years ago

I've checked it, and it still doesn't return the attributes from the screenshot above. Is there any reason to use the private _data instead of getData?

image

Sure, examples would definitely be helpful.

james-see commented 5 years ago

I will run the exact file and paste the results here so you can see exactly how to run it. The class and dict-of-lists structure is annoying in this implementation; it is part of the technical debt we inherited when taking over this project and porting it to Python 3. It is definitely not obvious how to process the data and get usable values out. For my part, I at least tidied up, merged, and pushed the latest tagged release, which fixes the broken .save() method; you should download it from PyPI.

james-see commented 5 years ago

Ok, @guy881, I added your example to my python-examples repo and here is the code for iptcinfo3-example.py:

"""
What: IPTCINFO3 example
Author: James Campbell
Date: 5 July 2019
"""
import iptcinfo3

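try:
    # Read and write strings as cp1250, the encoding this file's metadata uses.
    info = iptcinfo3.IPTCInfo('assets/guy881.jpg', inp_charset="cp1250",
                              out_charset='cp1250', force=True)
    print('-------IPTC DATA FOUND-------')
    # Dump the raw packed IIM bytes, then every key/value pair.
    print(info.packedIIMData())
    for k, v in info._data.items():
        print(f"KEY: {k} VALUE: {v}")
    # info['city'] = '#magistræde #🇩🇰'
    # info.save()
except Exception as e:
    # Re-raise anything other than the "no IPTC data" case.
    if str(e) != "No IPTC data found.":
        raise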
james-see commented 5 years ago

And here is the result. Note that key 183 tells you the encoding is CP1250, so you can set inp_charset accordingly.

-------IPTC DATA FOUND-------
packedIIMData: illegal dataname '183' (183)
packedIIMData: illegal dataname '231' (231)
packedIIMData: illegal dataname '240' (240)
b'\x1c\x02\x00\x00\x02\x00\x04\x1c\x02\x19\x00\rHenryk Karski\x1c\x02\x19\x00\x07W\xb3ost\xf3w\x1c\x02\x19\x00\x0cMaria Gainza\x1c\x02\x19\x00\nbiblioteka\x1c\x02\x19\x00\x13okres mi\xeadzywojenny\x1c\x02\x19\x00\x0eJuliusz Karski\x1c\x02\x19\x00\x0eJuliusz Karski\x1c\x02\x19\x00\x0cmi\xeadzywojnie\x1c\x02\x19\x00\x08lata 30.\x1c\x02\x19\x00\x0bziemia\xf1stwo\x1c\x02\x19\x00\x07dziecko\x1c\x02\x19\x00\x08ch\xb3opiec\x1c\x02\x19\x00\x06dzieci\x1c\x02\x19\x00\x0fzdj\xeacie grupowe\x1c\x02\x19\x00\x07kobieta\x1c\x02A\x00\x14FotoWare FotoStation\x1c\x02P\x00\x0eJuliusz Karski\x1c\x02i\x00\x1eZapomniani \x9cwiadkowie XX wieku\x1c\x02n\x00\x06DSH/OK\x1c\x02s\x00\x1aArchiwum Historii M\xf3wionej\x1c\x02x\x00\xbcFrancuzka Maria Gainza z Juliuszem Karskim i Henrykiem Karskim (mniejszy, brat Juliusza Karskiego) na schodach do biblioteki pa\xb3acu we W\xb3ostowie, 1936-1937.\r\nFot. zbiory Juliusza Karskiego'
KEY: 20 VALUE: []
KEY: 25 VALUE: ['Henryk Karski', 'Włostów', 'Maria Gainza', 'biblioteka', 'okres międzywojenny', 'Juliusz Karski', 'Juliusz Karski', 'międzywojnie', 'lata 30.', 'ziemiaństwo', 'dziecko', 'chłopiec', 'dzieci', 'zdjęcie grupowe', 'kobieta']
KEY: 118 VALUE: []
KEY: 65 VALUE: FotoWare FotoStation
KEY: 80 VALUE: Juliusz Karski
KEY: 105 VALUE: Zapomniani świadkowie XX wieku
KEY: 110 VALUE: DSH/OK
KEY: 115 VALUE: Archiwum Historii Mówionej
KEY: 120 VALUE: Francuzka Maria Gainza z Juliuszem Karskim i Henrykiem Karskim (mniejszy, brat Juliusza Karskiego) na schodach do biblioteki pałacu we Włostowie, 1936-1937.
Fot. zbiory Juliusza Karskiego
KEY: 183 VALUE: CP_1250
KEY: 231 VALUE: E=TXT D=2014-08-21 T=09:48:00 U=Iwona%20MakowskaE=TXT D=2014-10-13 T=10:52:03 U=Iwona%20Makowska
KEY: 240 VALUE: 
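
As a rough illustration of how the packed bytes above are laid out, here is a minimal sketch that walks the records by hand. It assumes the standard IIM layout (a 0x1c marker, a record number, a dataset number, then a two-byte big-endian length) and ignores extended-length datasets. iter_iim_records is a hypothetical helper written for this example and is not part of iptcinfo3.

```python
import struct


def iter_iim_records(packed: bytes):
    """Yield (record, dataset, payload) tuples from packed IIM bytes.

    Hypothetical helper: handles only the simple 5-byte header form.
    """
    pos = 0
    while pos + 5 <= len(packed):
        marker, record, dataset, length = struct.unpack_from(">BBBH", packed, pos)
        if marker != 0x1C:
            break  # not a tag marker, so stop rather than guess
        pos += 5
        yield record, dataset, packed[pos:pos + length]
        pos += length


# The first three records copied from the output above.
sample = (
    b"\x1c\x02\x00\x00\x02\x00\x04"
    b"\x1c\x02\x19\x00\rHenryk Karski"
    b"\x1c\x02\x19\x00\x07W\xb3ost\xf3w"
)

records = list(iter_iim_records(sample))
for record, dataset, payload in records:
    print(record, dataset, payload)
```

Dataset 25 (0x19) is the entry printed as KEY: 25 above, and decoding its payload as cp1250 restores the Polish characters.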
james-see commented 5 years ago

Note, @guy881, that using the right encoding even keeps all of the diacritics intact. The _data object is not "private", as there is no such thing in Python, but it is convention to put an underscore in front of things that are supposed to be internal to the class or method itself. If you are writing an app to interact with it, you can definitely access these internal objects, since that is how it is designed to work.
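
To make the encoding point concrete, here is a small standalone sketch using only the standard library. It takes the hint reported under key 183 ("CP_1250"), normalizes it to a Python codec name, and decodes two of the raw keyword byte strings from the output above. codec_from_hint is a hypothetical helper for this example, and the assumption that stripping the underscore yields a valid codec name holds for this value but may not hold for others.

```python
import codecs


def codec_from_hint(value: str) -> str:
    """Map a hint such as 'CP_1250' to a Python codec name (hypothetical helper)."""
    candidate = value.strip().replace("_", "").lower()
    # codecs.lookup raises LookupError if Python does not know the codec.
    return codecs.lookup(candidate).name


charset = codec_from_hint("CP_1250")
raw_keywords = [b"ch\xb3opiec", b"zdj\xeacie grupowe"]

# Decoding with the right charset keeps the diacritics.
decoded = [raw.decode(charset) for raw in raw_keywords]
# Decoding with a wrong single-byte charset silently produces wrong characters.
wrong = [raw.decode("latin-1") for raw in raw_keywords]

print(charset)
print(decoded)
print(wrong)
```

The same bytes decode without error under both charsets, which is why a wrong inp_charset shows up as garbled text instead of an exception.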

james-see commented 5 years ago

Also, @guy881, you can call iptcinfo3.IPTCData.get(info._data, 25) to get the value for a single key, in case you need that method as well or only want specific standard IPTC keys for each photo.

james-see commented 5 years ago

One more thing, @guy881: for each standard IPTC key, you can look up its string name:

>>> iptcinfo3.IPTCData._key_as_str(25) 
'keywords'
>>> 
james-see commented 4 years ago

OK, it has been 9 days. As I stated above, you can access all the required data via the methods I described, along with the examples. Closing. Thanks.

guy881 commented 4 years ago

Oh, thanks a lot for your extensive explanation, @jamesacampbell! Right, so the problem was the missing inp_charset argument. Previously I tried to decode the result of getData() afterwards, but that didn't work.

I do not fully agree with you about interacting with internal things, by which I mean the interface. If I am using a library without modifying it in any way, I would prefer to use attributes without the leading underscore, unless I am doing something out of the ordinary.

What would you say to wrapping _data in a public data attribute, something like this:

{iptcinfo3.IPTCData._key_as_str(k): v for k, v in info._data.items()}

?
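
A minimal sketch of how such a wrapper might behave, using a stand-in lookup table so that it runs without the library. DATASET_NAMES, key_as_str and InfoWithData are hypothetical names for this example. Only the 25 to 'keywords' mapping comes from this thread, and the 'nonstandard_' prefix for unknown keys is an assumption about how the library names them.

```python
# Stand-in for the library's number-to-name table (assumption: tiny subset).
DATASET_NAMES = {25: "keywords"}


def key_as_str(key: int) -> str:
    """Hypothetical stand-in for iptcinfo3.IPTCData._key_as_str."""
    return DATASET_NAMES.get(key, f"nonstandard_{key}")


class InfoWithData:
    """Hypothetical wrapper that exposes _data under readable key names."""

    def __init__(self, data):
        self._data = data

    @property
    def data(self):
        # Build a fresh dict each time so callers cannot mutate _data through it.
        return {key_as_str(k): v for k, v in self._data.items()}


info = InfoWithData({25: ["dziecko", "kobieta"], 183: "CP_1250"})
readable = info.data
print(readable)
```

Returning a new dict from the property keeps the internal structure untouched, at the cost of rebuilding the mapping on every access.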

james-see commented 4 years ago

@guy881 I don't think I fully understand you regarding implementation/suggestions. Would be happy to review a merge request if you branch and make some mods to the code. Also happy to chat further about this via signal or google hangout etc. if you have ideas.

Regarding single leading underscores in variable names:

_single_leading_underscore This convention is used for declaring private variables, functions, methods and classes in a module. Anything with this convention are ignored in from module import *. However, of course, Python does not supports truly private, so we can not force somethings private ones and also can call it directly from other modules. So sometimes we say it “weak internal use indicator”.

This is from: https://hackernoon.com/understanding-the-underscore-of-python-309d1a029edc
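
The behaviour described in that quote can be checked with a short standalone sketch. It builds a throwaway module in memory (underscore_demo is a made-up name for this example) and shows that a star import skips the underscore-prefixed name, while direct attribute access still works.

```python
import sys
import types

# Create a throwaway module with one public and one underscore-prefixed name.
demo = types.ModuleType("underscore_demo")
exec("public_value = 1\n_internal_value = 2", demo.__dict__)
sys.modules["underscore_demo"] = demo

# Run a star import into an empty namespace and inspect what arrived.
namespace = {}
exec("from underscore_demo import *", namespace)

star_imported_public = "public_value" in namespace
star_imported_internal = "_internal_value" in namespace
direct_access = demo._internal_value  # nothing prevents direct access

print(star_imported_public, star_imported_internal, direct_access)
```

So the underscore only signals intent: the name is left out of star imports, but Python does not block code that reaches for it explicitly.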

guy881 commented 4 years ago

Great, I will come back and speak to you about my suggestions around October, as I am quite busy nowadays. Thank you ;)