Open llitkr opened 7 months ago
This might be related to the fs encoding stuff which I had a lot of trouble with early on. I'm going to try to replicate this issue when I have time, and try to create a test case. See if I can find code to work around this.
OK, I can replicate this issue. Question: why are you using pathlib and os.path at the same time?!
After renaming a file to ㅍ휸ㅇ류.JPG, I can replicate the crash. I would probably try asking on the exiftool.org forums to see if there are any specific encoding problems.
I tried utf-8, euc_kr, and cp949, and none of them worked.
from exiftool import ExifToolHelper, exceptions
from pathlib import Path
import sys
import os, logging

logging.basicConfig(level=logging.DEBUG)

def getFilesFromDirectory(p):  # Return the list of all files directly inside directory p
    return [x for x in p.iterdir() if x.is_file() and x.name != Path(sys.argv[0]).name]

mainDirectory = "./"
files = getFilesFromDirectory(Path(mainDirectory))  # Collect every file from the test folder "./2023-02-05" into files
e = ExifToolHelper(encoding='euc_kr', logger=logging.getLogger(__name__))  # Initialize ExifToolHelper as e
print(e.encoding)
for f in files:  # Repeat the following for each entry in files
    m = e.get_metadata(f)  # Read the EXIF metadata of f into m
It's a bug, but it may not be in pyexiftool. I'm not sure which Korean encoding should be used for that.
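One way to see how the three codecs tried above treat each character of the failing file name is to print the bytes each one produces. This is only a diagnostic sketch built on Python's standard codecs; encode_report and print_report are hypothetical helper names, not part of pyexiftool, and the output says nothing about which encoding ExifTool itself emits.

```python
def encode_report(text, codecs=("utf-8", "euc_kr", "cp949")):
    """Map each codec name to the hex of text encoded with it, or None if it cannot encode text."""
    report = {}
    for codec in codecs:
        try:
            report[codec] = text.encode(codec).hex()
        except UnicodeEncodeError:
            # This codec has no byte sequence for at least one character.
            report[codec] = None
    return report


def print_report(name="ㅍ휸ㅇ류.JPG"):
    """Print the per-codec bytes for every character of name."""
    for ch in name:
        print(repr(ch), encode_report(ch))
```

Calling print_report() on the failing name and on a working one, such as ㅍ휸류ㅇ류.JPG, may make it easier to spot which byte sequences differ between the two.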
Hello. Thank you for the reply.
Honestly, I don't know why I used both pathlib and os.path.
Python is not my specialty. I got help from ChatGPT while creating this side project, and it wrote this code when I asked how to read the list of files in a folder.
If there is a better way, I would appreciate it if you could let me know.
Anyway, this bug is very strange. An error always occurs when the file name is "ㅍ휸ㅇ류.JPG", but when the file name is "ㅍ휸류ㅇ류.JPG" (just one letter added), the bug does not occur. It does occur when the file name is "ㅍ휸휸휸ㅇ류.JPG". There certainly seems to be a problem with the letter "휸". However, having "휸" in the file name does not necessarily cause a problem; whether it fails seems to depend on the combination with the other letters.
This problem is really interesting. If I had been deeply involved in the development of this program, I would have reproduced the problem step by step through debugging and looked for the cause.
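On the earlier question about a better way to list the files in a folder: pathlib alone is enough, so the os import is not needed for that step. The following is a minimal sketch; list_files and its exclude_name parameter are hypothetical names chosen for illustration.

```python
from pathlib import Path


def list_files(directory, exclude_name=None):
    """Return the regular files directly inside directory, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name != exclude_name  # skip subfolders and the excluded name
    )
```

To skip the running script itself, as the original code does, it could be called as list_files(".", exclude_name=Path(__file__).name).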
This is more of an encoding bug. I don't really know what codepage or encoding is being used.
Does it work on the command line? Perhaps try ExifTool's forums.
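A command-line check can also be scripted without pyexiftool, which helps separate a wrapper problem from an ExifTool or code page problem. The sketch below assumes exiftool is on PATH and uses ExifTool's -j (JSON output) and -charset filename=... options. Whether utf8 is the right file name charset for this Windows setup is an untested assumption, and build_exiftool_command and read_metadata are hypothetical names.

```python
import json
import subprocess


def build_exiftool_command(path, filename_charset="utf8"):
    """Build an exiftool command line that states how the file name is encoded."""
    return ["exiftool", "-j", "-charset", f"filename={filename_charset}", str(path)]


def read_metadata(path, filename_charset="utf8"):
    """Run exiftool (assumed to be on PATH) and parse its JSON output."""
    result = subprocess.run(
        build_exiftool_command(path, filename_charset),
        capture_output=True,  # keep stdout and stderr as raw bytes
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    # Decode explicitly instead of relying on the locale code page;
    # errors="replace" keeps one bad byte from aborting the whole read.
    return json.loads(result.stdout.decode("utf-8", errors="replace"))
```

If read_metadata succeeds on ㅍ휸ㅇ류.JPG while ExifToolHelper fails on the same file, that would point toward how the wrapper encodes or decodes its data rather than toward ExifTool itself.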
I'm using your code really well, but I've noticed one odd thing. If the name of the file contains certain Korean characters, I get a decoding error and the metadata of the file becomes unreadable. If I rename the file to something else, I can read the metadata normally.
This code works well for the most part, but the call m = e.get_metadata(f) often results in an error. This happens when the file name contains certain Korean characters, for example the file name "ㅍ휸ㅇ류.JPG".
The error message is as follows:
'cp949' codec can't decode byte 0xb7 in position 37: illegal multibyte sequence
I looked up this error, and the advice was to specify UTF-8 as the encoding option when opening the file in Python. However, Python opens the file fine and correctly displays the file's other properties. Is there any option in ExifTool to control the encoding?
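The error message above reports a byte offset (position 37) and a byte value (0xb7). A small helper can locate the same kind of offset in any captured byte string, which may help show whether the failing bytes belong to the file name or to some other part of the output. This is a sketch using only Python's built-in codecs; find_undecodable and decode_with_fallback are hypothetical names, and the sample bytes in the usage note are made up rather than taken from the real ExifTool output.

```python
def find_undecodable(raw, encoding):
    """Return (offset, byte value) of the first byte that fails to decode, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        return err.start, raw[err.start]
    return None


def decode_with_fallback(raw, encodings=("utf-8", "cp949")):
    """Try each encoding in order; as a last resort, replace undecodable bytes."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: keep what can be read and mark the rest with U+FFFD.
    return raw.decode(encodings[0], errors="replace")
```

For example, find_undecodable(b"abc\xb7 def", "cp949") points at offset 3, because 0xb7 starts a two-byte sequence that the following space cannot complete.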