Open ohade opened 10 months ago
Hi @ohade, this sounds like a good feature request to add.
Hi, regarding the Exif format, see https://en.wikipedia.org/wiki/Exif. As for which data can be extracted and how: take any picture taken on a mobile phone and run the following Python code:
from PIL import Image, ExifTags
from PIL.ExifTags import TAGS, GPSTAGS


def get_exif_data(image_path):
    # Print every top-level EXIF tag that Pillow knows by name
    img = Image.open(image_path)
    image_exif = img.getexif()
    for key, val in image_exif.items():
        if key in ExifTags.TAGS:
            print(f"ID: {key}, TAG: {ExifTags.TAGS[key]}, VAL: {val}")


def get_geotagging(exif):
    # Map the numeric keys of the GPSInfo block to their GPSTAGS names
    if not exif:
        raise ValueError("No EXIF metadata found")
    geotagging = {}
    for (idx, tag) in TAGS.items():
        if tag == 'GPSInfo':
            if idx not in exif:
                raise ValueError("No EXIF geotagging found")
            for (key, val) in GPSTAGS.items():
                if key in exif[idx]:
                    geotagging[val] = exif[idx][key]
    return geotagging


def get_location(image_path):
    image = Image.open(image_path)
    # _getexif() is a private Pillow method (JPEG files); it returns GPSInfo
    # as a nested dict, whereas getexif() above only shows an offset for it
    exif = image._getexif()
    geotagging = get_geotagging(exif)
    for key, val in geotagging.items():
        print(key, val)


image_path = ...
get_exif_data(image_path)
get_location(image_path)
For example, here is the data I extracted from a picture on my Samsung Android phone:
**get_exif_data(image_path)**
ID: 256, TAG: ImageWidth, VAL: 4000
ID: 257, TAG: ImageLength, VAL: 3000
ID: 34853, TAG: GPSInfo, VAL: 696
ID: 296, TAG: ResolutionUnit, VAL: 2
ID: 34665, TAG: ExifOffset, VAL: 238
ID: 271, TAG: Make, VAL: samsung
ID: 272, TAG: Model, VAL: SM-G998B
ID: 305, TAG: Software, VAL: G998BXXU5CVDD
ID: 274, TAG: Orientation, VAL: 6
ID: 306, TAG: DateTime, VAL: 2022:06:18 10:56:28
ID: 531, TAG: YCbCrPositioning, VAL: 1
ID: 282, TAG: XResolution, VAL: 72.0
ID: 283, TAG: YResolution, VAL: 72.0
**get_location(image_path)**
GPSLatitudeRef N
GPSLatitude (32.0, 5.0, 32.412119)
GPSLongitudeRef E
GPSLongitude (34.0, 49.0, 3.7128)
GPSAltitudeRef 0
GPSAltitude 64.0
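The GPS values above are (degrees, minutes, seconds) tuples plus a hemisphere reference, while timezone lookups usually expect signed decimal degrees. A minimal sketch of that conversion, assuming each tuple element can be passed to `float()`; `dms_to_decimal` is a hypothetical helper name and not part of Pillow:

```python
def dms_to_decimal(dms, ref):
    """Convert a (degrees, minutes, seconds) tuple to signed decimal degrees.

    `ref` is the matching GPSLatitudeRef / GPSLongitudeRef value
    ("N", "S", "E" or "W"); south and west become negative.
    """
    degrees, minutes, seconds = (float(part) for part in dms)
    decimal = degrees + minutes / 60.0 + seconds / 3600.0
    return -decimal if ref in ("S", "W") else decimal


# Values taken from the sample output above
latitude = dms_to_decimal((32.0, 5.0, 32.412119), "N")
longitude = dms_to_decimal((34.0, 49.0, 3.7128), "E")
print(latitude, longitude)
```

The resulting `latitude` and `longitude` are in the form that a decimal-degree lookup can consume.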
So a few use cases:
Also, I use the geodata together with the EXIF datetime to convert the photo's local time to a UTC timestamp, like so:
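from timezonefinder import TimezoneFinder
import pendulum


def fix_timestamp_using_geoDataExif(latitude, longitude, timestamp):
    # (0, 0) is treated as "no GPS fix": leave the timestamp unchanged
    if latitude == 0 and longitude == 0:
        return timestamp
    tf = TimezoneFinder()
    # latitude and longitude are expected in signed decimal degrees
    time_zone_str = tf.timezone_at(lat=latitude, lng=longitude)
    if not time_zone_str:
        return timestamp
    # Assumption: `timestamp` holds the EXIF wall-clock time encoded as if it
    # were UTC. Recover the wall-clock fields, then reinterpret them in the
    # photo's timezone. (from_timestamp(timestamp, tz=time_zone_str) would keep
    # the same instant, so the function would return its input unchanged.)
    wall = pendulum.from_timestamp(timestamp, tz='UTC')
    local_time = pendulum.datetime(wall.year, wall.month, wall.day,
                                   wall.hour, wall.minute, wall.second,
                                   tz=time_zone_str)
    utc_time = local_time.in_timezone('UTC')
    return int(utc_time.timestamp())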
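The function above takes a numeric timestamp, while EXIF stores DateTime as a string such as `2022:06:18 10:56:28` with no timezone. The post does not say how that number is produced, so the following is only a sketch of one possibility: read the wall-clock string as if it were UTC. `exif_datetime_to_timestamp` is a hypothetical helper name.

```python
import calendar
import time

# EXIF DateTime layout: colons in the date part, a space, then the time
EXIF_DATETIME_FORMAT = "%Y:%m:%d %H:%M:%S"


def exif_datetime_to_timestamp(value):
    """Parse an EXIF DateTime string into epoch seconds.

    Assumption: the wall-clock time is interpreted as if it were UTC,
    because EXIF DateTime carries no timezone of its own.
    """
    parsed = time.strptime(value, EXIF_DATETIME_FORMAT)
    return calendar.timegm(parsed)


naive_timestamp = exif_datetime_to_timestamp("2022:06:18 10:56:28")
print(naive_timestamp)
```

A value produced this way is not yet true UTC; it still needs the timezone correction derived from the photo's coordinates.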
Feature Name
[Feature Request]: Enhance fastdup with EXIF Data Integration, Including Geodata and DateTimeOriginal
Feature Description
What does the feature do?
Integrates EXIF data (geodata and DateTimeOriginal) into fastdup, allowing for more nuanced sorting, filtering, and deduplication by recognizing the original images.
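To illustrate what "recognizing the original images" could mean, here is a minimal sketch that picks the earliest DateTimeOriginal within one group of duplicates. It is not fastdup code: the record layout (`path`, `datetime_original`) and the `pick_original` helper are assumptions made for this example only.

```python
from datetime import datetime

EXIF_DATETIME_FORMAT = "%Y:%m:%d %H:%M:%S"


def pick_original(duplicates):
    """Return the path of the image with the earliest DateTimeOriginal.

    `duplicates` is a list of dicts with a "path" key and an optional
    "datetime_original" EXIF string (hypothetical layout). Images without
    that field are ignored; None is returned if no image has one.
    """
    dated = [
        (datetime.strptime(item["datetime_original"], EXIF_DATETIME_FORMAT), item["path"])
        for item in duplicates
        if item.get("datetime_original")
    ]
    if not dated:
        return None
    # min() compares the parsed datetime first, then the path on ties
    return min(dated)[1]


# Hypothetical duplicate group
group = [
    {"path": "copy_from_chat.jpg", "datetime_original": None},
    {"path": "edited.jpg", "datetime_original": "2022:06:19 08:00:00"},
    {"path": "camera.jpg", "datetime_original": "2022:06:18 10:56:28"},
]
print(pick_original(group))
```

Copies that lost their metadata (for example after being re-sent through a messaging app) fall out naturally, since they carry no DateTimeOriginal to compare.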
Why do you think it's important?
EXIF data provides essential context and can help identify the original image among duplicates, so that its metadata is preserved. This is vital for industries that require location- and time-specific insights.
How will it benefit users?
Users will gain richer insights, more accurate deduplication, and the preservation of important metadata. This will increase dataset quality, streamline data operations, and potentially reduce costs.
Contact Information [Optional]
No response