trevorhobenshield / amazon_photos

Amazon Photos API
https://pypi.org/project/amazon_photos
MIT License

Missing MD5 attribute? #6

Closed by RedSquirrel87 6 months ago

RedSquirrel87 commented 6 months ago

I tried the latest version, and calling ap.upload("/home/mini/test") crashes with the following output:

2023-12-18 00:19:44.675 [WARNING] :: Database ap.parquet not found, initializing new database
Getting media: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 103/103 [00:02<00:00, 39.86it/s]
Traceback (most recent call last):
  File "/home/mini/amazon.py", line 15, in <module>
    r = ap.upload("/home/mini/test")
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mini/venv/lib/python3.11/site-packages/amazon_photos/api.py", line 371, in upload
    files = self.dedup_files(self.db, path)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mini/venv/lib/python3.11/site-packages/amazon_photos/api.py", line 299, in dedup_files
    md5s = set(db.md5)
               ^^^^^^
  File "/home/mini/venv/lib/python3.11/site-packages/pandas/core/generic.py", line 6202, in __getattr__
    return object.__getattribute__(self, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'DataFrame' object has no attribute 'md5'

Is there some issue in the deduplication function or am I missing anything?

trevorhobenshield commented 6 months ago

Nice catch. I've added another optional param, md5s, in case you need to pass in your own set of file hashes. If you don't pass it, upload falls back to self.db.md5, and if that doesn't exist, it tries to upload everything and skips anything that returns a 409. The md5 column should always be present unless you have no images (only videos, docs, etc.); maybe those don't go through a dedup check.

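from amazon_photos import AmazonPhotos

ap = AmazonPhotos(
    cookies={
        ...
    },
    db_path='~/ap.parquet',
    dtype_backend='pyarrow',
    engine='pyarrow',
)

# your own set of file hashes; when given, it is used instead of self.db.md5
md5s = ...
ap.upload('foo/bar', md5s=md5s)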
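
For anyone reading along, here is a minimal sketch of the fallback order described above, using only the standard library. It is not the library's actual code: file_md5, resolve_known_md5s and files_to_upload are hypothetical helper names, db stands in for any object that may or may not have an md5 attribute (such as a DataFrame with or without that column), and the 409 handling on the server side is not shown.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def resolve_known_md5s(db, md5s=None):
    """Choose the hashes to dedup against (hypothetical helper).

    Order: explicit md5s, then db.md5 if it exists, else an empty set,
    which means nothing is filtered out locally.
    """
    if md5s is not None:
        return set(md5s)
    # getattr with a default avoids the AttributeError from the traceback
    column = getattr(db, "md5", None)
    if column is None:
        return set()
    return set(column)


def files_to_upload(folder, known_md5s):
    """Return files under folder whose MD5 is not in known_md5s."""
    return [
        p
        for p in sorted(Path(folder).rglob("*"))
        if p.is_file() and file_md5(p) not in known_md5s
    ]
```

With an empty set of known hashes, files_to_upload returns every file in the folder, which matches the "try to upload everything" case; duplicates would then have to be caught by the server's response.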