makingglitches / GooglePhotoDownload

Connects to Google Photos and downloads all content, keeping track of original data on disk and moving files only on the computer and already on server to respective directories, and tries to download the entire collection and store size info for quicker startup as well as original file sizes of those on disk, downloading the files still on disk first so they can be freed up. Supports multiple user accounts. Its just a better mousetrap. Google Takeout prepares whole archives of photos, this allows you to download them separately and keep track of some statistics as well on space usage.
6 stars 4 forks source link

Google stores ghost rows in mediaitems #48

Closed makingglitches closed 2 years ago

makingglitches commented 2 years ago

There are several duplicates in one of the photo stores:

same id, same filename, different media attributes amounting to only a difference in creation date.

this ugly query eliminates them

select distinct id as Id, (select FileName from Queuestore q2 where q2.id = q.id limit 1), (select Mediadata from queuestore q3 where q3.id = q.id limit 1) from queuestore q left join storeitem s on s.id = q.id where s.id is null

literally there is no other way. the media dates could possibly be compared to get the lesser one if sqlite allows json parsing which it likely does not.

makingglitches commented 2 years ago

select , json_extract(q2.mediadata,'$.creationTime') as date1 from queuestore q2 inner join (select id from queuestore group by id having count() > 1 order by filename) as t on t.id = q2.id order by filename, date1,id

allows the extraction of date which is in european yyyy-mm-dd format, so can be sorted low to high

makingglitches commented 2 years ago

In the sample data provided there is one or two items that have differing filenames instead.

makingglitches commented 2 years ago

and now for a really ugly query to get the combination with the lowest creation date and same id.

SELECT distinct q3., t.date1 FROM QUEUESTORE Q3 INNER JOIN (select distinct q.id, ( select json_extract(q2.mediadata,'$.creationTime') AS DATE1 FROM QUEUESTORE Q2 WHERE Q2.ID = Q.ID ORDER BY DATE1 ASC LIMIT 1) AS DATE1 FROM QUEUESTORE Q GROUP BY Q.ID HAVING COUNT() > 1) AS T ON T.ID = Q3.ID AND JSON_EXTRACT(Q3.MEDIADATA,'$.creationTime') = T.date1 order by q3.id

makingglitches commented 2 years ago

Final query to get just the rows needed with no duplicates as follows:

select q3. from queuestore q3 group by q3.id having count() = 1

union

SELECT distinct q3. FROM QUEUESTORE Q3 inner JOIN (select distinct q.id, ( select json_extract(q2.mediadata,'$.creationTime') AS DATE1 FROM QUEUESTORE Q2 WHERE Q2.ID = Q.ID ORDER BY DATE1 ASC LIMIT 1 ) AS DATE1 FROM QUEUESTORE Q GROUP BY Q.ID HAVING COUNT() > 1) AS T

ON T.ID = Q3.ID AND JSON_EXTRACT(Q3.MEDIADATA,'$.creationTime') = T.date1 order by q3.filename

makingglitches commented 2 years ago

God I hate the lack of stored procedures in sqlite!

makingglitches commented 2 years ago

and finally...

insert into StoreItem(Id,FileNameOnServer,MediaData,UserId,WaitTillNext) select q3.Id, q3.FileName, q3.MediaData, ? ,1 from queuestore q3 where not exists (select null from storeitem s where s.id = q3.id) group by q3.id having count(*) = 1

union

SELECT distinct q3.Id, q3.FileName, q3.MediaData, ?,1 FROM QUEUESTORE Q3 inner JOIN (select distinct q.id, ( select json_extract(q2.mediadata,'$.creationTime') AS DATE1 FROM QUEUESTORE Q2 WHERE Q2.ID = Q.ID ORDER BY DATE1 ASC LIMIT 1 ) AS DATE1 FROM QUEUESTORE Q GROUP BY Q.ID HAVING COUNT(*) > 1) AS T

ON T.ID = Q3.ID AND JSON_EXTRACT(Q3.MEDIADATA,'$.creationTime') = T.date1

where not exists (select null from storeitem s where s.id = q3.id) order by q3.filename

makingglitches commented 2 years ago

Implemented and tested.