plegall / Piwigo-virtualize


Plugin potentially causing duplicates? #8

Open ajquick opened 1 year ago

ajquick commented 1 year ago

I don't have concrete proof of this, but I believe there may be a bug causing some images to be assigned to the wrong entry in the database. Here is an example of what appears to happen:

Before running Virtualize:

Photo 001 - File /galleries/album/photo-1.jpg
Photo 002 - File /galleries/album/photo-2.jpg
Photo 003 - File /galleries/album/photo-3.jpg
Photo 004 - File /galleries/album/photo-4.jpg

After running Virtualize:

Photo 001 - File /upload/etc/photo-1.jpg
Photo 002 - File /upload/etc/photo-1.jpg
Photo 003 - File /upload/etc/photo-3.jpg
Photo 004 - File /upload/etc/photo-3.jpg

Basically, for whatever reason, some photos end up with another photo's file. I cannot confirm how this happened... but I don't think the images had their md5 hashes in the database, and I was running it on a LOT of very large images. (It should also be noted that these images were not identical files.)

Ideally a process would be added to only do a few images at a time as per the suggestion in #4.

ajquick commented 1 year ago

Looking at the code, it is possible for this to happen in the while loop if, for example, the md5_file function fails.

I believe the md5 hash from the previous file would then be reused rather than the loop exiting with an error.
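A minimal, self-contained sketch of that failure mode (not the plugin's actual code; md5_file() is simulated with md5() on in-memory contents so the example runs anywhere):

```php
<?php
// Hypothetical illustration: when the hashing call fails mid-loop and
// the return value is not checked, the variable still holds the
// previous file's hash, so the failing file is recorded with it.
$files = ['a.txt' => 'hello', 'b.txt' => null]; // null = unreadable file

$stored = [];
$md5sum = null;
foreach ($files as $file => $contents) {
    // md5_file() returns false on failure; simulated here
    $hash = ($contents !== null) ? md5($contents) : false;
    if ($hash !== false) {
        $md5sum = $hash; // only updated on success
    }
    // BUG: when the call failed, $md5sum is still a.txt's hash,
    // and b.txt gets stored with the wrong checksum.
    $stored[$file] = $md5sum;
}
// $stored['b.txt'] now equals $stored['a.txt'] — the duplicate
// assignment described in the report above.
```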

Should a new md5 be generated if one already exists in the database?

Ex:

$file_for_md5sum  = $row['oldpath'];
$md5sum = md5_file($file_for_md5sum);

Replace with:

...
  $query = '
SELECT
    path AS oldpath,
    date_available,
    representative_ext,
    id,
    md5sum
  FROM '.IMAGES_TABLE.'
  WHERE path NOT LIKE \'./upload/%\'
;';
...

if (!empty($row['md5sum'])) {
    $md5sum = $row['md5sum'];
} else {
    $md5sum = md5_file($row['oldpath']);
}

// empty() already treats null and false as empty, so one check suffices:
if (!empty($md5sum)) {
  ...
  // continue processing this image
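Putting the pieces together, the proposed logic might look like the sketch below, run here on simulated rows (the real loop would iterate database rows and update IMAGES_TABLE; the names and row data are illustrative, not the plugin's actual code):

```php
<?php
// Sketch of the proposed fix: reuse a stored md5sum when present,
// compute it otherwise, and skip the row entirely on failure so a
// stale or false hash never carries over to the next iteration.
$rows = [
    ['id' => 1, 'oldpath' => './galleries/a.jpg', 'md5sum' => 'abc123'],
    ['id' => 2, 'oldpath' => './galleries/missing.jpg', 'md5sum' => null],
];

$processed = [];
foreach ($rows as $row) {
    $md5sum = !empty($row['md5sum'])
        ? $row['md5sum']                 // hash already in the database
        : @md5_file($row['oldpath']);    // false if the file is unreadable

    if (empty($md5sum)) {
        continue; // skip this image instead of reusing a previous hash
    }

    // ... here the real plugin would move the file and update the table ...
    $processed[$row['id']] = $md5sum;
}
// Row 1 keeps its stored hash; row 2 (unreadable file) is skipped
// rather than inheriting row 1's checksum.
```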
ajquick commented 1 year ago

I implemented the above changes and the code executed on 50 GB of files nearly instantly. Previously it computed the md5 hash for every single file, which would have taken a minute or two.