Open ajquick opened 1 year ago
Looking at the code, it is possible for this to happen in the while loop if, for example, the md5_file function fails with an error.
I believe the md5 hash from the previous file would then be used instead of the script exiting with an error.
Should a new md5 be generated if one already exists in the database?
Ex:
$file_for_md5sum = $row['oldpath'];
$md5sum = md5_file($file_for_md5sum);
Replace with:
...
$query = '
SELECT
path AS oldpath,
date_available,
representative_ext,
id,
md5sum
FROM '.IMAGES_TABLE.'
WHERE path NOT LIKE \'./upload/%\'
;';
...
if(!empty($row['md5sum'])){
$md5sum = $row['md5sum'];
}else{
$md5sum = md5_file($row['oldpath']);
}
if(!empty($md5sum) && !is_null($md5sum) && $md5sum !== false){
...
//continue
I implemented the above changes and the code ran on 50 GB of files nearly instantly. Previously it computed the md5 hash for every single file, which would have taken a minute or two.
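The guard described above could also be pulled into a small helper so that the hash is assigned fresh for every row and a failed md5_file call can never leave an earlier file's hash in place. This is only a minimal sketch: the function name resolve_md5sum is hypothetical and not part of the plugin, and it assumes each row carries the oldpath and md5sum keys from the query above.

```php
<?php
// Hypothetical helper (not part of the plugin): returns the hash to use for
// one row, or null when no trustworthy hash is available.
function resolve_md5sum(array $row): ?string
{
    // Reuse the hash already stored in the database, if there is one.
    if (!empty($row['md5sum'])) {
        return $row['md5sum'];
    }

    // Without a regular file at the old path there is nothing to hash.
    $path = $row['oldpath'] ?? '';
    if ($path === '' || !is_file($path)) {
        return null;
    }

    // md5_file() returns false on failure; map that to null for the caller.
    $md5sum = @md5_file($path);
    return $md5sum === false ? null : $md5sum;
}
```

Inside the while loop, the caller would assign `$md5sum = resolve_md5sum($row);` and skip the row with `continue` when the result is null, so a row is never processed with a value left over from the previous iteration.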
I don't have concrete proof of this, but I believe there may be a bug causing some images to be assigned to the wrong entry in the database. This appears to happen when one of the following things is true:
Before running Virtualize:
Photo 001 - File /galleries/album/photo-1.jpg
Photo 002 - File /galleries/album/photo-2.jpg
Photo 003 - File /galleries/album/photo-3.jpg
Photo 004 - File /galleries/album/photo-4.jpg
After running Virtualize:
Photo 001 - File /upload/etc/photo-1.jpg
Photo 002 - File /upload/etc/photo-1.jpg
Photo 003 - File /upload/etc/photo-3.jpg
Photo 004 - File /upload/etc/photo-3.jpg
Basically, for whatever reason, some photos end up with another photo's file. I cannot confirm how this happened, but I don't think the images had md5 hashes stored, and I was running it on a LOT of very large images. (It should also be noted that the affected images were not identical.)
Ideally, a process would be added to handle only a few images at a time, as per the suggestion in #4.
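As a rough illustration of handling only a few images per run, the sketch below processes one slice of the rows and reports where the next run should resume. It is not taken from the plugin or from #4: the function name process_batch, the offset bookkeeping, and the callback are all assumptions made for the example. In the real plugin the same effect might be achieved with a LIMIT clause in the query.

```php
<?php
// Hypothetical helper: process at most $batchSize rows starting at $offset.
// Returns the offset for the next run, or null once every row is handled.
function process_batch(array $rows, int $offset, int $batchSize, callable $handler): ?int
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    // Take only the slice of rows that belongs to this run.
    $batch = array_slice($rows, $offset, $batchSize);
    foreach ($batch as $row) {
        $handler($row);
    }

    $next = $offset + count($batch);
    return $next < count($rows) ? $next : null;
}
```

A caller would store the returned offset (for example in the session) and pass it back on the next request until the function returns null.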