Open jgyprime opened 5 months ago
Hi @jgyprime
thank you for using HomeGallery and I am glad that you like it.
Further, thank you for reporting your issue with the date. You did a great job nailing the problem and provided a test picture. Awesome.
Yes. My assumption was: if a date is provided by GPS, it should be quite accurate. However, your picture has 1) no GPS coordinates and 2) the date 1970:01:01 00:00:00Z, which is the Unix epoch (timestamp 0) and a typical placeholder value.
Do you think it would be sufficient to allow the GPS date only if GPS coordinates are available? This would keep the basic assumption but will check it in detail...
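A minimal sketch of that idea, not the actual HomeGallery implementation: the key order is the one from date.js quoted in the report, while the `GPSLatitude` / `GPSLongitude` tag names, their numeric form, and the helper names `hasGpsPosition` and `pickDate` are assumptions for illustration.

```javascript
// Key order as quoted from packages/database/src/media/date.js in the report.
const dateKeys = ['GPSDateTime', 'SubSecDateTimeOriginal', 'DateTimeOriginal', 'CreateDate']

// Assumption: coordinates are exposed as numeric GPSLatitude / GPSLongitude tags.
const hasGpsPosition = exif =>
  Number.isFinite(exif.GPSLatitude) && Number.isFinite(exif.GPSLongitude)

// Hypothetical helper: returns the first available date value, but skips
// GPSDateTime when no GPS coordinates back it up.
function pickDate(exif) {
  for (const key of dateKeys) {
    if (key === 'GPSDateTime' && !hasGpsPosition(exif)) {
      continue
    }
    if (exif[key]) {
      return exif[key]
    }
  }
  return undefined
}
```

With the reported picture (a GPS date but no coordinates), this would fall through to `DateTimeOriginal` or `CreateDate` instead of the 1970 value.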
@jgyprime Since you reported that you would like to use 500k images: please be aware of #134, which discusses some limits of HomeGallery's database with larger image counts.
After removing the GPS date info (as I said above), the indexing restarted. It is still running and has indexed approximately 45k pictures so far. I do not know how long it will take, but I will let it finish. I've already seen that discussion. If I reach any limitation, I will try to figure out which limitation it is.
My NAS is a TerraMaster F4-421. CPU: Intel Celeron J3455. RAM: 12 GB DDR3 (it came with 4 GB; I added another 8 GB from an old laptop). I ditched the proprietary OS and installed Debian plus the utilities I need. The main drive (OS and utilities) is a 250 GB SSD. The "storage" drive for the photos is an 8 TB WD Red Pro HDD.
Do you think it would be sufficient to allow the GPS date only if GPS coordinates are available? This would keep the basic assumption but will check it in detail...
Sure. For me it is good enough. Right now I am using a version I compiled myself from source with my change. For what I need, it is good enough.
I've already seen that discussion. If I reach any limitation, I will try to figure out which limitation it is.
Alright. Please ping me if you run into problems. It bugs me that there is a problem which, in theory, should not be there. Since I do not face the problem myself, I need an external push from someone who really wants to have it solved.
Thank you for the details of your system. It helps to know the target systems.
For me it is good enough. Right now I am using a version I compiled myself from source with my change.
Awesome. Currently I am implementing a plugin system. When I get to this part, I will ensure that the GPS date is only taken if there is also a GPS position.
In the meantime, if you find a better strategy to identify the date, please let me know.
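One complementary check, sketched here under assumptions rather than taken from HomeGallery, would be to treat a date that equals the Unix epoch as a placeholder and ignore it. The helper names `parseExifDate` and `isEpochPlaceholder` are hypothetical, and the parser assumes the `YYYY:MM:DD HH:MM:SS` form seen in the report, interpreted as UTC.

```javascript
// Hypothetical helper: parses an EXIF-style date such as '1970:01:01 00:00:00Z'.
// Assumption: the value is interpreted as UTC; sub-seconds and offsets are ignored.
function parseExifDate(value) {
  const match = /^(\d{4}):(\d{2}):(\d{2}) (\d{2}):(\d{2}):(\d{2})/.exec(value)
  if (!match) {
    return null
  }
  const [year, month, day, hour, minute, second] = match.slice(1).map(Number)
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second))
}

// Returns true if the value parses to exactly timestamp 0 (1970-01-01T00:00:00Z).
function isEpochPlaceholder(value) {
  const date = parseExifDate(value)
  return date !== null && date.getTime() === 0
}
```

A date picker could then skip any candidate for which `isEpochPlaceholder` returns true and continue with the next key.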
In the meantime, the indexing finished. I observed that only ~100k photos were indexed. When I searched for jpg files, I found ~400k photos. There are other formats there too (png, gif and others).
I have a few questions:
In the meantime, the indexing finished. I observed that only ~100k photos were indexed. When I searched for jpg files, I found ~400k photos. There are other formats there too (png, gif and others).
Do you have lots of binary duplicates? Do you have files which lead to the same SHA1 checksum?
* is there any limitation to file / folder naming?
No, there are no limits. Neither in file count nor in folder depth. All files should be considered.
Do you use any file filter which excludes some of the files?
* how is the software handling duplicate named files?
The file needs to be unique by OS filename for the file indexer and unique by SHA1 for the database. Same SHA1 is handled as duplicate and file data are merged.
There are corner cases with sidecar files of duplicates. I can go into more depth on that if requested.
But basically, if you just copy an image or folder byte-by-byte from one OS path to another, these files are duplicates. This holds even if they are renamed later, since their file content is unchanged and contains the same data. This is a design decision with the goal of showing only unique media, based on the assumption that most people have no clue how many duplicates they are storing. IMHO it does not add any value to show pictures twice.
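To illustrate the idea of content-based duplicates, here is a minimal sketch using Node's built-in `crypto` module. It is not HomeGallery's code; the function `groupByChecksum` and the `{ filename, content }` input shape are assumptions for illustration only.

```javascript
const crypto = require('crypto')

// SHA1 checksum of a buffer or string as a hex string.
const sha1 = content => crypto.createHash('sha1').update(content).digest('hex')

// Hypothetical helper: groups filenames by the SHA1 of their content, so that
// byte-identical copies end up under the same key regardless of their path.
function groupByChecksum(files) {
  const groups = new Map()
  for (const { filename, content } of files) {
    const id = sha1(content)
    const names = groups.get(id) || []
    names.push(filename)
    groups.set(id, names)
  }
  return groups
}
```

Two copies of the same picture under different names would share one entry here, which is the effect described above: one database entry per unique content.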
To identify the files which are indexed, you can dump information from the index files (*.idx), like
zcat Picutures.idx | jq .data[].filename | wc -l
This should print the number of indexed files, which should be about 400k according to the information you provided.
To identify the entries from the database you can run
zcat database.db | jq .data[].id | wc -l
To identify unique database entries you can run
zcat database.db | jq .data[].id | sort -u | wc -l
The latter should then print about 100k according to the information you provided.
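If jq is not at hand, the same counts can be sketched with Node's built-in `zlib` module. This assumes, like the commands above, a gzip-compressed JSON file with a `data` array whose entries carry an `id`; the helper names `readGzipJson` and `countIds` are hypothetical, and the whole file is loaded into memory, so it may be slow for very large databases.

```javascript
const fs = require('fs')
const zlib = require('zlib')

// Reads a gzip-compressed JSON file completely into memory and parses it.
function readGzipJson(file) {
  return JSON.parse(zlib.gunzipSync(fs.readFileSync(file)).toString('utf8'))
}

// Counts all ids and unique ids of a structure shaped like { data: [{ id }] }.
function countIds(db) {
  const ids = db.data.map(entry => entry.id)
  return { total: ids.length, unique: new Set(ids).size }
}

// Usage (path is an example):
// console.log(countIds(readGzipJson('database.db')))
```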
Maybe it is worth reading the internals of the gallery to gain further insights and to clarify further questions.
Thank you for reporting your experience and questions.
is there any limitation to file / folder naming?
One more thing: HomeGallery imports the files in chunks to deal with internal limitations and to provide early feedback (showing images in the browser). So the media import might also be in an intermediate state, with not all of your files imported yet.
This import process can be restarted and does not need to be run in one single run.
Hi @jgyprime
I would like to inform you that the newest master contains stream-based database creation, which requires less memory. So your 400k files should now be fine to process and update.
@jgyprime Further, I am happy to announce the first experimental plugin feature in the current master! See docs.home-gallery.org/plugin for further details!
With plugins you can easily "fix" the geo data issue with your own database mapper.
Thank you for this great and amazing software. I've been using it with almost 4 TB of personal photos (only photos). I think that I have more than 500k photos there...
But I think I found something that can be improved.
After the initial indexation of photos finished (it took several days on my low powered Celeron NAS), I observed that a lot of my photos were added to the database incorrectly, with 1970 as year... And I started investigating the reason.
For example, in the photo I uploaded, the GPS data is set incorrectly in the photo exif:
When added to the gallery, the date is set to 1970 (the date is taken from the GPS info in the EXIF data)... I do not know how that GPS date got there, but I assume that the phone tried to get the GPS date and time and, because GPS was disabled on the phone, it fell back to a default value from 1970...
I also found the source of the problem in the source code here: https://github.com/xemle/home-gallery/blob/master/packages/database/src/media/date.js#L44
const dateKeys = ['GPSDateTime', 'SubSecDateTimeOriginal', 'DateTimeOriginal', 'CreateDate']
If I remove the 'GPSDateTime' item from line 44, then everything works correctly after rebuilding and re-indexing the database. What do you think? Is an improvement possible in this case? For example:
Unfortunately, my knowledge of the js language is very close to 0, so I would prefer for someone with enough knowledge to find a potential implementation here.
Thank you for reading my very long post. Thank you for creating such nice software.