yggdrasil75 closed this 5 months ago.
I'm testing your changes, and loading the images with multithreading takes about twice as long as the previous approach, even for a large image set with around 40k images.
Do you know why there might be such a large discrepancy in performance between our systems? I'm testing on Linux and loading from an SSD.
A few things I could think of: I have a lot of non-images (filtered-out extensions) in the folder; this may be causing slowdowns for me if your folder is mostly just verified images.
Thread count: if you have few threads, then it wouldn't be beneficial, because creating a new thread is slower than just using one existing thread (or if you limit the threads somehow, like setting affinity). I have an i5-13600KF.
OS differences: I am on Mint 21 with a manual upgrade to kernel version 6.5 (it came with 5.10; I probably should upgrade to 6.9).
Hardware: I am using an Intel P5500, a read-focused U.2 drive, hooked up to a HighPoint SSD7120 RAID card. The card is in HBA mode (so the HighPoint shouldn't matter for anything). The drive is formatted with ZFS, which means it may have been cached, but I loaded the current release 5 times in the last 24 hours before making the change, so if it's caching, it should have been cached before (after checking again, it was still slower with the compiled version).
Compilation issues: I am comparing the compiled version of the latest release with the uncompiled (well, pycache-compiled) source code.
Python version/pip package version differences: I am using Python 3.10.13 and the version of concurrent.futures that comes with it. Maybe you have an older or newer version with a bug?
RAM: I have 96 GB of DDR5-5600 in 2 slots, dual-channel configuration. It may affect speeds.
Data fragmentation: this shouldn't be a major issue on SSDs, but it might have a minor effect.
Final thing to consider: what is your before speed for the folder of 40k? Is it already around 30 minutes and now around 1 hour, or was it around 2 minutes and now around 4? If it's the latter, then it may be an issue on my end.
I have a lot of non-images (filtered-out extensions) in the folder; this may be causing slowdowns for me if your folder is mostly just verified images.
I only had images and text files in the folders I tested.
Thread count: if you have few threads, then it wouldn't be beneficial, because creating a new thread is slower than just using one existing thread (or if you limit the threads somehow, like setting affinity). I have an i5-13600KF.
I have a Ryzen 7 5800X with 8 cores, so it should be using 12 threads according to the Python docs.
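For reference, here is a quick way to check the default worker count on each of our machines, assuming the change uses ThreadPoolExecutor without an explicit max_workers (the documented default formula since Python 3.8):

```python
# Estimate of the default ThreadPoolExecutor worker count, assuming the PR
# relies on the concurrent.futures default rather than an explicit max_workers.
import os

default_workers = min(32, (os.cpu_count() or 1) + 4)
print(f'logical CPUs: {os.cpu_count()}, default thread pool workers: {default_workers}')
```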
OS differences: I am on Mint 21 with a manual upgrade to kernel version 6.5 (it came with 5.10; I probably should upgrade to 6.9).
I'm using Ubuntu 22.04 with kernel version 5.15.
Compilation issues: I am comparing the compiled version of the latest release with the uncompiled (well, pycache-compiled) source code.
I used the source code version in all my tests.
Python version/pip package version differences: I am using Python 3.10.12 and the version of concurrent.futures that comes with it. Maybe you have an older or newer version with a bug?
I'm using Python 3.11.9.
RAM: I have 96 GB of DDR5-5600 in 2 slots, dual-channel configuration. It may affect speeds.
I have 32 GB of DDR4 3600 RAM (dual channel).
Final thing to consider: what is your before speed for the folder of 40k? Is it already around 30 minutes and now around 1 hour, or was it around 2 minutes and now around 4? If it's the latter, then it may be an issue on my end.
It was around 7 seconds and then 12 seconds. I only measured the time for the part of the code that was changed (the part that was originally a for loop). The total time was a few seconds longer than that.
The folder contains about 53 GB of images.
So, basically everything about our setups is different. I will test the parts I can test without hardware changes (since hardware costs money), and there may be ways to get speed benefits other than this.
let me check for a bit and get back to you.
Folder size: 120 GB (massive images, I guess). Timed from startup to folder loaded, not from an empty folder to loading the folder. I may have been exaggerating with the "1 hour" comment; I hadn't timed it before because I started it and walked away.
compiled, removing all non-images: 9:45
compiled, with logging to /dev/null: 9:30
my version: 1:45
uncompiled 3.10: 11:45
uncompiled 3.11: 12:00 (weird, I would expect it to be faster than 3.10; might be background processes)
Maybe I should copy this over to a non-ZFS location so that encryption, compression, and dedup won't have an effect. However, I also lose the dual-SSD reads allowed by ZFS and the caching.
ext4 speeds:
my version: 3:15
compiled, logging to /dev/null: 9:00
So, with these findings, I would guess the issues that are causing slowdowns are:
Generating the thumbnails for the list (probably could cache that). Where is this done? It looks to be in data, but then I don't see what is actually calling data with DecorationRole.
Reading the EXIF data (it's a library, so not much can be done except finding a new library if this is the slowdown). Rotations may be common, so maybe caching the rotation data would help. (Outside the scope, but I would love to just overwrite a metadata-rotated image with the image, but rotated.)
Perhaps creating a cache for an entire subfolder would help if the subfolders don't change (i.e. leave a hidden taggui.cache in each folder to speed up loading).
Perhaps a setting, or a check on first run of the speed and core count of the CPU (I think this is to blame), and a way to determine when it's more beneficial to use multithreading (i.e. over 16 threads, use multithreading) or single threading (i.e. under 16 threads, use single threading). But I don't actually know where those limits are.
Most of my ideas beyond the initial PR are caching, so the first time opening a folder is still slow, but later times are faster. I want to just put the cache that I wrote for another tool around every function, one at a time, and see when it causes issues, when it speeds things up, and when it's just plain worse.
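Something along these lines, with functools.lru_cache standing in for the cache from my other tool, and read_caption as a made-up example function:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def read_caption(caption_path: str) -> str:
    # Takes a string rather than a Path so the cache key comparison is a
    # plain string compare instead of Path.__eq__.
    with open(caption_path, encoding='utf-8') as caption_file:
        return caption_file.read()
```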
either way, something needs to be done about startup speeds on large datasets, so I would like to leave this open until I can figure it out.
Generating the thumbnails for the list (probably could cache that). Where is this done? It looks to be in data, but then I don't see what is actually calling data with DecorationRole.
Each thumbnail is created when it is first needed to display the image list (this is where DecorationRole is used), and then it is saved in memory. In an early version of TagGUI, the thumbnails were recreated each time, which caused slow scrolling when the images were large.
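Roughly, the pattern looks like this (a simplified sketch assuming a PySide6 list model, not the actual TagGUI code; the attribute names are illustrative):

```python
from PySide6.QtCore import QAbstractListModel, QModelIndex, Qt
from PySide6.QtGui import QIcon, QPixmap


class ImageListModel(QAbstractListModel):
    def __init__(self, image_paths):
        super().__init__()
        self.image_paths = image_paths
        self.thumbnails = {}  # path -> QIcon, filled the first time it is requested

    def rowCount(self, parent=QModelIndex()):
        return len(self.image_paths)

    def data(self, index, role=Qt.ItemDataRole.DisplayRole):
        image_path = self.image_paths[index.row()]
        if role == Qt.ItemDataRole.DecorationRole:
            # Create the thumbnail only on the first request, then reuse it
            # so scrolling does not rescale the full-size image every time.
            if image_path not in self.thumbnails:
                pixmap = QPixmap(str(image_path)).scaled(
                    128, 128, Qt.AspectRatioMode.KeepAspectRatio)
                self.thumbnails[image_path] = QIcon(pixmap)
            return self.thumbnails[image_path]
        if role == Qt.ItemDataRole.DisplayRole:
            return image_path.name
```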
(Outside the scope, but I would love to just overwrite a metadata-rotated image with the image, but rotated.)
I prefer not to modify the image files themselves.
either way, something needs to be done about startup speeds on large datasets, so I would like to leave this open until I can figure it out.
I don't know why it takes several minutes to load your folder, while it only takes a few seconds for me. Perhaps you could run a profiler to check which part of the code is taking up so much time?
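For example, something like this would print the calls with the highest cumulative time (load_directory here is just a placeholder for whatever function does the loading):

```python
import cProfile
import pstats

# load_directory is a placeholder; call whatever actually loads the folder.
with cProfile.Profile() as profiler:
    load_directory('/path/to/images')
stats = pstats.Stats(profiler)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(20)
```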
The profiler says pathlib's __eq__ function is 78% of the time. However, that doesn't make sense, as it shouldn't be that significant.
I converted the list of paths to a list of strings when checking if the text file path is in the list of text file paths, and it dropped to 45 seconds. Could you check on your system whether this provides benefits or detriments?
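Roughly this kind of change (the names and paths here are placeholders, not the actual ones in the code):

```python
from pathlib import Path

directory = Path('/path/to/images')
image_path = directory / 'example.jpg'

# before: the membership test calls Path.__eq__ for every element
text_file_paths = [path for path in directory.iterdir() if path.suffix == '.txt']
has_caption = image_path.with_suffix('.txt') in text_file_paths

# after: convert once and compare plain strings instead
text_file_strings = [str(path) for path in text_file_paths]
has_caption = str(image_path.with_suffix('.txt')) in text_file_strings
```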
I'm away from my computer right now. I'll take a look in a day or two.
Maybe you have lots of text files. I think using a set instead of a list for the text file paths will help because sets have faster membership checking.
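Something like this (again with placeholder names); the in check on a set is roughly constant time instead of scanning the whole list:

```python
from pathlib import Path

directory = Path('/path/to/images')
text_file_paths = {path for path in directory.iterdir() if path.suffix == '.txt'}
has_caption = (directory / 'example.jpg').with_suffix('.txt') in text_file_paths
```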
I changed the txt_strs list to a set, and I also changed image_paths to use str.endswith instead of path.suffix.lower() in image_suffixes. That one (the image_paths one) may not do anything (I didn't see any time differences).
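For reference, the two suffix checks side by side (the suffix values and the path are just examples):

```python
from pathlib import Path

image_suffixes = ('.jpg', '.jpeg', '.png', '.webp')  # example values
path = Path('/path/to/example.JPG')  # placeholder

# original check
is_image = path.suffix.lower() in image_suffixes
# what I changed it to: endswith accepts a tuple of suffixes
is_image = str(path).lower().endswith(image_suffixes)
```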
I went back to the initial folder (with all the subfolders and extra files) and it loads in just over 3 minutes. This is still not as fast as you report with your folder of 40k, but I would not expect it to be, due to the excess files, different counts, and sizes. It's much faster, and while not as fast as the initial PR's change to multithreading, it is still a worthwhile change. I am going to mark it ready for review and hope that when you get back to check, it works well enough to include.
I finally got VS Code back on my computer. I did all except the last change with nano; I don't enjoy that at all.
Changing the list to a set definitely helps. I measured a 2.5x speedup on a set of around 2000 images and their captions. I don't think converting the paths to strings makes any difference (Python probably handles it efficiently already), so I removed it. I also changed the list of image paths to a set, just to be consistent. I didn't notice any difference in speed.
Can you check how fast it is for you now?
Current version on my drive: 1:30 (note: I changed drives to use the ext4 one again, removed 5k images, and a few other unrelated things, hence why I'm giving a new time).
Version without changing to strings: 14:30. Testing again without kobold running in the background for that second one: 13:30.
I don't know what is wrong with the pathlib function on my system, but it's destroying my performance. pathlib handles pathlib paths itself, while os.path just handles them as strings (Python handles them).
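A quick way to compare the per-comparison cost on a given machine (the paths are placeholders):

```python
import timeit
from pathlib import Path

path_a, path_b = Path('/tmp/image.jpg'), Path('/tmp/image.jpg')
string_a, string_b = str(path_a), str(path_b)

# compare Path.__eq__ against a plain string comparison
print('Path == Path:', timeit.timeit(lambda: path_a == path_b, number=1_000_000))
print('str == str  :', timeit.timeit(lambda: string_a == string_b, number=1_000_000))
```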
Can you check how fast this is for you?
That one (the image_paths one) may not do anything (I didn't see any time differences).
I removed this change.
The difference is 5 seconds on the large folder, which with only 1 test could be background processes or many other things. It is probably fine with or without that change.
Improve loading time on faster drives by threading file requests rather than loading sequentially.
On my system, I had a folder of 65k images which took an hour to load (sequential reads are a pain), but with this change it took around 3 minutes, because it wasn't waiting on receiving an image before requesting the next.
This will put more pressure on the drive, but it's a massive difference in time for large lists.
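A rough sketch of the idea (not the actual TagGUI code; load_image and the suffix set are placeholders for whatever per-image work is really done):

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png', '.webp'}  # example values

def load_image(image_path: Path) -> bytes:
    # Stand-in for the real per-image work (reading, EXIF, thumbnailing, ...).
    return image_path.read_bytes()

def load_directory(directory: Path) -> list[bytes]:
    image_paths = [path for path in directory.iterdir()
                   if path.suffix.lower() in IMAGE_SUFFIXES]
    with ThreadPoolExecutor() as executor:
        # executor.map keeps results in path order while the reads overlap,
        # so the loop never blocks on one file before requesting the next.
        return list(executor.map(load_image, image_paths))
```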