se1exin / Cleanarr

A simple UI to help find and delete duplicate and sample files from your Plex server
https://hub.docker.com/r/selexin/cleanarr
MIT License

fix timeouts for larger libraries #84

Closed peter-mcconnell closed 1 year ago

peter-mcconnell commented 1 year ago

What does this PR do?

tl;dr: moves the `while not end` loop to the UI and performs sections of this loop asynchronously. You can test this change with `docker pull pemcconnell/cleanarr`

Updates get_dupe_content so that it accepts a page parameter instead of running inside a while loop - this moves the paging iteration outside of this function, allowing us to create external paging mechanisms. It then updates the get_dupes method so it takes an optional page argument, which gets passed to the get_dupe_content method - this moves the paging iteration to the URI level. Lastly it updates the loadDupeContent method in the frontend code so that it is essentially responsible for the original while not end logic. This puts an HTTP request between each page of results. It shouldn't be too noticeable performance-wise for existing use cases, but I'm keen for validation there. I'm expecting a speed-up for all use cases due to the async batching this PR introduces, but I don't have a way to easily test for this. Importantly, this means that Cleanarr now works on larger datasets.
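A minimal sketch of the backend restructure described above (this is illustrative only, not the actual Cleanarr source: `PAGE_SIZE` and the in-memory `FAKE_LIBRARY` stand in for the real Plex query):

```python
# Illustrative sketch of moving the paging loop out of get_dupe_content.
# FAKE_LIBRARY stands in for the real Plex library query; PAGE_SIZE is
# chosen small here purely for demonstration.
PAGE_SIZE = 3
FAKE_LIBRARY = [f"dupe-{i}" for i in range(8)]

def get_dupe_content(page=0):
    """Return one page of duplicates and whether more pages remain."""
    offset = page * PAGE_SIZE
    items = FAKE_LIBRARY[offset:offset + PAGE_SIZE]
    return items, offset + PAGE_SIZE < len(FAKE_LIBRARY)

def get_dupes(page=None):
    """With a page argument, return just that page (the new behaviour).
    Without one, iterate every page: the original 'while not end' loop."""
    if page is not None:
        return get_dupe_content(page)[0]
    results, page_no, more = [], 0, True
    while more:
        items, more = get_dupe_content(page_no)
        results.extend(items)
        page_no += 1
    return results
```

With this shape, a caller (here, the frontend) can drive the iteration itself via `?page=N`, while callers that omit the argument keep the old exhaustive behaviour.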

An optimisation was also added that executes dupes requests in batches of PAGE_SIZE asynchronously. This defaults to 10 but can be overridden with a new env var, REACT_APP_PAGE_SIZE. Cost: some needless calls to the dupes API. Reward: total time to load in the UI decreases. Thanks to this asynchronous logic, larger libraries not only work, they load much faster (for me, load time went from ~90 seconds to ~15 after the async patch). 15 seconds is still way too long, and I suspect 504s are still technically possible, but this is a reasonable start.
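The batching idea can be sketched in a few lines. The real implementation lives in the frontend TypeScript; this Python version is an assumption-laden illustration, with `fetch_page` standing in for `GET /dupes?page=N`:

```python
# Sketch of batched async paging: fire PAGE_SIZE page requests at once,
# then continue with the next batch until a page comes back empty. The
# requests past the end of the data are the "needless calls" trade-off
# mentioned above.
import asyncio

PAGE_SIZE = 4  # batch width; Cleanarr reads this from REACT_APP_PAGE_SIZE

async def fetch_page(page):
    """Stand-in for an HTTP call to /dupes?page=N; pages 0-4 have data."""
    await asyncio.sleep(0)  # simulate network latency
    return list(range(page * 2, page * 2 + 2)) if page < 5 else []

async def load_all_dupes():
    results, batch_start, done = [], 0, False
    while not done:
        pages = range(batch_start, batch_start + PAGE_SIZE)
        batch = await asyncio.gather(*(fetch_page(p) for p in pages))
        for page_data in batch:
            if not page_data:  # an empty page means we are past the end
                done = True
                break
            results.extend(page_data)
        batch_start += PAGE_SIZE
    return results

print(asyncio.run(load_all_dupes()))
```

Because each batch waits only as long as its slowest request rather than the sum of all of them, total load time drops roughly by the batch width for latency-bound workloads.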

What does this not do?

I made no attempt to optimise any of the existing logic or external calls, aside from the asynchronous batching of ?page=X to ?page=Y. I suspect many opportunities for perf improvements remain: breaking the dupes API call up further, caching, and/or reducing the number of properties requested. I made no attempt at adding visual cues for the paging in the UI. I didn't write tests - sorry. I have limited time, wanted to hack something together in 30 minutes, and am not familiar with TypeScript, so this wasn't fluent for me. I haven't yet tested other parts of the software with my library - if I find other issues I'll try to patch and PR as I go.

Caveats

I have never written TypeScript before. No clue if what I wrote is clean / idiomatic, but it seems to work.

How was it tested / how should a reviewer test it?

Scanning over the open issues I believe this will solve the following:

Closes #55 Closes #57 Closes #60 Closes #62 Closes #66 Closes #77 Closes #70

C2BB commented 1 year ago

Found your docker image and gave it a whirl, works like a charm, thank you so much! Been waiting for a solution to large libraries for some time :)

austempest commented 1 year ago

I didn't even think to see if peter-mcconnell had a branch. I pulled his image and can confirm it at least loads on one of my libraries. I'll try it on the rest later.

smason16 commented 1 year ago

Is this available for unraid, if so how?

Edit: I discovered how to have community apps search all docker images

peter-mcconnell commented 1 year ago

glad to hear it's working for you folks. enjoy the spare GBs

vanayr commented 1 year ago

Peter you are the man, working like a charm. Over 6k in my Movies library, so the timeouts were brutal.

se1exin commented 1 year ago

Wow this is amazing. Thanks for the awesome contribution @peter-mcconnell !

roggstudelpest commented 1 year ago

Not sure if this works on Synology Docker? Tried and cannot mount the /config folder. Are these changes in the latest Selexin release or is it possible to add them there?

*EDIT: I pulled your image from the Docker registry and that is the one that fails to load the /config mount path. I tried the same path used for Selexin's image and also tried creating a new one. Both fail.

se1exin commented 1 year ago

@roggstudelpest this PR does not change how the config folder works, and in my testing (albeit not on Synology) the new image works fine with mounting the config folder.

Can you please raise a new issue and include the Synology errors?

snickers2k commented 1 year ago

strange that it now works for you guys... with the latest release my series libraries are finally working, but my 8k movie library is not...

bigsteve76 commented 1 year ago

This worked great for my TV Shows (230k episodes), but didn't work on Movies (45k). If I add my Movies library it gives the "HTTPConnectionPool(host='192.168.1.200', port=32400): Read timed out. (read timeout=30)" error every time after 30 seconds, no matter what I have PLEX_TIMEOUT set to. Have there been any solutions to this?

peter-mcconnell commented 1 year ago

Yeah there are undoubtedly more scenarios where timeouts are still likely. There are two high level routes to solving this problem in my head:

  1. optimise the backend code further to get the calls down to some reasonable SLO for the average use case. Ideally < 3s. The last time I looked at the code, most of this time was spent fetching more attributes about the movies / shows that it finds, so this route may require a simplification of the UI also.

  2. avoid the HTTP timeout issues entirely by building a little CLI that uses the same backend calls. This could be a "break glass" tool for people to run if their library is simply too heavy for the UI at present.

Option 1 is of course the neatest product solution, as there is a single, reliable path for all users. However, it is also the most involved from an engineering perspective and could bleed out into a more simplified UI view. Option 2 is the worst user experience, but it would be easy to build and would likely be an "ok" option for people whilst waiting on Option 1.

I could try to knock up Option 2 tonight / tomorrow evening if I get some free time, and will leave Option 1 to the maintainer(s).
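A "break glass" CLI along the lines of Option 2 could be very small. The sketch below is entirely hypothetical: the `/dupes?page=N` endpoint shape follows the `?page=X` convention from this PR, but the base URL, response format, and the `iter_dupes` helper are assumptions for illustration. The injectable `fetch` parameter keeps the paging logic testable without a running server.

```python
# Hypothetical "break glass" CLI sketch: walk the paged dupes endpoint
# directly, so no browser or proxy timeout sits between pages.
import json
from urllib.request import urlopen

def iter_dupes(base_url, fetch=None):
    """Yield duplicate entries page by page until an empty page is returned.
    `fetch` maps a URL to parsed JSON; it defaults to a real HTTP GET."""
    fetch = fetch or (lambda url: json.load(urlopen(url)))
    page = 0
    while True:
        items = fetch(f"{base_url}/dupes?page={page}")
        if not items:
            return
        yield from items
        page += 1

if __name__ == "__main__":
    # Assumed default address of the Cleanarr backend; adjust as needed.
    for dupe in iter_dupes("http://localhost:5000"):
        print(dupe)
```

Each page is a separate short request, so only a single page ever has to complete within the HTTP timeout window.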

bigsteve76 commented 1 year ago

Thank you for your response. That would be excellent. As I said in my previous comment, TV works great and does exactly what I need it to do. I set up two different Docker containers so that I could use the TV version while trying to hash out the Movies version's issues.

peter-mcconnell commented 1 year ago

@bigsteve76 I've put something rough together for you to play with: https://github.com/se1exin/Cleanarr/pull/94

This is VERY simple and rough around the edges but should provide a starting block for a CLI based approach to this product

deepzone1 commented 1 year ago

Would you mind adding it to community applications? I have no clue how to add dockers from dockerhub in unraid :/ I have a library of 49,000 movies with around 2,000 dupes. Would love to get rid of them :P