terrelsa13 / MUMC

Multi-User Media Cleaner aka MUMC (pronounced Mew-Mick) will go through movies, tv episodes, audio tracks, and audiobooks in your Emby/Jellyfin libraries deleting media items you no longer want.
GNU General Public License v3.0

Increase processing speed by ignoring "PROCESSING WHITELISTED MOVIES" #121

Closed by Floflobel 2 months ago

Floflobel commented 2 months ago

Hello,

I'm looking to optimize my speed as well as my RAM when I run MUMC on a large library with lots of users.

I've noticed that the script still performs the process for "Whitelisted" media when I'd like to ignore it completely.

https://github.com/terrelsa13/MUMC/wiki/mumc_configyaml-advanced_settings-behavioral_statements-media_type-conditional_behavior-action_control

0 - No action taken on True; No action taken on False (disabled)

But currently it's still processing my whitelisted libraries even though I've set the parameter to disabled.

Here's my configuration:


---
version: 5.8.18-beta
basic_settings:
  filter_statements:
    movie:
      played:
        condition_days: -1
        count_equality: '>='
        count: 1
      created:
        condition_days: 1095
        count_equality: ==
        count: 0
        behavioral_control: true
    episode:
      played:
        condition_days: -1
        count_equality: '>='
        count: 1
      created:
        condition_days: -1
        count_equality: '>='
        count: 1
        behavioral_control: true
    audio:
      played:
        condition_days: -1
        count_equality: '>='
        count: 1
      created:
        condition_days: -1
        count_equality: '>='
        count: 1
        behavioral_control: true
advanced_settings:
  filter_statements:
    movie:
      query_filter:
        favorited: true
        whitetagged: false
        blacktagged: true
        whitelisted: false
        blacklisted: true
    episode:
      query_filter:
        favorited: true
        whitetagged: false
        blacktagged: true
        whitelisted: false
        blacklisted: true
    audio:
      query_filter:
        favorited: true
        whitetagged: false
        blacktagged: true
        whitelisted: false
        blacklisted: true
  behavioral_statements:
    movie:
      favorited:
        action: keep
        user_conditional: any
        played_conditional: ignore
        action_control: 0
        dynamic_behavior: false
        extra:
          genre: 0
          library_genre: 0
      whitetagged:
        action: keep
        user_conditional: all
        played_conditional: ignore
        action_control: 0
        dynamic_behavior: false
        tags: []
      blacktagged:
        action: delete
        user_conditional: all
        played_conditional: any_played
        action_control: 0
        dynamic_behavior: false
        tags: []
      whitelisted:
        action: keep
        user_conditional: all
        played_conditional: any_any
        action_control: 0
        dynamic_behavior: false
      blacklisted:
        action: delete
        user_conditional: all
        played_conditional: all_created
        action_control: 5
        dynamic_behavior: false
    episode:
      favorited:
        action: keep
        user_conditional: any
        played_conditional: ignore
        action_control: 3
        dynamic_behavior: false
        extra:
          genre: 0
          season_genre: 0
          series_genre: 0
          library_genre: 0
          studio_network: 0
          studio_network_genre: 0
      whitetagged:
        action: keep
        user_conditional: all
        played_conditional: ignore
        action_control: 0
        dynamic_behavior: false
        tags: []
      blacktagged:
        action: delete
        user_conditional: all
        played_conditional: any_played
        action_control: 0
        dynamic_behavior: false
        tags: []
      whitelisted:
        action: keep
        user_conditional: any
        played_conditional: ignore
        action_control: 3
        dynamic_behavior: false
      blacklisted:
        action: delete
        user_conditional: any
        played_conditional: any_played
        action_control: 3
        dynamic_behavior: false
    audio:
      favorited:
        action: keep
        user_conditional: any
        played_conditional: ignore
        action_control: 3
        dynamic_behavior: false
        extra:
          genre: 0
          album_genre: 0
          library_genre: 0
          track_artist: 0
          album_artist: 0
      whitetagged:
        action: keep
        user_conditional: all
        played_conditional: ignore
        action_control: 0
        dynamic_behavior: false
        tags: []
      blacktagged:
        action: delete
        user_conditional: all
        played_conditional: any_played
        action_control: 0
        dynamic_behavior: false
        tags: []
      whitelisted:
        action: keep
        user_conditional: any
        played_conditional: ignore
        action_control: 3
        dynamic_behavior: false
      blacklisted:
        action: delete
        user_conditional: any
        played_conditional: any_played
        action_control: 3
        dynamic_behavior: false
  whitetags: []
  blacktags: []
  delete_empty_folders:
    episode:
      season: false
      series: false
  episode_control:
    minimum_episodes: 0
    minimum_played_episodes: 0
    minimum_episodes_behavior: Max Played Min Unplayed
  trakt_fix:
    set_missing_last_played_date:
      movie: false
      episode: false
      audio: false
  console_controls:
    headers:
      script:
        show: true
        formatting:
          font:
            color: ''
            style: ''
          background:
            color: ''
      user:
        show: true
        formatting:
          font:
            color: ''
            style: ''
          background:
            color: ''
      summary:
        show: true
        formatting:
          font:
            color: ''
            style: ''
          background:
            color: ''
    footers:
      script:
        show: true
        formatting:
          font:
            color: ''
            style: ''
          background:
            color: ''
    warnings:
      script:
        show: true
        formatting:
          font:
            color: ''
            style: ''
          background:
            color: ''
    movie:
      delete:
        show: false
        formatting:
          font:
            color: ''
            style: ''
          background:
            color: ''
      keep:
        show: false
        formatting:
          font:
            color: ''
            style: ''
          background:
            color: ''
      post_processing:
        show: true
        formatting:
          font:
            color: ''
            style: ''
          background:
            color: ''
      summary:
        show: true
        formatting:
          font:
            color: ''
            style: ''
          background:
            color: ''
    episode:
      delete:
        show: true
        formatting:
          font:
            color: ''
            style: ''
          background:
            color: ''
      keep:
        show: true
        formatting:
          font:
            color: ''
            style: ''
          background:
            color: ''
      post_processing:
        show: true
        formatting:
          font:
            color: ''
            style: ''
          background:
            color: ''
      summary:
        show: true
        formatting:
          font:
            color: ''
            style: ''
          background:
            color: ''
    audio:
      delete:
        show: true
        formatting:
          font:
            color: ''
            style: ''
          background:
            color: ''
      keep:
        show: true
        formatting:
          font:
            color: ''
            style: ''
          background:
            color: ''
      post_processing:
        show: true
        formatting:
          font:
            color: ''
            style: ''
          background:
            color: ''
      summary:
        show: true
        formatting:
          font:
            color: ''
            style: ''
          background:
            color: ''
  UPDATE_CONFIG: false
  REMOVE_FILES: false
admin_settings:
  behavior:
    list: blacklist
    matching: byPath
    users:
      monitor_disabled: false
  server:
    brand: emby
    url: 
    auth_key: 
    admin_id: 
  users:
  - user_id: xxxx
    user_name: xxxx
    whitelist:
    - lib_id: xxx
      collection_type: tvshows
      path: /media/storage3/tvshows
      network_path: null
      subfolder_id: '47033'
      lib_enabled: true
    blacklist:
    - lib_id: xxx
      collection_type: movies
      path: /media/storage6/movies
      network_path: null
      subfolder_id: '291714'
      lib_enabled: true
  api_controls:
    attempts: 10
    item_limit: 200
  cache:
    size: 256
    fallback_behavior: LRU
    minimum_age: 200
DEBUG: 1
...
terrelsa13 commented 2 months ago

This has been fixed.

Please update the advanced_settings > filter_statements section of the config to the latest layout.

Full Configuration Example

Floflobel commented 2 months ago

Beautiful as usual. Thank you for the responsiveness.

terrelsa13 commented 2 months ago

Np! Also to help with any confusion between filter_statements and behavioral_statements.

terrelsa13 commented 2 months ago

Hey @Floflobel I have made an attempt at shrinking the runtime (RAM) footprint with v5.8.22-beta. If you get a chance try it out and let me know if it helps condense your setup back to a single script.

It could just be that you have more media_items and/or users than I can fathom, which is why a single script exhausts your RAM. Hopefully removing the redundant and extra data the script was carrying, as done in v5.8.22-beta, helps.

Floflobel commented 2 months ago

Wow, that's a really nice notification I'm getting here. As soon as I have some time (in between now and Monday) I'll give it a try and get back to you. This is indeed an important point, even though I've separated my entire launch into several files. Thanks for your time, I'll get back to you as soon as possible.

Floflobel commented 2 months ago

I seem to be getting to an OOM very quickly even though I've only done ~20 users. What can I provide to help you?

terrelsa13 commented 2 months ago

When does the OOM happen?

  1. Before fetching of media_items?
  2. During fetching of media_items?
    1. Approximately how many users successfully fetch their media_items?
  3. During post processing of...
    1. Movies?
    2. Episodes?
    3. Audio?
    4. Audiobooks?
  4. During deletion of media_items?

Approximately how many media_items does your server have?

How much RAM is MUMC consuming before running out of memory?

Floflobel commented 2 months ago

The problem occurs during fetching of media_items, I managed to make 20 users then I have an OOM.

I am currently testing on 2356 media.

Concerning RAM: It's quite complicated to see, but I'll try to get this information. All I know is that I have 15GB of RAM available.

EDIT: I reach more than "13141.9MB" in less than 3 minutes.
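For anyone else who finds memory use hard to observe, here is a minimal sketch using Python's standard `tracemalloc` module. It is not part of MUMC. `fake_fetch` is a hypothetical stand-in for the fetch step, and `tracemalloc` only counts memory allocated by Python objects, so the figure will be lower than what the operating system reports for the whole process.

```python
import tracemalloc


def measure_peak_mb(func, *args, **kwargs):
    """Run func and return (result, peak Python-allocated memory in MB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)


def fake_fetch(n_users, n_items):
    # Hypothetical stand-in: one small record per user per media item.
    return [
        [{"id": item, "user": user} for item in range(n_items)]
        for user in range(n_users)
    ]


if __name__ == "__main__":
    _data, peak_mb = measure_peak_mb(fake_fetch, 5, 1000)
    print(f"peak: {peak_mb:.1f} MB")
```

Wrapping only the fetch step this way would show whether that phase is where the growth happens.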

terrelsa13 commented 2 months ago

> EDIT: I reach more than "13141.9MB" in less than 3 minutes.

Whoa! That's a lot of RAM consumed in a short time.

> The problem occurs during fetching of media_items, I managed to make 20 users then I have an OOM.

This makes sense. During the fetching of media_items MUMC has to store the information it needs for post processing later.

> I am currently testing on 2356 media.
>
> Concerning RAM: It's quite complicated to see, but I'll try to get this information. All I know is that I have 15GB of RAM available.

Thank you for the info!

I will create >=20 users and figure out a way to get up to ~2400 media_items. Then see if I get the same behavior.

Side Note: If I see the same behavior on my test server the only way I can think to further decrease the amount of data needing to be stored would be to fundamentally change how MUMC works.

Currently MUMC Works Like This:

  1. Fetch all media_items for the first user, using the filter_statements
  2. Fetch all media_items for the second user, using the filter_statements
    1. Repeat fetching media_items for the remaining users
    2. This is where lots of data gets stored for use in the next step
  3. Post process all media_items (aka use the behavioral_statements)
  4. Delete media_items that make it thru post processing
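The four steps above can be sketched roughly as follows. This is an illustration only, not MUMC's code: `SERVER`, `fetch_for_user`, and the "delete only if every user has played it" rule are all hypothetical. The point is that every user's records are held in memory before post processing begins.

```python
from collections import defaultdict

# Hypothetical server data: user -> list of (item_id, played) records.
SERVER = {
    "alice": [("m1", True), ("m2", False)],
    "bob": [("m1", True), ("m2", True)],
}


def fetch_for_user(user):
    """Stand-in for fetching one user's media_items."""
    return SERVER[user]


def batch_run(users):
    stored = defaultdict(dict)  # item_id -> {user: played}

    # Steps 1-2: fetch for every user and keep everything in memory.
    for user in users:
        for item_id, played in fetch_for_user(user):
            stored[item_id][user] = played

    # Step 3: post process (toy rule: every user must have played the item).
    to_delete = [
        item_id for item_id, by_user in stored.items() if all(by_user.values())
    ]

    # Step 4 would delete the items; here we only report them, together
    # with how many records had to be held at once.
    records_held = sum(len(by_user) for by_user in stored.values())
    return to_delete, records_held


if __name__ == "__main__":
    print(batch_run(["alice", "bob"]))  # (['m1'], 4)
```

The number of records held grows with users multiplied by media_items, which is why the fetch step dominates memory use in this model.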

Fundamentally Changing How MUMC Works:

  1. Fetch a single media_item for all users, using the filter_statements
    1. Only a single media_item worth of data would ever be stored at any given time
  2. Post process this single media_item (aka use the behavioral_statements)
  3. Delete media_item if it makes it thru post processing
  4. Repeat for the next media_item
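A rough sketch of this per-item flow, under the same caveat that nothing here is MUMC's actual code: `ITEMS`, `fetch_item_for_all_users`, and the toy "every user has played it" rule are hypothetical. Only one media_item's records exist at a time, so the peak number of records held equals the number of users rather than users multiplied by media_items.

```python
# Hypothetical server data: item_id -> {user: played}.
ITEMS = {
    "m1": {"alice": True, "bob": True},
    "m2": {"alice": False, "bob": True},
}


def fetch_item_for_all_users(item_id):
    """Stand-in for fetching a single media_item for all users."""
    return dict(ITEMS[item_id])


def streaming_run(item_ids):
    to_delete = []
    peak_records = 0
    for item_id in item_ids:
        # Step 1: only this item's per-user data is held.
        by_user = fetch_item_for_all_users(item_id)
        peak_records = max(peak_records, len(by_user))

        # Steps 2-3: post process and decide before moving on.
        if all(by_user.values()):
            to_delete.append(item_id)
        # Step 4: by_user is replaced on the next iteration.
    return to_delete, peak_records


if __name__ == "__main__":
    print(streaming_run(["m1", "m2"]))  # (['m1'], 2)
```

The trade-off in this model is more, smaller requests to the server in exchange for a flat memory profile.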

I do not want to get your hopes up.

Floflobel commented 2 months ago

Thank you for all this comprehensive information.

The question I have is what have you changed recently? I didn't have such a big problem until I updated in the last commit. I was managing with ~15GB of RAM to get through all my users and media (I'll try to have a look in the code).

For the record, with the old version I'd managed to reach my "almost" ideal setup. Everything worked and I could switch to all my libraries with separate configuration files.

Regarding the big changes you mentioned at the end, this is indeed what could solve all the problems. I thought of it when I saw how MUMC worked. But your arguments are completely valid, it's a very big change and a lot of time invested.

terrelsa13 commented 2 months ago

> Thank you for all this comprehensive information.

Np!

> The question I have is what have you changed recently? I didn't have such a big problem until I updated in the last commit. I was managing with ~15GB of RAM to get through all my users and media (I'll try to have a look in the code).

I am guessing my attempt to simplify the data stored during fetching actually made it worse. Previously the complete data for each media_item was stored for every behavior (favorite, whitetag, blacktag, etc.) and for each user, along with extra data to make post processing go a little faster.

My goal was to only save the minimum amount needed. Sounds like I made things worse and now additional memory locations for data are being created and exhausting your RAM sooner.

> Regarding the big changes you mentioned at the end, this is indeed what could solve all the problems. I thought of it when I saw how MUMC worked. But your arguments are completely valid, it's a very big change and a lot of time invested.

I tried a sloppy copy-paste of this to see if it was doable. It appears it is. I need to work through how I want to handle the multiprocessing to parallelize some or all users' evaluations of a single media_item. It's also going to completely change how console data is shown when running.
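One possible shape for evaluating a single media_item across users in parallel is sketched below. It is an assumption-laden illustration, not the planned design: it uses a thread pool from `concurrent.futures` rather than multiprocessing (on the assumption that the per-user work is mostly waiting on server requests), and `PLAYED` and `evaluate_for_user` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical play history: (user, item_id) pairs that have been played.
PLAYED = {("alice", "m1"), ("bob", "m1"), ("bob", "m2")}


def evaluate_for_user(user, item_id):
    """Stand-in for one user's evaluation of one media_item."""
    return user, (user, item_id) in PLAYED


def evaluate_item(item_id, users, max_workers=4):
    """Evaluate one media_item for all users concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `users`.
        results = pool.map(lambda user: evaluate_for_user(user, item_id), users)
        return dict(results)


if __name__ == "__main__":
    print(evaluate_item("m2", ["alice", "bob"]))  # {'alice': False, 'bob': True}
```

Collecting the results into one dictionary per media_item also gives a single place to print console output for that item.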

It's still a big change, but not as big as I initially thought. Again, don't get your hopes up too high yet. I don't know what I don't know.

terrelsa13 commented 1 month ago

@Floflobel No rush. Take a look here whenever you get a chance.

terrelsa13 commented 1 month ago

@Floflobel I was able to find the issue causing the memory to be exhausted. My own fault: I was saving a list to itself, which caused it to double in size each time it was saved.
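For readers curious what this class of bug looks like, here is a minimal sketch. It is not the actual MUMC code: `save_buggy`, `save_fixed`, and `growth` are hypothetical names, and the real fix may differ.

```python
def save_buggy(saved):
    # Bug: the list is extended with its own contents, so it doubles.
    saved += saved
    return saved


def save_fixed(saved, new_items):
    # Intended behavior: only the new items are added.
    saved.extend(new_items)
    return saved


def growth(n_saves, start=1):
    """Return the list length after each buggy save."""
    saved = list(range(start))
    sizes = []
    for _ in range(n_saves):
        save_buggy(saved)
        sizes.append(len(saved))
    return sizes


if __name__ == "__main__":
    print(growth(5))  # [2, 4, 8, 16, 32]
    print(save_fixed([1], [2, 3]))  # [1, 2, 3]
```

Doubling compounds quickly: twenty doublings multiply the length by 2**20 = 1,048,576, which is the kind of growth that can exhaust gigabytes of RAM within minutes.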

The latest release (v5.8.29-beta) on the beta branch has the fix.

I tested with 24 users and >8000 media items. Memory usage peaked at ~200MB of RAM. YMMV.

Apologies for the slow fix on this one.

Floflobel commented 1 month ago

I can confirm that I'm also running at around 200-300 MB with many more users.

I can also confirm that there is no longer any point in carrying out the work you've written about (#126).

I'm going to try with a lot more data as I've currently split everything into several configuration files.

Don't worry, it's already fantastic.

Thanks again for your work!

terrelsa13 commented 1 month ago

Great! Let me know how things turn out.