goose-ws / bash-scripts

A collection of bash scripts I've hacked together over time
MIT License

[Bug] TBA check update #10

Status: Closed (nicktags closed 3 months ago)

nicktags commented 3 months ago

I think the file search should be: " TBA" instead of " TBA ". The files are normally named "something - TBA.mp4". I can adjust that in the script. My fear is that if the script updates itself, it will revert to the old search.

rr0ss0rr commented 3 months ago

From what I remember when I had these files on my system, the following should work: find "$library" -type f -regex '.*TB[AD].*'. I haven't seen any of the TBA titles in a while.

rr0ss0rr commented 3 months ago

You can also move the grep logic into the find ".*TB[AD].*[.](asf|avi|mov|mp4|mpegts|ts|mkv|wmv)$". You can remove the 2nd .* if TBA is adjacent to the .extension ...TBA.mp4

goose-ws commented 3 months ago

The premise of the pattern is based on the TRaSH naming scheme, as described on lines 35-43 in the script. While I'm not sure how many more or fewer files would be matched by removing the trailing space, I'm hesitant to change the naming scheme. Honestly, the best solution would be to interrogate Sonarr directly for any files with a flat title of "TB(A|D)" rather than searching the file system. But there's no API endpoint I can find that would easily search for files in such a way, short of iterating through every season of every series to list every episode and parse through them manually. But you've led me to think about the possibility of interrogating sqlite/postgres and searching that route. Let me think about it.

goose-ws commented 3 months ago

After consulting with a friend who has actual development experience and doesn't just write shitty bash scripts, here's what I'm thinking:

While interfacing with sqlite/postgres could make finding the files of interest quicker and easier, it feels particularly dirty, and I don't want to keep up with whatever database schemas the Sonarr developers decide to move to in the future. Even though it would just be reads, no writes, it would be safest to keep Sonarr as the intermediary between the script and the database via the API.

The point of the regex is that I don't want to pick up files that aren't actually TBA/TBD files. But those letters don't really appear in (m)any English words, so it's probably safe to broaden the regex by removing the trailing space, and even the leading one. But in the interest of no false positives, I'm thinking I'll edit the logic to do this:

  1. Use a find command on the file system to search for files in the media directories that match the regex TB(A|D)
  2. Extract the series name, season number, and episode number from the file name. While the series name and season number will be available from directory names, this will largely rely on the episode number being extracted from the file name using a S[[:digit:]]+E[[:digit:]]+ type of regex, but that feels pretty reliable.
  3. Using the series name, season number, and episode number, use Sonarr's API to query for the episode title Sonarr has for that file. This will return a simple, clean, episode name only. e.g. "TBA" or "TBD".
  4. If the returned data is "TBA" or "TBD", execute the rest of the script logic.
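
The four steps above could be sketched roughly as follows. This is only an illustration, not the code in the script: the directory layout (library/Series/Season/file), the function names, and the `|` field separator are all assumptions, and `lookup_title` is a hypothetical stub standing in for the real Sonarr API query.

```bash
#!/usr/bin/env bash

# Step 2: extract series, season, and episode from a path laid out as
#   <library>/<Series>/<Season dir>/<file>   (assumed layout)
parse_episode() {
    local path="$1"
    local file="${path##*/}"            # file name only
    local season_dir="${path%/*}"       # .../<Series>/<Season dir>
    local series_dir="${season_dir%/*}" # .../<Series>
    local series="${series_dir##*/}"
    local re='S([[:digit:]]+)E([[:digit:]]+)'

    [[ "$file" =~ $re ]] || return 1
    # 10# forces base 10 so values like "08" are not parsed as octal.
    printf '%s|%d|%d\n' "$series" "$((10#${BASH_REMATCH[1]}))" "$((10#${BASH_REMATCH[2]}))"
}

# Step 3 (hypothetical stub): a real version would ask Sonarr's API for the
# title it holds for this series/season/episode. Here it returns a canned value.
lookup_title() {
    local series="$1" season="$2" episode="$3"
    printf 'TBA\n'
}

# Step 4: only an exact, case-sensitive "TBA" or "TBD" counts.
is_placeholder_title() {
    [[ "$1" == "TBA" || "$1" == "TBD" ]]
}

# Reads NUL-delimited paths on stdin (the output of step 1).
process_paths() {
    local path parsed series season episode title
    while IFS= read -r -d '' path; do
        parsed="$(parse_episode "$path")" || continue
        IFS='|' read -r series season episode <<< "$parsed"
        title="$(lookup_title "$series" "$season" "$episode")"
        if is_placeholder_title "$title"; then
            printf 'Would process: %s (S%02dE%02d)\n' "$series" "$season" "$episode"
        fi
    done
}
```

Step 1 would then feed it, for example with GNU find: `find "$library" -type f -regextype posix-extended -regex '.*TB(A|D).*' -print0 | process_paths`.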

As an aside, I can add an option for a variable that could be put in the .env file to allow people to define their own regex to match files of interest. I would likely leave it undocumented, as the need for it would probably be few and far between. But that should help to clean up any edge cases that I cannot conceptualize right now.
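
A user-supplied regex with a default fallback might look something like this. It is a sketch only: `file_matches` and `default_regex` are made-up names, and in the script the second argument would come from whatever variable the .env file ends up exposing.

```bash
#!/usr/bin/env bash

# Default pattern from the discussion above.
default_regex='TB(A|D)'

# $1: file name to test
# $2: optional user-defined regex (hypothetically read from the .env file);
#     falls back to the default when empty or missing.
file_matches() {
    local name="$1" regex="${2:-$default_regex}"
    [[ "$name" =~ $regex ]]
}

file_matches "Show - S01E01 - TBA.mkv" && echo "default pattern matched"
file_matches "Show - S01E01 - Episode 1.mkv" 'Episode [0-9]+' && echo "custom pattern matched"
```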

> You can also move the grep logic into the find ".*TB[AD].*[.](asf|avi|mov|mp4|mpegts|ts|mkv|wmv)$". You can remove the 2nd .* if TBA is adjacent to the .extension ...TBA.mp4

Thanks for this. I haven't taken the time to really figure out how the regex built into the find command works, as it doesn't appear to be exactly the same as grep -E, which I'm already pretty familiar and comfortable with. But in the interest of "doing things better", it would be good to reduce these two calls down to one. I just need to tinker with the carriage return that Mono tends to add to the end of lines output via docker exec calls (hence the need for the piped-in tr -d '\r').
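
To illustrate why the tr -d '\r' matters, here is a small sketch that simulates CRLF-terminated output with printf (standing in for the docker exec call, which is not reproduced here). `strip_cr` is just an illustrative helper name.

```bash
#!/usr/bin/env bash

# Remove carriage returns from whatever arrives on stdin.
strip_cr() {
    tr -d '\r'
}

# Simulated output with a CRLF line ending. Command substitution drops the
# trailing newline but keeps the carriage return.
raw="$(printf 'TBA\r\n')"
clean="$(printf '%s' "$raw" | strip_cr)"

# The stray \r makes an otherwise identical string fail an exact comparison.
[[ "$raw" == "TBA" ]]   || echo "raw does not equal TBA (length ${#raw})"
[[ "$clean" == "TBA" ]] && echo "clean equals TBA (length ${#clean})"
```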

nicktags commented 3 months ago

Appreciate your work on this. IDK why this isn't just built into Sonarr.

rr0ss0rr commented 3 months ago

Not sure how standard this recommendation is. I'm on a Mac, which comes with ancient utilities pre-installed (bash v3), so most have been upgraded to the latest GNU releases. With GNU find, you can change the type of regex find uses (which I believe defaults to "emacs").

find -regextype help
find: Unknown regular expression type ‘help’; valid types are ‘findutils-default’, ‘ed’, ‘emacs’, ‘gnu-awk’, ‘grep’, ‘posix-awk’, ‘awk’, ‘posix-basic’, ‘posix-egrep’, ‘egrep’, ‘posix-extended’, ‘posix-minimal-basic’, ‘sed’.

So if you add -regextype egrep, it should work the way you're used to (if -regextype is indeed standard across platforms). Thanks again for your effort.
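
For reference, a single find call combining the TBA/TBD match and the extension filter might look like the sketch below. It assumes GNU find (the -regextype option is not available in every find implementation) and uses posix-extended, one of the types from the list above. `find_tba_files` is an illustrative name, not something from the script.

```bash
#!/usr/bin/env bash

# List files under a library directory whose path contains TBA or TBD and
# ends in one of the listed video extensions.
# -regextype must come before -regex for it to take effect.
find_tba_files() {
    local library="$1"
    find "$library" -type f -regextype posix-extended \
        -regex '.*TB[AD].*[.](asf|avi|mov|mp4|mpegts|ts|mkv|wmv)$'
}
```

Note that -regex is matched against the whole path, not just the file name, so a library directory that itself contains "TBA" or "TBD" would make every video file under it match.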

goose-ws commented 3 months ago

Ok, by my testing, this feature is now complete and implemented. Let me know if it's not working for your use case.

nicktags commented 3 months ago

It's working great, thank you!

DeuX01 commented 3 months ago

Doesn’t work for me.

sonarr-update-tba.bash   ::   2024-05-30 21:20:49   ::   [info]  Located 5 files to process
sonarr-update-tba.bash   ::   2024-05-30 21:20:49   ::   [info]  Processing New Amsterdam (2018) - S05E01 - TBD [AMZN WEBDL-1080p][EAC3 5.1][HEVC]-NTb.mkv
sonarr-update-tba.bash   ::   2024-05-30 21:20:50   ::   [info]  Clean episode title [TBD] does not match TBA/TBD -- Skipping
sonarr-update-tba.bash   ::   2024-05-30 21:20:50   ::   [info]  Processing A Condition Called Love (2024) - S01E09 - 009 - TBA [HDTV-1080p][10bit][h265][AAC 2.0][JA]-SubsPlease.mkv
sonarr-update-tba.bash   ::   2024-05-30 21:20:51   ::   [info]  Clean episode title [His First Birthday] does not match TBA/TBD -- Skipping
sonarr-update-tba.bash   ::   2024-05-30 21:20:51   ::   [info]  Processing A Salad Bowl of Eccentrics (2024) - S01E09 - 009 - TBA [HDTV-1080p][10bit][h265][AAC 2.0][JA]-SubsPlease.mkv
sonarr-update-tba.bash   ::   2024-05-30 21:20:52   ::   [info]  Clean episode title [TBA] does not match TBA/TBD -- Skipping
sonarr-update-tba.bash   ::   2024-05-30 21:20:52   ::   [info]  Processing Rinkai! (2024) - S01E08 - 008 - TBA [HDTV-1080p][10bit][h265][AAC 2.0][JA]-Erai-raws.mkv
sonarr-update-tba.bash   ::   2024-05-30 21:20:53   ::   [info]  Clean episode title [TBA] does not match TBA/TBD -- Skipping
sonarr-update-tba.bash   ::   2024-05-30 21:20:53   ::   [info]  Processing Viral Hit (2024) - S01E08 - 008 - TBA [HDTV-1080p][10bit][h265][AAC 2.0][JA]-SubsPlease.mkv
sonarr-update-tba.bash   ::   2024-05-30 21:20:54   ::   [info]  Clean episode title [Real Battle] does not match TBA/TBD -- Skipping

Rolling back to the previous version gets my renames going again. Also, episodes are not getting ignored, as in the case of New Amsterdam, whereas the previous version ignored it.

goose-ws commented 3 months ago

You should be good now @DeuX01. For some reason, I converted the title to lowercase, and tbd does not in fact match TBD. Corrected that.
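
The failure mode can be reproduced in a few lines. This is a reconstruction from the log output and the description above, not the script's actual code, and both function names are made up for the example.

```bash
#!/usr/bin/env bash

re='^TB[AD]$'

# Buggy variant: the title is lowercased first, so "TBD" becomes "tbd"
# and can never match the uppercase pattern.
check_buggy() {
    local title
    title="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    [[ "$title" =~ $re ]]
}

# Fixed variant: compare the title as returned, with no case conversion.
check_fixed() {
    [[ "$1" =~ $re ]]
}

check_buggy "TBD" || echo "buggy: [TBD] does not match TBA/TBD"
check_fixed "TBD" && echo "fixed: [TBD] matches"
```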

DeuX01 commented 3 months ago

Confirmed, working fine now. Thanks :)