Closed: dezza closed this issue 4 months ago
The way it's working right now is actually intended.
So parsing "The.Office.US..." to "The Office US" is correct.
But that's not your expectation, correct?
How would you expect to access this specific data?
Ah ok I see what you mean.
Well, I expect "title" to be searchable on imdb/themoviedb, that's simply why.
I guess release country could be a field
Yeah, I get your expectation, that's the reason why I initially made the release parser ;D
I'm still thinking about this
I wrote some logic for this that I think makes sense. I think you will be able to tell from it how I think the most reasonable way to handle it would be.
If the second-to-last word is not "the", the trailing country code is definitely not referring to an actual country, so it can be stripped from the title.
/**
 * Strip a trailing country tag (US, UK, ...) from a TV show title,
 * unless the word before it is "the" (then it is part of the real title).
 * @param {SceneTags} scenetags
 */
function stripTVShowCountry(scenetags) {
  const lastElement = -1
  const words = scenetags.title.split(' ')
  if (scenetags.type === 'tvshow' &&
    // Anchored and case-sensitive, so only an exact uppercase code matches
    words.at(lastElement)?.match(/^(?<country>US|UK|NZ|AU|CA)$/u) &&
    words.at(lastElement - 1) !== 'the'
  ) {
    scenetags.title = words.slice(0, lastElement).join(' ')
  }
  return scenetags
}

// Ends with country
console.log("Ends with country")
console.log(stripTVShowCountry({title: 'Wilfred US', type: 'tvshow'}))
console.log(stripTVShowCountry({title: 'Oy mate Crocodile Hunter AU', type: 'tvshow'}))
console.log()

// Ends with an actual country, second-to-last word is "the", so it is a real title
console.log("Ends with country, next last is 'the'. Concludes its a real title")
console.log(stripTVShowCountry({title: 'Soldiers in the US', type: 'movie'}))
console.log(stripTVShowCountry({title: 'Food in the US', type: 'tvshow'}))
console.log(stripTVShowCountry({title: 'Queen of the UK', type: 'tvshow'}))
Output:
Ends with country
{ title: 'Wilfred', type: 'tvshow' }
{ title: 'Oy mate Crocodile Hunter', type: 'tvshow' }
Ends with country, next last is 'the'. Concludes its a real title
{ title: 'Soldiers in the US', type: 'movie' }
{ title: 'Food in the US', type: 'tvshow' }
{ title: 'Queen of the UK', type: 'tvshow' }
Thanks for the code and yes, it's a good point to catch some words (like "the") before the country code. I'll dive a little bit into it to check for other possible special words.
So, I'm finally implementing this. Gonna add "country" as a new field. Your code really helped, adapted it to JS and PHP, tests are looking good.
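For anyone reading along, here is a rough sketch of what a separate "country" field could look like. This is not the code that shipped in the library. `splitCountry`, `COUNTRY_CODES` and the returned object shape are names made up for illustration, and the only guard word is the "the" case discussed above.

```javascript
// Hypothetical helper, NOT the library's actual implementation.
// Country codes are the ones proposed in this thread.
const COUNTRY_CODES = new Set(['US', 'UK', 'NZ', 'AU', 'CA'])

/**
 * Split a trailing country tag off a title into its own field.
 * @param {string} title parsed title, words separated by spaces
 * @param {string} type  e.g. 'tvshow' or 'movie'
 * @returns {{title: string, country: string|null}}
 */
function splitCountry(title, type) {
  const words = title.split(' ')
  const last = words.at(-1)
  const isCountrySuffix =
    type === 'tvshow' &&
    words.length > 1 &&                    // never strip a one-word title
    COUNTRY_CODES.has(last) &&             // exact, case-sensitive match
    words.at(-2).toLowerCase() !== 'the'   // "... the US" is a real title
  if (!isCountrySuffix) {
    return { title, country: null }
  }
  return { title: words.slice(0, -1).join(' '), country: last }
}

console.log(splitCountry('Wilfred US', 'tvshow'))      // { title: 'Wilfred', country: 'US' }
console.log(splitCountry('Queen of the UK', 'tvshow')) // { title: 'Queen of the UK', country: null }
```

Keeping the title and the country apart means the title stays searchable on its own, while the country can still be used to pick the right show when two share a name.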
Done with latest release
https://github.com/pr0pz/scene-release-parser/releases/tag/v1.5.0
Hello.
Nice lib, but I found one issue that I think needs to be fixed. I'll gladly help as long as we can agree on the issue.
For example Wilfred exists as both an AU and US show.
AU (first released, 2007)
https://www.themoviedb.org/tv/3297
US (2011)
https://www.themoviedb.org/tv/39525-wilfred
This means that now the title is parsed as "Wilfred US".
It would be a safe assumption that any tag that is a capitalized country code (US|UK|AU|NZ|CA) marks an ambiguous title and narrows it down to the specific show in the respective country.
Of course, on rare occasions a title could be something like "Toys.R.Us", but it is unlikely that it would be capitalized. If so, that's a real corner case not worth optimizing for!
https://scenerules.org/html/2020_WDX_unformatted.html
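To illustrate the capitalization point, a minimal sketch of a case-sensitive check on the dot-separated release name. `findCountryTag` and `COUNTRY_TAG` are hypothetical names, and the release names below are made-up examples, not real releases.

```javascript
// Hypothetical check, not part of the library.
// No "i" flag on purpose: only fully uppercase codes count as country tags.
const COUNTRY_TAG = /^(?:US|UK|AU|NZ|CA)$/u

/**
 * Return the first dot-separated token that is an uppercase country code,
 * or null if there is none.
 * @param {string} releaseName
 * @returns {string|null}
 */
function findCountryTag(releaseName) {
  return releaseName.split('.').find((token) => COUNTRY_TAG.test(token)) ?? null
}

console.log(findCountryTag('Wilfred.US.S01E01')) // 'US'
console.log(findCountryTag('Toys.R.Us.S01E01'))  // null, "Us" is not all caps
```

A title that really does end in an all-caps "US" would still be caught by this check, which is the corner case mentioned above.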