Open thezoggy opened 11 years ago
Yes, it's because I went on binsearch/nzbindex and looked up each one individually, looking for case sensitivity.
If you do find any that you would like added, please let me know.
Are there really any flemish releases? ;-)
100's of pages according to binsearch/nzbindex, as surprised as you haha!
lol i did not see that coming. I am surprised indeed :-)
TURKSIH looks to be misspelled.. unable to find any releases when searching for it on a raw search site.
here are my updated non-english entries.. left 'DE' out of the abbrv list just to protect against false positives
('alt.binaries.*', '[-.](danish|deutsch|dutch|dksubs|flemish|french|hebrew|japenese|japanese|german|ita-eng|korsub|norwegian|serbian|spanish|spanisch|swedish|swesub|turkish|DKsubs|nl\\.?sub)[-.]', 1, 1, 0, 'Blocks non-english language releases'),
('alt.binaries.*', '[-.](BL|CZ|ES|FR|GER|ITA|KOR|NL|PL|SE)[-.]', 1, 1, 0, 'Block non-english abbreviations'),
someone on irc asked me about all your 'various' blocking.. looking specifically at:
(100010, 'alt.binaries.*', 'defa|knochen|giro|irls\\\\hybris|snoballkrigen!atkgalleria|realco|mp4sux|cytsunee|nzbroyalty', 1, 1, 0, 'Blocking various.'),
Added all but knoc[- ]?one , not enough time to test right now.
the recent changes you made still need work. 'defa' is one i would have removed.. realco and mp4sux were good to keep.
thought about the blacklist stuff tonight..
non-english content (alt.binaries.*)
NovaRip = looks to only do ITA releases.
I see them as 'NovaRip' and 'Nov aRip' and 'Nova Rip'.
Their releases are sometimes tagged with ITA but sometimes as ITA-ENG. Also, they usually have BDMux or DLMux as well.
Misfits.4x01.Ossessione.ITA.720p.BDMux.x264-NovaRip [1/1] - "Misfits.4x01.Ossessione.ITA.720p.BDMux.x264-NovaRip.nzb" yEnc (1/1)
Criminal.Minds.8x02.L.Accordo.ITA-ENG.1080p.DLMux.DD5.1.h264-NovaRip [1/1] - "Criminal.Minds.8x02.L.Accordo.ITA-ENG.1080p.DLMux.DD5.1.h264-NovaRip. nzb" yEnc (1/1)
Last.Resort.1x13.Bersaglio.Colorado.ITA-ENG.720p.DLMux.DD5.1.h264-Nova Rip [1/1] - "Last.Resort.1x13.Bersaglio.Colorado.ITA-ENG.720p.DLMux.DD5.1.h264-Nov aRip.nzb" yEnc (1/1)
Once.Upon.A.Time.2x04.Il.Coccodrillo.ITA-ENG.720p.DLMux.DD5.1.h264-Nov aRip [1/1] - "Once.Upon.A.Time.2x04.Il.Coccodrillo.ITA-ENG.720p.DLMux.DD5.1.h264-No vaRip.nzb" yEnc (1/1)
Bones.7x10.Una.Vita.Di.Umiliazioni.ITA.BDMux.x264-NovaRip [1/1] - "Bones.7x10.Una.Vita.Di.Umiliazioni.ITA.BDMux.x264-NovaRip.nzb" yEnc (1/1)
Now looking into ITA.. doing regex [-.]ITA[-.] looks to catch all the scenarios (which is already handled under a different blacklist):
john.alvarado - [1/7] - "La.Rivoluzione.di.Utena.dvd10.DVDRip.DivX.ITA-F2L.rar" yEnc (1/45)
Last.Resort.1x13.Bersaglio.Colorado.ITA-ENG.720p.DLMux.DD5.1.h264-Nova Rip [1/1] - "Last.Resort.1x13.Bersaglio.Colorado.ITA-ENG.720p.DLMux.DD5.1.h264-Nov aRip.nzb" yEnc (1/1)
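A rough way to check that claim outside of nn: the sketch below runs the delimited pattern against subjects quoted in this thread, next to a bare ITA pattern for contrast. It is Python rather than the PHP nn uses, the case-insensitive flag is an assumption about how the blacklist is applied, and the two English-looking names are made up for illustration.

```python
import re

# Delimited form from the comment above; IGNORECASE is an assumption.
DELIMITED = re.compile(r"[-.]ITA[-.]", re.IGNORECASE)
# Bare form, shown only for contrast.
BARE = re.compile(r"ITA", re.IGNORECASE)

# Subjects quoted in this thread.
samples = [
    "La.Rivoluzione.di.Utena.dvd10.DVDRip.DivX.ITA-F2L.rar",
    "Last.Resort.1x13.Bersaglio.Colorado.ITA-ENG.720p.DLMux.DD5.1.h264-Nova Rip",
    "Misfits.4x01.Ossessione.ITA.720p.BDMux.x264-NovaRip",
]
# Hypothetical names, not taken from any index.
english = [
    "The.Italian.Job.2003.720p.BluRay.x264",
    "Capital.S01E01.HDTV.x264",
]

for name in samples + english:
    # Columns: delimited match, bare match, subject
    print(bool(DELIMITED.search(name)), bool(BARE.search(name)), name)
```

The bare pattern also hits "Italian" and "Capital", which is the reason for requiring a `-` or `.` on both sides.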
some sample data,..
Seinpost Den Haag S01E06 NLSUBBED DUTCH - RealCo [00/34] - "Seinpost Den Haag S01E06 NLSUBBED DUTCH - RealCo.nzb" yEnc (1/1)
Tournee Generale S03E01 FLEMISH 720p HDTV - RealCo [00/43] - "Tournee Generale S03E01 FLEMISH 720p HDTV - RealCo.nzb" yEnc (1/1)
Community.S02E20.Fuereinander.geschaffen.GERMAN.DUBBED.DL.1080p.WebHD.x264-TVP [06/26] - "tvp-community-s02e20-1080p.nfo" yEnc (1/1)
Goede Tijden Slechte Tijden - S23E115 (08-02-2013) - RealCo [00/26] - "Goede Tijden Slechte Tijden - S23E115 (08-02-2013) - RealCo.nzb" yEnc (1/1)
[foreign]-[ Planet.E.Wintertraum.aus.Schneekanonen.German.DOKU.WS.HDTVRiP.XviD-UTOPiA ] [01/24] - "Planet.E.Wintertraum.aus.Schneekanonen.German.DOKU.WS.HDTVRiP.XviD-UTOPiA.par2" yEnc (1/1)
Israeli.Movie.Sof.Ha.Olam.Smola.2004.DVDRip-IL.XviD-DownRev [01/68] - "Sof.Ha.Olam.Smola.2004.DVDRip-IL.XviD-DownRev.par2" yEnc (1/1)
Little.Mrs.Pepperpot.Complete.PDTV.HebDub.XviD-Sweet-Star [47/47] - "Little.Mrs.Pepperpot.E50.PDTV.HebDub.XviD-Sweet-Star.avi" yEnc (1/203)
HebDub? Is -IL. a language tag?
So this brings us to blacklist overlap... is it more efficient to have two restrictive blacklists to catch all the possible variants.. or rely on one overzealous regex to try and catch more things but then have a whitelist to counter it?
Can't quote, but @ thought about the blacklist stuff tonight..
I think that's a good idea to separate everything instead of everything being generic like it is currently.
so to catch all the novarip variants we can do: Nov[ a]+Rip
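A quick sanity check of that pattern against the three spellings listed earlier. This is only a sketch in Python (nn itself is PHP), and the `matches_novarip` helper name is invented for the example.

```python
import re

# Pattern proposed above, applied case-insensitively.
NOVARIP = re.compile(r"Nov[ a]+Rip", re.IGNORECASE)

def matches_novarip(subject: str) -> bool:
    """Return True if the subject contains any of the NovaRip spellings."""
    return NOVARIP.search(subject) is not None

# The three spellings seen in the subjects quoted in this thread.
variants = ["x264-NovaRip", "h264-Nov aRip", "h264-Nova Rip"]
for v in variants:
    print(v, matches_novarip(v))

# The split "No vaRip" that shows up in one of the quoted filenames
# is not covered, because the pattern needs a contiguous "Nov".
print("h264-No vaRip.nzb", matches_novarip("h264-No vaRip.nzb"))
```

Since the subject line and the filename are split in different places in those posts, whether the miss on "No vaRip" matters depends on which part of the header the blacklist is run against.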
from nn trunk, tv non-english:
(seizoen|staffel|danish|flemish|(\.| |\b|\-)(HU|NZ)|dutch|Deutsch|nl\.?subbed|nl\.?sub|\.NL|\.ITA|norwegian|swedish|swesub|french|german|spanish)[\.\- \b]
\.des\.(?!moines)|Chinese\.Subbed|vostfr|Hebrew\.Dubbed|\.HEB\.|Nordic|Hebdub|NLSubs|NL\-Subs|NLSub|Deutsch| der |German | NL |staffel|videomann
(danish|flemish|nlvlaams|dutch|nl\.?sub|swedish|swesub|icelandic|finnish|french|truefrench[\.\- ](?:.dtv|dvd|br|bluray|720p|1080p|LD|dvdrip|internal|r5|bdrip|sub|cd\d|dts|dvdr)|german|nl\.?subbed|deutsch|espanol|SLOSiNH|VOSTFR|norwegian|[\.\- ]pl|pldub|norsub|[\.\- ]ITA)[\.\- ]
(french|german)$
from nn trunk, movie non-english:
(\.des\.|danish|flemish|dutch|(\.| |\b|\-)(HU|FINA)|Deutsch|nl\.?subbed|nl\.?sub|\.NL|\.ITA|norwegian|swedish|swesub|french|german|spanish)[\.\- |\b]
Chinese\.Subbed|vostfr|Hebrew\.Dubbed|\.Heb\.|Hebdub|NLSubs|NL\-Subs|NLSub|Deutsch| der |German| NL |turkish
(danish|flemish|nlvlaams|dutch|nl\.?sub|swedish|swesub|icelandic|finnish|french|truefrench[\.\- ](?:dvd|br|bluray|720p|1080p|LD|dvdrip|internal|r5|bdrip|sub|cd\d|dts|dvdr)|german|nl\.?subbed|deutsch|espanol|SLOSiNH|VOSTFR|norwegian|[\.\- ]pl|pldub|norsub|[\.\- ]ITA)[\.\- ]
so yeah i think first step is to mimic the foreign detection that nn knows.. then improve upon that
the blacklist regex are case insensitive.. as the nn code already does /i.
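To illustrate what that means for the entries: with a case-insensitive match, one lowercase alternation covers every casing, so separate per-case rows add nothing. A small Python sketch, where `re.IGNORECASE` stands in for the /i modifier and the subjects are hypothetical:

```python
import re

# One lowercase pattern; re.IGNORECASE plays the role of /i.
LANG = re.compile(r"[-.](german|flemish|swesub)[-.]", re.IGNORECASE)

# Hypothetical subjects that differ only in how the tag is cased.
subjects = [
    "Some.Show.S01E01.GERMAN.720p.HDTV.x264-GRP",
    "Some.Show.S01E01.German.720p.HDTV.x264-GRP",
    "Some.Show.S01E01.german.720p.HDTV.x264-GRP",
    "Some.Show.S01E01.SweSUB.XviD-GRP",
]

hits = [bool(LANG.search(s)) for s in subjects]
print(hits)
```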
fyi a much nicer version of that blacklist test is actually part of nn+ in the misc/testing.. test_blacklist.php
trying out on that predb dump with:
[ -.](de|es|fr|ger|ita|ko|kor|nl|pl|se)[ -.]((19|20)\d\d|(480|720|1080)(i|p)|(bd|dvd.?|sat|vhs)?rip?|(bd|dl)mux|( -.)?(dub|sub)(ed|bed)?|complete|convert|(d|h|p|s)d?tv|dirfix|docu|dual|dvbs|dvdscr|eng|(h|x).?2?64|int(ernal)?|pal|proper|repack)
false positive (de.dub) but since this gets blacklisted via the actual lang one (french.hdtv) i guess it's safe to ignore,
misses,
overall that thing is doing pretty damn well. so just need to add xbox360 and then the music stuff?
then we could just nuke PL-PROPHET|PL.HappyNY|PL-PPTCLASSiCS to catch pretty much everything else we missed (in another regex with all the foreign specific groups).
things the actual lang (first) regex missed (fixed by pull request):
need to be handled on their own..
ok submitted pull update with some of my changes
so looks like we need to also catch 'vost',
then false positives:
for the false positives looks like we can look for E##.(?:e\d\d.)
note to self:
The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -
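Why this matters for the group-name list: an unescaped `.` in something like PL.HappyNY matches any character. The sketch below shows the difference using Python's `re.escape`, which is only a rough stand-in for whatever escaping is done on the PHP side; the probe string is made up.

```python
import re

# Group tags quoted earlier in the thread.
groups = ["PL-PROPHET", "PL.HappyNY", "PL-PPTCLASSiCS"]

# Joined as-is: the "." in "PL.HappyNY" acts as a wildcard.
raw = re.compile("|".join(groups))
# Joined after escaping: every tag is matched literally.
escaped = re.compile("|".join(re.escape(g) for g in groups))

probe = "PLxHappyNY"  # hypothetical string that should NOT be blocked
print(bool(raw.search(probe)), bool(escaped.search(probe)))
```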
matching the languages only when they are followed by a specific tag/codec/set/etc reduces false positives.. it's similar to how nn does it in the trunk
@nivong, take: Bloodline Der Killer German 2011 AC3 DVDRiP XviD-XF
It sees german, but that is not enough; it also needs something else next to german, like the year (2011). Compare that to the old blacklist, which only looks for the language word, e.g. serbian: A.Serbian.Film.2010.DVDRip.XviD-BDMF - [01/68] - asf-bdmf-sample.avi
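A stripped-down sketch of that difference, in Python. The language and tag lists here are deliberately tiny placeholders, not the real regex from the pull request, and `OLD` / `NEW` are just labels for the two approaches.

```python
import re

# Tiny placeholder lists; the real regex covers far more.
SEP = r"[ .-]"
LANGS = r"(?:german|serbian|french|dutch)"
TAGS = r"(?:(?:19|20)\d\d|(?:480|720|1080)[ip]|dvdrip|hdtv)"

# Old approach: the language word alone is enough.
OLD = re.compile(SEP + LANGS + SEP, re.IGNORECASE)
# New approach: the language word must be followed by a known tag.
NEW = re.compile(SEP + LANGS + SEP + TAGS, re.IGNORECASE)

with_tag = "Bloodline Der Killer German 2011 AC3 DVDRiP XviD-XF"
title_word = "A.Serbian.Film.2010.DVDRip.XviD-BDMF - [01/68] - asf-bdmf-sample.avi"

for subject in (with_tag, title_word):
    # Columns: old match, new match, subject
    print(bool(OLD.search(subject)), bool(NEW.search(subject)), subject)
```

Both patterns hit the first subject, but only the old one hits the second, where "Serbian" is followed by a title word rather than a year or format tag.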
found some issues with the current regex, working on being able to properly test it within nn. stay tuned
got the test_blacklist fixed.. thanks l2g! still need to test some things before moving onto the other regexes.
if these are supposed to be case iterations of each other, there are a few differences between the strings, like spelling / things that are in one but not the others..
(100000, 'alt.binaries.*', 'danish|deutsch|dutch|dksubs|flemish|french|hebrew|german|ita-eng|korsub|norwegian|serbian|spanish|spanisch|swedish|swesub|turkish|nl.?sub|.ita.|.japanese.', 1,1,0, 'Blocks non-english language releases.'),
(100001, 'alt.binaries.*', 'Danish|Deutsch|Dutch|DKsubs|Flemish|French|Hebrew|German|KorSub|Norwegian|Serbian|Spanish|Spanisch|Swedish|SweSUB|Turkish|.Japanese.', 1,1,0, 'Blocks non-english language releases.'),
(100002, 'alt.binaries.*', 'DANiSH|DEUTSCH|DUTCH|DKSUBS|FLEMISH|FRENCH|HEBREW|GERMAN|KORSUB|NORWEGIAN|SERBIAN|SPANISH|SPANiSH|SWEDISH|SWEDiSH|SWESUB|TURKSIH|.GER|.JAPENESE.', 1,1,0, 'Blocks non-english language releases.'),