Enverex closed this issue 8 years ago.
Spotweb is a manual process, in which people check releases manually, name them, and distribute the NZB via the spotnet protocol. From memory, somebody built spotnet into newznab (a fork, I think?) but we don't support it currently.
This is a combo of two things: the first is that some indexers have picked up the NZB on Spotnet, and the other is that somebody's using a better regex to match the release. The original subject is trtk09833 - [00/12] - "AC-DC - '74 Jailbreak (1974)(flac).nzb" yEnc (1/1), so you could pretty easily write a regex to match those releases that throws away the trtk part and uses the truncated filename instead (just AC-DC - '74 Jailbreak (1974)(flac)).
I tried to find more info about how spotweb works, but there wasn't a ton. I found the old git repo and the wiki, but it didn't help me much. It does specify the groups and header information though.
We would need to index the headers from free.pt I guess? Digest and store them in some spotweb table?
(In reply to James Meneghello, 9 April 2015 at 12:46: https://github.com/Murodese/pynab/issues/191#issuecomment-91205696)
Regex-wise, I assume something like this would be close:
/^trtk\d{4,8} - \[\d{1,5}\/\d{1,5}\] - "(?P<name>.+?)" yEnc \(\d{1,5}\/\d{1,5}\)$/i
But I can't think of a way to reliably get rid of the file extension; for example, this would end up with...
All The... 60s (3-CD)(flac).nzb
and...
All The... 60s (3-CD)(flac).vol125+119.PAR2
and...
All The... 60s (3-CD)(flac).part12.rar
... etc. I can't match on "(flac)" as I can't be sure that's always there. Should I be matching the generic name or should I match the .nzb specifically? I'm not familiar with what I should actually be matching at this point so it's a bit confusing!
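A quick way to see the problem is to run the pattern above through Python's re module. This is a hand translation (the /.../i delimiters become re.IGNORECASE), and the trtk ID and part counters in the subjects are made up; only the file names come from this thread. The lazy name group runs all the way to the closing quote, so the extension rides along:

```python
import re

# First attempt from above, translated from /.../i into Python syntax.
FIRST_TRY = re.compile(
    r'^trtk\d{4,8} - \[\d{1,5}/\d{1,5}\] - "(?P<name>.+?)" yEnc \(\d{1,5}/\d{1,5}\)$',
    re.IGNORECASE,
)


def name_of(subject):
    """Return the captured name group, or None if the subject doesn't match."""
    m = FIRST_TRY.match(subject)
    return m.group('name') if m else None


# Hypothetical subjects: real file names from the thread, invented trtk ID/counters.
for fname in ['All The... 60s (3-CD)(flac).nzb',
              'All The... 60s (3-CD)(flac).vol125+119.PAR2',
              'All The... 60s (3-CD)(flac).part12.rar']:
    print(name_of(f'trtk09999 - [01/40] - "{fname}" yEnc (1/1)'))
```

Each part comes back with its own extension attached, so the three parts would never share a name.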
Here is the definition for spots: https://github.com/spotnet/spotnet/wiki/Spot-Xml-format
If I have some time I can probably put something together to load these into a table that we can scan later, similar to pres. We'd probably need someone else to index the headers from free.pt, though, as I'm not too familiar with that part!
(In reply to Benjamin Hodgetts, 9 April 2015 at 21:49: https://github.com/Murodese/pynab/issues/191#issuecomment-91352849)
OK, assuming I should be matching on the name alone and not the NZB file itself, this regex should be correct:
/^trtk\d{4,8} - \[\d{1,5}\/\d{1,5}\] - "(?P<name>.+?)\.(?:nzb|vol\d+(?:\+\d+){1,}?\.par2|part\d+\.rar|par2)" yEnc \(\d{1,5}\/\d{1,5}\)$/i
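The same kind of check against this extension-stripping version (again a hand translation to Python; the trtk ID and counters are hypothetical). Because the name group is lazy and each extension alternative must be followed immediately by the closing quote, the name stops just before the recognised extension:

```python
import re

# The extension-stripping version from above, translated from /.../i.
STRIP_EXT = re.compile(
    r'^trtk\d{4,8} - \[\d{1,5}/\d{1,5}\] - '
    r'"(?P<name>.+?)\.(?:nzb|vol\d+(?:\+\d+){1,}?\.par2|part\d+\.rar|par2)"'
    r' yEnc \(\d{1,5}/\d{1,5}\)$',
    re.IGNORECASE,
)


def release_name(subject):
    """Return the release name with the extension removed, or None."""
    m = STRIP_EXT.match(subject)
    return m.group('name') if m else None


# Hypothetical trtk ID and counters; file names from earlier in the thread.
for fname in ['All The... 60s (3-CD)(flac).nzb',
              'All The... 60s (3-CD)(flac).vol125+119.PAR2',
              'All The... 60s (3-CD)(flac).part12.rar',
              'All The... 60s (3-CD)(flac).par2']:
    print(release_name(f'trtk09999 - [01/40] - "{fname}" yEnc (1/1)'))
```

Every part collapses to the same name, which is what lets them be grouped into one release. Without the case-insensitive flag, the .PAR2 part would fall through.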
I've tested the regex against all the parts found in half a dozen or so of the "trtk" releases on Binsearch and it matched as I'd expect. How does it look to you, Murodese?
Also, as this has made me realise some things are being wrongly renamed, would it be possible to have an "original_name" field in the database for releases as well? That way, if a better way to rename something is found in future, it could be applied retroactively. It wouldn't help in this case, though, as I guess these will need a new regex applied to the actual release creation process, not just to the name.
I can add a binary_name field, yeah.
Minor change to the regex to cope with .r\d{1,} volumes (.r00, .r01 and so on), which pop up occasionally, but otherwise it looks good:
^trtk\d{4,8} - \[\d{1,5}\/\d{1,5}\] - "(?P<name>.+?)\.(?:nzb|vol\d+(?:\+\d+){1,}?\.par2|part\d+\.rar|par2|r\d{1,})" yEnc \(\d{1,5}\/\d{1,5}\)$
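To confirm the extra alternative does what it should, here is a before/after comparison on an old-style .r05 volume. It is a Python translation; the revised pattern was posted without /.../i delimiters, so applying it case-insensitively like the earlier version is an assumption, and the subject is hypothetical:

```python
import re


def trtk_pattern(extensions):
    """Build the trtk subject pattern around a given extension alternation."""
    return re.compile(
        r'^trtk\d{4,8} - \[\d{1,5}/\d{1,5}\] - '
        r'"(?P<name>.+?)\.(?:' + extensions + r')"'
        r' yEnc \(\d{1,5}/\d{1,5}\)$',
        re.IGNORECASE,  # assumption: still applied case-insensitively
    )


OLD_EXTS = r'nzb|vol\d+(?:\+\d+){1,}?\.par2|part\d+\.rar|par2'
PREVIOUS = trtk_pattern(OLD_EXTS)
REVISED = trtk_pattern(OLD_EXTS + r'|r\d{1,}')

# Hypothetical old-style split volume.
subject = 'trtk09999 - [07/40] - "All The... 60s (3-CD)(flac).r05" yEnc (1/1)'
print(PREVIOUS.match(subject))                # None
print(REVISED.match(subject).group('name'))   # All The... 60s (3-CD)(flac)
```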
Which group was this working in? Or several groups?
There's a spotnet client for Django, but it's mixed with Django code and therefore about 30x more complex than it needs to be. It shouldn't be too hard to write a scan step that derives NZBs directly from spotnet itself, but that'll take time that I don't have at the moment. We can pretty much just pull them from there and toss them straight into the releases table, though; they're all manually verified, so we don't need to worry about a lot of stuff.
I was checking it against whatever groups the trtk's showed up in on BinSearch, I haven't implemented it as a regex in my tracker yet.
I'll make it generic then.
Run ./pynab.py update and ./pynab.py regex and it should update.
Eh...
python3 pynab.py regex
Traceback (most recent call last):
File "pynab.py", line 264, in <module>
update_regex()
File "pynab.py", line 195, in update_regex
pynab.util.update_regex()
File "/opt/pynab/pynab/util.py", line 59, in update_regex
revision = regex.search('\$Rev: (\d+) \$', first_line)
UnboundLocalError: local variable 'regex' referenced before assignment
Scoping irritates me sometimes.
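For anyone hitting the same error: Python decides at compile time that a name is local to a function if it is assigned anywhere in that function, so a later assignment to regex makes the earlier regex.search(...) fail. A minimal reproduction, assuming util.py binds a module to the global name regex (the standard re module stands in for it here, and the later reuse of the name is hypothetical, not pynab's actual code):

```python
import re as regex  # stand-in for whatever module util.py binds to the name `regex`


def broken_revision(first_line):
    # `regex` is assigned later in this function, so Python treats it as a
    # local variable for the whole body; this lookup runs before that
    # assignment and raises UnboundLocalError instead of using the module.
    revision = regex.search(r'\$Rev: (\d+) \$', first_line)
    regex = int(revision.group(1))  # hypothetical reuse of the name
    return regex


def fixed_revision(first_line):
    # Using a different local name leaves `regex` pointing at the module.
    revision = regex.search(r'\$Rev: (\d+) \$', first_line)
    rev_number = int(revision.group(1))
    return rev_number


print(fixed_revision('# $Rev: 123 $'))  # 123
try:
    broken_revision('# $Rev: 123 $')
except UnboundLocalError as exc:
    print('UnboundLocalError:', exc)
```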
Not quite...
Traceback (most recent call last):
File "pynab.py", line 264, in <module>
update_regex()
File "pynab.py", line 195, in update_regex
pynab.util.update_regex()
File "/opt/pynab/pynab/util.py", line 98, in update_regex
r = Regex(**reg)
TypeError: DeclarativeMeta object argument after ** must be a mapping, not int
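This TypeError means something other than a mapping reached Regex(**reg); the int suggests reg held an ID rather than the regex's fields. One common way to end up there (an assumption about the cause, not a reading of pynab's util.py) is iterating a dict keyed by ID, which yields the keys. A minimal reproduction with a plain stand-in class instead of the SQLAlchemy model:

```python
class Regex:
    """Hypothetical stand-in for pynab's SQLAlchemy Regex model: it just
    stores whatever keyword arguments it is given."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


# Hypothetical downloaded data, keyed by regex id.
downloaded = {
    100001: {'id': 100001, 'ordinal': 1},
}


def build_broken(data):
    # Iterating a dict yields its keys, so `reg` is an int here and
    # Regex(**reg) raises "argument after ** must be a mapping, not int".
    return [Regex(**reg) for reg in data]


def build_fixed(data):
    # Iterating .values() hands each per-regex dict to Regex(**reg).
    return [Regex(**reg) for reg in data.values()]


print(build_fixed(downloaded)[0].ordinal)  # 1
```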
That'll do it. Got the last unicode error blocking backfills, too.
Huzzah. Looks good here. Well done.
That leaves the question - is there any way of going back and getting the releases that would have been missed due to this Regex not previously existing?
EDIT: I'm now seeing this...
2015-04-10 02:21:09 INFO group: alt.binaries.mp3.bootlegs: scanning group
2015-04-10 02:21:09 INFO group: alt.binaries.aubergine: scanning group
2015-04-10 02:21:10 ERROR group: alt.binaries.ufg: problem updating group (44409584-44409583)
2015-04-10 02:21:10 INFO group: alt.binaries.nirpaia: scanning group
2015-04-10 02:21:10 ERROR group: alt.binaries.german.alt: problem updating group (102546574-102546573)
2015-04-10 02:21:10 INFO group: alt.binaries.mp3.bootlegs: nothing to do, already have target
2015-04-10 02:21:10 INFO group: alt.binaries.music.springsteen: scanning group
2015-04-10 02:21:10 INFO group: alt.binaries.sounds.mp3.german: scanning group
2015-04-10 02:21:10 ERROR group: alt.binaries.aubergine: problem updating group (53219325-53219324)
2015-04-10 02:21:10 INFO group: alt.binaries.movies.martial.arts: scanning group
2015-04-10 02:21:10 ERROR group: alt.binaries.nirpaia: problem updating group (23534777-23534776)
2015-04-10 02:21:10 ERROR group: alt.binaries.mac: problem updating group (9420016-9420015)
2015-04-10 02:21:10 INFO group: alt.binaries.sounds.mp3.french: scanning group
2015-04-10 02:21:10 ERROR group: alt.binaries.series.tv.divx.french: problem updating group (113132517-113132516)
2015-04-10 02:21:10 INFO group: alt.binaries.sounds.audiobooks: scanning group
2015-04-10 02:21:10 INFO group: alt.binaries.moviereleases.nl: scanning group
2015-04-10 02:21:10 INFO group: alt.binaries.movies.martial.arts: nothing to do, already have target
2015-04-10 02:21:10 INFO group: alt.binaries.ibm-pc: scanning group
2015-04-10 02:21:10 INFO group: alt.binaries.sounds.mp3.french: nothing to do, already have target
2015-04-10 02:21:10 INFO group: alt.binaries.games.kidstuff.nl: scanning group
2015-04-10 02:21:10 ERROR group: alt.binaries.music.springsteen: problem updating group (22290870-22290869)
2015-04-10 02:21:10 ERROR group: alt.binaries.sounds.mp3.german: problem updating group (8522799-8522798)
2015-04-10 02:21:10 INFO group: alt.binaries.mp3: scanning group
2015-04-10 02:21:10 INFO group: alt.binaries.startrek: scanning group
2015-04-10 02:21:10 ERROR group: alt.binaries.moviereleases.nl: problem updating group (2833309-2833308)
2015-04-10 02:21:10 INFO group: alt.binaries.emulators.nintendo: scanning group
2015-04-10 02:21:10 INFO group: alt.binaries.games.kidstuff.nl: nothing to do, already have target
2015-04-10 02:21:10 ERROR group: alt.binaries.ibm-pc: problem updating group (1937010-1937009)
2015-04-10 02:21:10 INFO group: alt.binaries.department: scanning group
2015-04-10 02:21:10 INFO group: alt.binaries.dvd.repost: scanning group
If you know which group it was in, you can re-scan the group itself. If it's music, it shouldn't be too big. Duplicates get discarded automatically, so it shouldn't be a big deal.
Re-pull, I accidentally did something dumb. Coding at 7am does that D:~
Still seeing these...
2015-04-10 16:39:15 INFO release: [trtk09118]: added release (1 rars, 12 rarparts)
2015-04-10 16:39:16 INFO release: [trtk09115]: added release (1 rars, 11 rarparts)
2015-04-10 16:39:16 INFO release: [trtk09111]: added release (1 rars, 25 rarparts)
So I'm not entirely sure that regex is working (is another one taking priority?).
Yeah, probably. Chuck one of those into a query and tell me the regex id associated with that release.
SELECT regex_id FROM releases WHERE name='trtk09115'
or such.
Oh, also the group names. If it's spread across lots of different groups it's not a problem, but even if they're all a.b.mp3.*, that's useful. We don't want to write regexes that are too broad.
This was the post - http://www.binsearch.ch/?b=Hits+In+Flac+Vol.116&g=alt.binaries.sounds.lossless&p=trein%40poster.eu+%28trein%29&max=250
It matched on Regex 591 - /^(RE: |)(?P
It was harder to find than I expected, as all the new ones had been wrongly renamed to something similar but not quite right for the release ("Hits In Flac Vol.116.m3u" in this instance).
They appear to be in alt.binaries.sounds.mp3, alt.binaries.sounds.flac, alt.binaries.sounds.lossless and alt.binaries.sounds.lossless.classical (from what I've indexed so far).
I'd recommend matching on alt.binaries.sounds.* for the sake of completeness.
Yeah, forgot that I removed regex partial matching in favour of speed (NN+'s regex collection doesn't use them). So it'll match all groups, but the trtk bit is specific enough that it's not a problem.
Changed the ordinal so it should get processed first; let me know how it goes. You should just be able to run pynab.py regex again to update it.
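Roughly, the ordinal decides which pattern gets the first shot at a subject, and the first match wins. A sketch of that rule in Python, as an assumption about how pynab's matching loop behaves rather than its actual code. Regex 591's full pattern is the one quoted further down this thread; the trtk regex's id and ordinal are made up:

```python
import re

TRTK = re.compile(
    r'^trtk\d{4,8} - \[\d{1,5}/\d{1,5}\] - '
    r'"(?P<name>.+?)\.(?:nzb|vol\d+(?:\+\d+){1,}?\.par2|part\d+\.rar|par2|r\d{1,})"'
    r' yEnc \(\d{1,5}/\d{1,5}\)$',
    re.IGNORECASE,
)
REGEX_591 = re.compile(
    r'^(RE: |)(?P<name>\w.*?)( \- |)\[(?P<parts>\d{1,3}/\d{1,3})\] \- "',
    re.IGNORECASE,
)

# (ordinal, id, compiled pattern). 591's ordinal comes from the thread;
# the trtk regex's id and ordinal are hypothetical.
REGEXES = [
    (2, 591, REGEX_591),
    (1, 100001, TRTK),
]


def first_match(subject, regexes=REGEXES):
    """Try patterns in ascending ordinal order and return (id, name) for
    the first one that matches, or None."""
    for _ordinal, regex_id, pattern in sorted(regexes, key=lambda row: row[0]):
        m = pattern.search(subject)
        if m:
            return regex_id, m.group('name')
    return None


subject = 'trtk09833 - [00/12] - "AC-DC - \'74 Jailbreak (1974)(flac).nzb" yEnc (1/1)'
print(first_match(subject))
```

If 591 gets the first shot instead, the same subject comes back named trtk09833, which is the same shape as the release names in the log above.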
What's the easiest way to remove all releases from a group, basically to reset that group to default? Ideally I want to nuke those groups so that these releases will all be picked up properly for ones that have already been processed.
Find the group ids you want to reset and just issue a delete from releases on those group ids. The rest will get automatically cleaned up. Then ./pynab.py group reset (group) to reset the counters, from memory.
I'll add the original name stuff tomorrow, so you may want to hold off rescanning until then.
Will do. Thanks again for all the work on this.
This one needs ./pynab.py update to run the Alembic migration as well.
Interesting. Looks like some still aren't working...
2015-04-11 16:09:28 INFO rar: file info add [trtk13743 ]
2015-04-11 16:09:29 INFO rar: file info add [trtk13742 ]
2015-04-11 16:09:29 INFO rar: file info add [trtk13741 ]
2015-04-11 16:09:29 INFO rar: file info add [trtk13739 ]
2015-04-11 16:09:56 INFO rar: file info add [trtk13738 ]
2015-04-11 16:09:57 INFO rar: file info add [trtk13737 ]
2015-04-11 16:09:57 INFO rar: file info add [trtk13735 ]
2015-04-11 16:09:57 INFO rar: file info add [trtk13733 ]
2015-04-11 16:09:57 INFO rar: file info add [trtk13732 ]
2015-04-11 16:09:57 INFO rar: file info add [trtk13731 ]
I'll look into finding out what's picking these up. Also, nuking everything from those groups appears to have orphaned segments in the segments table, so that's sitting a little high for now. (It went into a loop: it said it was going to process the backlog first, made no new binaries, and then said it had to clear the backlog first. I've raised the post-process limit for now.)
Looks like they are matching against Regex ID 1803 (ordinal 8).
/^(RE\:|)(ATTN.*?\>|)(ATTN|)(ART.*?\>|)(ARTWORK|)(?P<name>.*?\d{4}.*?)(\(|\[)(?P<parts>\d{1,3}\/\d{1,3})/i
An example (trtk13716) was - https://www.binsear.ch/?b=100+Hits+Legends+Aretha+Franklin&g=alt.binaries.sounds.mp3.complete_cd&p=yEncBin%40Poster.com+(trein1600)&max=250
I can't see why the new Regex didn't match it first here.
The segments are cleared when they're turned into a release, so I suspect that's unrelated.
Yeah, that regex has a really high ordinal. It'll be spammy as hell, but you can try printing out the regex order. In pynab/binaries.py at line 117 (before if reg.group_name...), add log.debug('{}: {}'.format(part.subject, reg.id)). That'll tell us what order they're being processed in.
Just found one matching against 591 as well (ordinal 2)...
/^(RE: |)(?P<name>\w.*?)( \- |)\[(?P<parts>\d{1,3}\/\d{1,3})\] \- \"/i
Basically it looks like they're matching against everything other than our new Regex. Are we sure that new Regex actually works?
I just manually tested it and it seems to match against all of the parts. Check that it exists? select * from regexes where id > 100000.
Yeah, I can see it in the DB (I've been browsing around using phpPgAdmin to make it easier to see things at a glance).
It appears to be the 513th Regex to be processed. Regex ID 591 in comparison is the 548th to be checked, so the order seems to be right at least, but still our new Regex is not the one that ends up matching.
Oh, the regex matches the segment number as well (the trailing "(1/1)" counter), which gets stripped prior to binary processing. So no, the regex won't match.
(In reply to Benjamin Hodgetts, 12 April 2015 at 12:14 am: https://github.com/Murodese/pynab/issues/191#issuecomment-91872383)
Haha, that's why I said I wasn't sure if it was correct as I wasn't 100% sure what to match against in the first place.
Does it strip and trim? If so I guess it should be...
/^trtk\d{4,8} - \[\d{1,5}\/\d{1,5}\] - "(?P<name>.+?)\.(?:nzb|vol\d+(?:\+\d+){1,}?\.par2|part\d+\.rar|par2|r\d{1,})" yEnc$/i
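A quick way to check that reasoning in Python. Here strip_segment_counter is a hypothetical stand-in for whatever pynab does before binary processing, not its actual code. Once the trailing counter is gone, the earlier pattern can no longer reach its end anchor, while the trimmed version matches:

```python
import re


def strip_segment_counter(subject):
    """Hypothetical stand-in for pynab's pre-processing: drop a trailing
    "(n/m)" segment counter and trim whitespace."""
    return re.sub(r'\(\d+/\d+\)\s*$', '', subject).strip()


HEAD = (r'^trtk\d{4,8} - \[\d{1,5}/\d{1,5}\] - '
        r'"(?P<name>.+?)\.(?:nzb|vol\d+(?:\+\d+){1,}?\.par2|part\d+\.rar|par2|r\d{1,})"')
WITH_COUNTER = re.compile(HEAD + r' yEnc \(\d{1,5}/\d{1,5}\)$', re.IGNORECASE)
WITHOUT_COUNTER = re.compile(HEAD + r' yEnc$', re.IGNORECASE)

raw = 'trtk09833 - [00/12] - "AC-DC - \'74 Jailbreak (1974)(flac).nzb" yEnc (1/1)'
stripped = strip_segment_counter(raw)
print(stripped)
print(WITH_COUNTER.match(stripped))                   # None
print(WITHOUT_COUNTER.match(stripped).group('name'))  # AC-DC - '74 Jailbreak (1974)(flac)
```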
I just noticed why it isn't processing binaries or releases.
2015-04-12 12:55:15,686 DEBG 'scan' stderr output:
Traceback (most recent call last):
File "/opt/pynab/scan.py", line 208, in <module>
main(mode=mode, group=arguments['<group>'], date=arguments['--date'])
File "/opt/pynab/scan.py", line 116, in main
process()
File "/opt/pynab/scan.py", line 78, in process
pynab.releases.process()
File "/opt/pynab/pynab/releases.py", line 217, in process
binary.parts[int(binary.total_parts / 2)].segments[0].size)
IndexError: list index out of range
Can you print the binary ID for the one that's breaking and pull the number of total_parts in the list?
I added...
log.debug('Checking: {} (name: {}) (parts: {})'.format(binary.id, binary.name, binary.total_parts))
... to pynab/releases.py on line 215 so I assume that should give me the correct read out. That in turn gave me...
2015-04-12 13:59:02 DEBUG Checking: 1865949 (name: Detective.Comics) (parts: 7)
Ok, now the parts:
SELECT * FROM parts WHERE binary_id = 1865949
35 rows but clearly not the same post.
"subject" "total_segments"
"Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF - [2/7] - ""Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF.rar.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF - [5/7] - ""Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF.rar.vol03+4.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF - [6/7] - ""Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF.rar.vol07+8.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF - [7/7] - ""Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF.rar.vol15+5.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF - [3/7] - ""Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF.rar.vol00+1.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF - [4/7] - ""Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF.rar.vol01+2.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF - [1/7] - ""Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF.rar"" yEnc" "29"
"Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF - [6/7] - ""Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF.rar.vol07+8.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF - [5/7] - ""Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF.rar.vol03+4.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF - [3/7] - ""Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF.rar.vol00+1.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF - [4/7] - ""Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF.rar.vol01+2.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF - [7/7] - ""Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF.rar.vol15+5.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF - [1/7] - ""Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF.rar"" yEnc" "37"
"Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF - [2/7] - ""Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF.rar.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY - [2/7] - ""Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY.rar.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY - [5/7] - ""Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY.rar.vol03+4.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY - [6/7] - ""Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY.rar.vol07+8.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY - [7/7] - ""Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY.rar.vol15+5.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY - [3/7] - ""Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY.rar.vol00+1.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY - [4/7] - ""Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY.rar.vol01+2.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY - [1/7] - ""Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY.rar"" yEnc" "27"
"Detective.Comics.Vol.1.No.830.May.2007.Comic.eBook-aAF - [4/7] - ""Detective.Comics.Vol.1.No.830.May.2007.Comic.eBook-aAF.rar.vol01+2.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.830.May.2007.Comic.eBook-aAF - [6/7] - ""Detective.Comics.Vol.1.No.830.May.2007.Comic.eBook-aAF.rar.vol07+8.par2"" yEnc" "1"
"Detective.Comics.Vol.2.No.9.Jul.2012.SCAN.Comic.eBook-iNTENSiTY - [3/7] - ""Detective.Comics.Vol.2.No.9.Jul.2012.SCAN.Comic.eBook-iNTENSiTY.rar.vol00+1.par2"" yEnc" "1"
"Detective.Comics.Vol.2.No.8.Jun.2012.SCAN.Comic.eBook-iNTENSiTY - [2/7] - ""Detective.Comics.Vol.2.No.8.Jun.2012.SCAN.Comic.eBook-iNTENSiTY.rar.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY - [6/7] - ""Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY.rar.vol07+8.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY - [5/7] - ""Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY.rar.vol03+4.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY - [3/7] - ""Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY.rar.vol00+1.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY - [4/7] - ""Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY.rar.vol01+2.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY - [7/7] - ""Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY.rar.vol15+5.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY - [1/7] - ""Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY.rar"" yEnc" "25"
"Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY - [2/7] - ""Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY.rar.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.847.Oct.2008.Comic.eBook-aAF - [1/7] - ""Detective.Comics.Vol.1.No.847.Oct.2008.Comic.eBook-aAF.rar"" yEnc" "27"
"Detective.Comics.Vol.1.No.847.Oct.2008.Comic.eBook-aAF - [2/7] - ""Detective.Comics.Vol.1.No.847.Oct.2008.Comic.eBook-aAF.rar.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.847.Oct.2008.Comic.eBook-aAF - [3/7] - ""Detective.Comics.Vol.1.No.847.Oct.2008.Comic.eBook-aAF.rar.vol00+1.par2"" yEnc" "1"
I'm pretty sure that's one broke-ass regex. Can you pull the regex id from that binary?
Regex ID 679...
/^.*?\"(?P<name>.*?)\.(pdb|htm|prc|lit|epub|lrf|txt|pdf|rtf|doc|chf|chn|mobi|chm|doc|sample|mkv|Avi|mp4|vol|ogm|par|rar|sfv|nfo|nzb|srt|ass|mpg|txt|zip|wmv|ssa|r\d{1,3}|7z|tar|mov|divx|m2ts|rmvb|iso|dmg|sub|idx|rm|ac3|t\d{1,2}|u\d{1,3})/iS
Yeah, the regex is matching badly. I'll rewrite it.
How would that cause the scans to fail though? Perhaps I'm being daft but I'm not seeing the link.
I was halfway through a post on it and my PC hardlocked and I lost it :c I'll explain briefly:
The regex is causing binaries to be made improperly (it only matches Detective.Comics without the volume numbers etc., meaning that we get the wrong parts put together). Because the wrong parts are together, it breaks stuff further down the line. A few of NN's regex are broken like this, which is why I made the facility to replace them as they're dragged down.
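Running regex 679 over a few of the subjects from the parts dump shows the mechanism. This is a Python translation, and the bucketing by name is a simplified stand-in for pynab's binary assembly, not its actual code. The lazy name group stops at the first dot followed by anything in the extension list, and with the case-insensitive flag ".Vol" matches "vol", so every issue collapses to the same name:

```python
import re
from collections import defaultdict

# Regex 679 as quoted above, translated from /.../iS. The PHP-only S
# ("study") modifier has no Python equivalent and is dropped.
REGEX_679 = re.compile(
    r'^.*?"(?P<name>.*?)\.(pdb|htm|prc|lit|epub|lrf|txt|pdf|rtf|doc|chf|chn|mobi|chm|doc'
    r'|sample|mkv|Avi|mp4|vol|ogm|par|rar|sfv|nfo|nzb|srt|ass|mpg|txt|zip|wmv|ssa'
    r'|r\d{1,3}|7z|tar|mov|divx|m2ts|rmvb|iso|dmg|sub|idx|rm|ac3|t\d{1,2}|u\d{1,3})',
    re.IGNORECASE,
)

# Subjects copied from the parts dump above (CSV quote-doubling undone);
# three different issues.
SUBJECTS = [
    ('Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF - [1/7] - '
     '"Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF.rar" yEnc'),
    ('Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF - [5/7] - '
     '"Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF.rar.vol03+4.par2" yEnc'),
    ('Detective.Comics.Vol.2.No.9.Jul.2012.SCAN.Comic.eBook-iNTENSiTY - [3/7] - '
     '"Detective.Comics.Vol.2.No.9.Jul.2012.SCAN.Comic.eBook-iNTENSiTY.rar.vol00+1.par2" yEnc'),
]


def group_by_name(subjects):
    """Bucket subjects by the captured name group: a simplification of how
    parts sharing a name end up in one binary."""
    binaries = defaultdict(list)
    for subject in subjects:
        m = REGEX_679.match(subject)
        if m:
            binaries[m.group('name')].append(subject)
    return dict(binaries)


grouped = group_by_name(SUBJECTS)
print(list(grouped))                     # ['Detective.Comics']
print(len(grouped['Detective.Comics']))  # 3
```

Three different issues land in one bucket, which is consistent with binary 1865949 showing 35 parts against a total_parts of 7.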
Fair enough, I had a feeling it was something along those lines but wasn't 100%.
I'd delete any binaries generated by that regex id and rescan them, by the way. Luckily they're in ebooks, so it's about 10 minutes to scan the whole goddamned group.
Those regex were broken for 4 different groups, though: a.b.ebook, a.b.e-book, a.b.e-book.technical, a.b.ebook.flood.
Yeah, already did that and the scan has kicked off successfully. I'll stop backfilling for now too to make sure the parts and binaries tables actually clear down.
I'll catch that exception anyway and make a log note to check the regex anytime it comes up.
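A minimal sketch of that kind of guard around the lookup that crashed in releases.py. The function name, its arguments and the list-of-lists data shape are hypothetical stand-ins for the ORM objects, not pynab's code; the point is only that an IndexError becomes a log note pointing at the regex instead of killing the scan:

```python
import logging

log = logging.getLogger(__name__)


def middle_segment_size(binary_id, regex_id, total_parts, parts):
    """Hypothetical guard around
    binary.parts[int(binary.total_parts / 2)].segments[0].size.

    `parts` stands in for the ORM relationship: a list of parts, each a list
    of segment sizes. Returns the size, or None (with a log note) when the
    binary doesn't have the shape its total_parts promises.
    """
    try:
        return parts[int(total_parts / 2)][0]
    except IndexError:
        log.warning(
            'binary %s has %s parts for total_parts=%s; check regex %s',
            binary_id, len(parts), total_parts, regex_id,
        )
        return None


logging.basicConfig(level=logging.INFO)
print(middle_segment_size(1, 679, 7, [[100], [200], [300], [400]]))  # 400
print(middle_segment_size(2, 679, 7, [[100]]))                       # None
```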
I've seen a lot of releases named something like trtk09836, which in turn seem to get renamed wrongly, e.g.
Or doesn't get renamed at all:
Googling trtk09833 leads me to http://spotweb.timbo.nl/?page=getspot&messageid=5vwo1pBvvDQkNwsUwyILb%40spot.net which gives the impression there's some way of turning these release names into proper releases. It also seems to list other NZB sites that have somehow corrected the name to "AC-DC - '74 Jailbreak (1974)(flac)".
Any ideas how they are doing this, and whether it's something that could be implemented?