jamesmeneghello / pynab

Newznab-compliant Usenet Indexer written in Python, using PostgreSQL/MySQL-like.

Improve Renaming for "trtkXXXXX" Style Releases #191

Closed Enverex closed 8 years ago

Enverex commented 9 years ago

I've seen a lot of releases named something like trtk09836 which in turn seems to get renamed wrongly, e.g.

2015-04-09 09:59:01 DEBUG category: () [trtk09836]: 8010
2015-04-09 09:59:01 DEBUG category: () [trtk09836]: 8010
2015-04-09 09:59:01 DEBUG category: () [63890ktrt]: 8010
2015-04-09 09:59:01 DEBUG category: () [CD2\01. Aretha Franklin - I Say A Little Prayer.flac]: 3040
2015-04-09 09:59:01 INFO release: [trtk09836] - rename: CD2\01. Aretha Franklin - I Say A Little Prayer.flac (8010 -> 8010 -> 3040)

Or doesn't get renamed at all:

2015-04-09 09:59:01 DEBUG category: () [trtk09833]: 8010
2015-04-09 09:59:01 DEBUG category: () [trtk09833]: 8010
2015-04-09 09:59:01 DEBUG category: () [33890ktrt]: 8010
2015-04-09 09:59:01 DEBUG category: () [ACDC - 74 Jailbreak - Back.jpg]: 8010
2015-04-09 09:59:01 DEBUG category: () ['74 Jailbreak (Australia).m3u]: 8010
2015-04-09 09:59:01 DEBUG category: () ['74 Jailbreak.CUE]: 8010
2015-04-09 09:59:01 DEBUG category: () [Folder.auCDtect.txt]: 8010
2015-04-09 09:59:01 DEBUG release: no good name candidates [trtk09833]

Googling trtk09833 leads me to http://spotweb.timbo.nl/?page=getspot&messageid=5vwo1pBvvDQkNwsUwyILb%40spot.net which gives the impression there's some way of turning these release names into proper releases. It also seems to list other NZB sites that have somehow corrected the name to "AC-DC - '74 Jailbreak (1974)(flac)".

Any ideas how they are doing this and if it's something that could be implemented?

jamesmeneghello commented 9 years ago

Spotweb is a manual process, in which people check releases manually, name them, and distribute the NZB via the spotnet protocol. From memory, somebody built spotnet into newznab (a fork, I think?) but we don't support it currently.

This is a combo of two things: the first is that some indexers have picked up the NZB on Spotnet, and the other is that somebody's using a better regex to match the release. The original subject is trtk09833 - [00/12] - "AC-DC - '74 Jailbreak (1974)(flac).nzb" yEnc (1/1), so you could pretty easily write a regex to match those releases that throws away the trtk part and uses the truncated filename instead (just AC-DC - '74 Jailbreak (1974)(flac)).

brookesy2 commented 9 years ago

I tried to find more info about how spotweb works, but there wasn't a ton. I found the old git repo and the wiki, but it didn't help me much. It does specify the groups and header information though.

We would need to index the headers from free.pt I guess? Digest and store them in some spotweb table?

Enverex commented 9 years ago

Regex wise, I assume something like this would be close..

/^trtk\d{4,8} - \[\d{1,5}\/\d{1,5}\] - "(?P<name>.+?)" yEnc \(\d{1,5}\/\d{1,5}\)$/i

But I can't think of a way to reliably get rid of the file extension, for example this would end up with...

All The... 60s (3-CD)(flac).nzb

and...

All The... 60s (3-CD)(flac).vol125+119.PAR2

and...

All The... 60s (3-CD)(flac).part12.rar

... etc. I can't match on "(flac)" as I can't be sure that's always there. Should I be matching the generic name or should I match the .nzb specifically? I'm not familiar with what I should actually be matching at this point so it's a bit confusing!

brookesy2 commented 9 years ago

Here is the definition for spots: https://github.com/spotnet/spotnet/wiki/Spot-Xml-format

If I have some time I can probably put something together to put this into a table so we can scan later, similar to pres. Would probably need someone else to index the headers from free.pt, as I am not too familiar with that part!

Enverex commented 9 years ago

Ok, assuming I should be matching on the name alone and not the NZB file itself, this REGEX should be correct.

/^trtk\d{4,8} - \[\d{1,5}\/\d{1,5}\] - "(?P<name>.+?)\.(?:nzb|vol\d+(?:\+\d+){1,}?\.par2|part\d+\.rar|par2)" yEnc \(\d{1,5}\/\d{1,5}\)$/i

I've tested the regex against all the parts found in half a dozen or so of the "trtk" releases on Binsearch and it matched how I'd expect. How's it look to you Murodese?

Also, as this has made me realise some things are being wrongly renamed, would it be possible to have an "original_name" field in the database for releases as well? That way if a better way to rename something is found in future, it could be applied retroactively. Although it wouldn't help in this case as these will need a new regex applied to the actual release creation process, not just the name I guess.

jamesmeneghello commented 9 years ago

I can add a binary_name field, yeah.

Minor change to the regex to cope with rar\d{1,} which pops up occasionally, but otherwise it looks good:

^trtk\d{4,8} - \[\d{1,5}\/\d{1,5}\] - "(?P<name>.+?)\.(?:nzb|vol\d+(?:\+\d+){1,}?\.par2|part\d+\.rar|par2|r\d{1,})" yEnc \(\d{1,5}\/\d{1,5}\)$

Which group was this working in? Or several groups?
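As a quick sanity check, the pattern above can be exercised in Python. This is only a sketch: the first subject is the one quoted earlier in the thread, while the other two are hypothetical subjects built from the file names mentioned above (the `trtk00000` id and the part counts are made up). `re.IGNORECASE` stands in for the `/i` flag of the earlier version, and `release_name` is a helper invented for illustration.

```python
import re

# Pattern from the comment above; IGNORECASE mirrors the /i flag used earlier in the thread.
TRTK = re.compile(
    r'^trtk\d{4,8} - \[\d{1,5}\/\d{1,5}\] - "(?P<name>.+?)'
    r'\.(?:nzb|vol\d+(?:\+\d+){1,}?\.par2|part\d+\.rar|par2|r\d{1,})'
    r'" yEnc \(\d{1,5}\/\d{1,5}\)$',
    re.IGNORECASE,
)


def release_name(subject):
    """Return the captured name, or None if the subject does not match."""
    match = TRTK.match(subject)
    return match.group('name') if match else None


subjects = [
    # Subject quoted in the thread.
    'trtk09833 - [00/12] - "AC-DC - \'74 Jailbreak (1974)(flac).nzb" yEnc (1/1)',
    # Hypothetical subjects: id and counters are invented for this example.
    'trtk00000 - [05/12] - "All The... 60s (3-CD)(flac).vol125+119.PAR2" yEnc (1/10)',
    'trtk00000 - [03/12] - "All The... 60s (3-CD)(flac).part12.rar" yEnc (7/50)',
]

for subject in subjects:
    print(release_name(subject))
```

The lazy `.+?` in the name group stops at the first dot that is followed by one of the listed extensions and the closing quote, which is why the dots in "All The..." do not cut the name short.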

jamesmeneghello commented 9 years ago

There's a spotnet client for django, but it's mixed with django code and therefore about 30x more complex than it needs to be. Shouldn't be too hard to write a scan step that directly derives nzbs from spotnet itself, but that'll take time that I don't have at the moment. We can pretty much just pull them from there and toss them straight into the releases table though - they're all manually verified, so we don't need to worry about a lot of stuff.

Enverex commented 9 years ago

I was checking it against whatever groups the trtk's showed up in on BinSearch, I haven't implemented it as a regex in my tracker yet.

jamesmeneghello commented 9 years ago

I'll make it generic then.

jamesmeneghello commented 9 years ago

Run ./pynab.py update and ./pynab.py regex and it should update.

Enverex commented 9 years ago

Eh...

python3 pynab.py regex
Traceback (most recent call last):
  File "pynab.py", line 264, in <module>
    update_regex()
  File "pynab.py", line 195, in update_regex
    pynab.util.update_regex()
  File "/opt/pynab/pynab/util.py", line 59, in update_regex
    revision = regex.search('\$Rev: (\d+) \$', first_line)
UnboundLocalError: local variable 'regex' referenced before assignment

jamesmeneghello commented 9 years ago

Scoping irritates me sometimes.
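For readers unfamiliar with this error: Python treats a name as local to the whole function as soon as it is assigned anywhere in that function, so using it before that assignment raises `UnboundLocalError`. The sketch below reproduces the pattern under assumptions and is not pynab's actual code. The alias `regex` for the standard `re` module and both function names are hypothetical.

```python
import re as regex  # assumption: stand-in for whatever module-level 'regex' name pynab uses


def broken_revision(first_line):
    # 'regex' is assigned further down, so Python treats it as a local name
    # for the whole function and this line fails before the module is consulted.
    revision = regex.search(r'\$Rev: (\d+) \$', first_line)
    regex = revision.group(1)
    return regex


def fixed_revision(first_line):
    # Using a different local name leaves the module-level 'regex' visible.
    revision = regex.search(r'\$Rev: (\d+) \$', first_line)
    return int(revision.group(1)) if revision else None
```

Calling `broken_revision` raises `UnboundLocalError` regardless of its input, while `fixed_revision` returns the revision number when the line contains a `$Rev: N $` marker.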

Enverex commented 9 years ago

Not quite...

Traceback (most recent call last):
  File "pynab.py", line 264, in <module>
    update_regex()
  File "pynab.py", line 195, in update_regex
    pynab.util.update_regex()
  File "/opt/pynab/pynab/util.py", line 98, in update_regex
    r = Regex(**reg)
TypeError: DeclarativeMeta object argument after ** must be a mapping, not int

jamesmeneghello commented 9 years ago

That'll do it. Got the last unicode error blocking backfills, too.

Enverex commented 9 years ago

Huzzah. Looks good here. Well done.

Enverex commented 9 years ago

That leaves the question - is there any way of going back and getting the releases that would have been missed due to this Regex not previously existing?

EDIT: I'm now seeing this...

2015-04-10 02:21:09 INFO group: alt.binaries.mp3.bootlegs: scanning group
2015-04-10 02:21:09 INFO group: alt.binaries.aubergine: scanning group
2015-04-10 02:21:10 ERROR group: alt.binaries.ufg: problem updating group (44409584-44409583)
2015-04-10 02:21:10 INFO group: alt.binaries.nirpaia: scanning group
2015-04-10 02:21:10 ERROR group: alt.binaries.german.alt: problem updating group (102546574-102546573)
2015-04-10 02:21:10 INFO group: alt.binaries.mp3.bootlegs: nothing to do, already have target
2015-04-10 02:21:10 INFO group: alt.binaries.music.springsteen: scanning group
2015-04-10 02:21:10 INFO group: alt.binaries.sounds.mp3.german: scanning group
2015-04-10 02:21:10 ERROR group: alt.binaries.aubergine: problem updating group (53219325-53219324)
2015-04-10 02:21:10 INFO group: alt.binaries.movies.martial.arts: scanning group
2015-04-10 02:21:10 ERROR group: alt.binaries.nirpaia: problem updating group (23534777-23534776)
2015-04-10 02:21:10 ERROR group: alt.binaries.mac: problem updating group (9420016-9420015)
2015-04-10 02:21:10 INFO group: alt.binaries.sounds.mp3.french: scanning group
2015-04-10 02:21:10 ERROR group: alt.binaries.series.tv.divx.french: problem updating group (113132517-113132516)
2015-04-10 02:21:10 INFO group: alt.binaries.sounds.audiobooks: scanning group
2015-04-10 02:21:10 INFO group: alt.binaries.moviereleases.nl: scanning group
2015-04-10 02:21:10 INFO group: alt.binaries.movies.martial.arts: nothing to do, already have target
2015-04-10 02:21:10 INFO group: alt.binaries.ibm-pc: scanning group
2015-04-10 02:21:10 INFO group: alt.binaries.sounds.mp3.french: nothing to do, already have target
2015-04-10 02:21:10 INFO group: alt.binaries.games.kidstuff.nl: scanning group
2015-04-10 02:21:10 ERROR group: alt.binaries.music.springsteen: problem updating group (22290870-22290869)
2015-04-10 02:21:10 ERROR group: alt.binaries.sounds.mp3.german: problem updating group (8522799-8522798)
2015-04-10 02:21:10 INFO group: alt.binaries.mp3: scanning group
2015-04-10 02:21:10 INFO group: alt.binaries.startrek: scanning group
2015-04-10 02:21:10 ERROR group: alt.binaries.moviereleases.nl: problem updating group (2833309-2833308)
2015-04-10 02:21:10 INFO group: alt.binaries.emulators.nintendo: scanning group
2015-04-10 02:21:10 INFO group: alt.binaries.games.kidstuff.nl: nothing to do, already have target
2015-04-10 02:21:10 ERROR group: alt.binaries.ibm-pc: problem updating group (1937010-1937009)
2015-04-10 02:21:10 INFO group: alt.binaries.department: scanning group
2015-04-10 02:21:10 INFO group: alt.binaries.dvd.repost: scanning group

jamesmeneghello commented 9 years ago

If you know which group it was in, you can re-scan the group itself. If it's music, it shouldn't be too big. Duplicates will get automatically discarded, so shouldn't be a big deal.

jamesmeneghello commented 9 years ago

Re-pull, I accidentally did something dumb. Coding at 7am does that D:~

Enverex commented 9 years ago

Still seeing these...

2015-04-10 16:39:15 INFO release: [trtk09118]: added release (1 rars, 12 rarparts)
2015-04-10 16:39:16 INFO release: [trtk09115]: added release (1 rars, 11 rarparts)
2015-04-10 16:39:16 INFO release: [trtk09111]: added release (1 rars, 25 rarparts)

So not entirely sure that regex is working (is another one taking priority?)

jamesmeneghello commented 9 years ago

Yeah, probably. Chuck one of those into a query and tell me the regex id associated with that release.

SELECT regex_id FROM releases WHERE name='trtk09115' or such.

jamesmeneghello commented 9 years ago

Oh, also the group names. If it's spread across lots of different groups it's not a problem, but even if they're all a.b.mp3.*, that's useful. We don't want to write regexes that are too broad.

Enverex commented 9 years ago

This was the post - http://www.binsearch.ch/?b=Hits+In+Flac+Vol.116&g=alt.binaries.sounds.lossless&p=trein%40poster.eu+%28trein%29&max=250

It matched on Regex 591 - /^(RE: |)(?P<name>\w.*?)( \- |)\[(?P<parts>\d{1,3}\/\d{1,3})\] \- \"/i

It was harder to find than I expected as all the new ones had been wrongly renamed to something similar, but not quite right for the release ("Hits In Flac Vol.116.m3u" in this instance).

They appear to be in alt.binaries.sounds.mp3, alt.binaries.sounds.flac, alt.binaries.sounds.lossless and alt.binaries.sounds.lossless.classical (from what I've indexed so far).

I'd recommend matching on alt.binaries.sounds.* for the sake of completeness.

jamesmeneghello commented 9 years ago

Yeah, forgot that I removed regex partial matching in favour of speed (NN+'s regex collection doesn't use them). So it'll match all groups, but the trtk bit is specific enough that it's not a problem.

Changed the ordinal so it should get processed first, let me know how it goes. You should just be able to run pynab.py regex again to update it.
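The effect of the ordinal can be sketched as follows, assuming regexes are tried in ascending ordinal order and the first match wins. The thread does not spell out the direction, and this is not pynab's actual matching code. The id `100001`, the ordinals and the helper `first_match` are made up; the two patterns are the trtk regex and regex 591 as discussed in this thread, applied to the full subject as originally posted.

```python
import re

TRTK = (r'^trtk\d{4,8} - \[\d{1,5}\/\d{1,5}\] - "(?P<name>.+?)'
        r'\.(?:nzb|vol\d+(?:\+\d+){1,}?\.par2|part\d+\.rar|par2|r\d{1,})'
        r'" yEnc \(\d{1,5}\/\d{1,5}\)$')
GENERIC_591 = r'^(RE: |)(?P<name>\w.*?)( \- |)\[(?P<parts>\d{1,3}\/\d{1,3})\] \- \"'


def first_match(subject, regexes):
    """Try regexes in ascending ordinal order; return (id, name) of the first hit."""
    for reg in sorted(regexes, key=lambda r: r['ordinal']):
        match = re.search(reg['pattern'], subject, re.IGNORECASE)
        if match:
            return reg['id'], match.group('name')
    return None


subject = 'trtk09833 - [00/12] - "AC-DC - \'74 Jailbreak (1974)(flac).nzb" yEnc (1/1)'

# Hypothetical ordinals: generic regex ahead of the trtk one.
generic_first = [
    {'id': 591, 'ordinal': 2, 'pattern': GENERIC_591},
    {'id': 100001, 'ordinal': 5, 'pattern': TRTK},
]
# Hypothetical ordinals: trtk regex moved to the front.
trtk_first = [
    {'id': 591, 'ordinal': 2, 'pattern': GENERIC_591},
    {'id': 100001, 'ordinal': 1, 'pattern': TRTK},
]

print(first_match(subject, generic_first))
print(first_match(subject, trtk_first))
```

With the generic regex in front, the release is named after the `trtk` id; with the trtk regex in front, the quoted file name is used instead.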

Enverex commented 9 years ago

What's the easiest way to remove all releases from a group, basically to reset that group to default? Ideally I want to nuke those groups so that these releases will all be picked up properly for ones that have already been processed.

jamesmeneghello commented 9 years ago

Find the group ids you want to reset and just issue a delete from releases on those group ids. The rest will get automatically cleaned up. Then ./pynab.py group reset (group) to reset the counters, from memory.

I'll add the original name stuff tomorrow, so you may want to hold off rescanning until then.

Enverex commented 9 years ago

Will do. Thanks again for all the work on this.

jamesmeneghello commented 9 years ago

This one needs ./pynab.py update to run the alembic migration as well.

Enverex commented 9 years ago

Interesting. Looks like some still aren't working...

2015-04-11 16:09:28 INFO rar: file info add [trtk13743  ]
2015-04-11 16:09:29 INFO rar: file info add [trtk13742  ]
2015-04-11 16:09:29 INFO rar: file info add [trtk13741  ]
2015-04-11 16:09:29 INFO rar: file info add [trtk13739  ]
2015-04-11 16:09:56 INFO rar: file info add [trtk13738  ]
2015-04-11 16:09:57 INFO rar: file info add [trtk13737  ]
2015-04-11 16:09:57 INFO rar: file info add [trtk13735  ]
2015-04-11 16:09:57 INFO rar: file info add [trtk13733  ]
2015-04-11 16:09:57 INFO rar: file info add [trtk13732  ]
2015-04-11 16:09:57 INFO rar: file info add [trtk13731  ]

I'll look into finding out what's picking these up. Also, nuking everything from those groups appears to have orphaned segments in the segments table, so that's sitting a little high for now (it went into a loop: saying it was going to process the backlog first, not making any new binaries, and then saying it had to clear the backlog first. I've raised the post-process limit for now).

Enverex commented 9 years ago

Looks like they are matching against Regex ID 1803 (ordinal 8).

/^(RE\:|)(ATTN.*?\>|)(ATTN|)(ART.*?\>|)(ARTWORK|)(?P<name>.*?\d{4}.*?)(\(|\[)(?P<parts>\d{1,3}\/\d{1,3})/i

An example (trtk13716) was - https://www.binsear.ch/?b=100+Hits+Legends+Aretha+Franklin&g=alt.binaries.sounds.mp3.complete_cd&p=yEncBin%40Poster.com+(trein1600)&max=250

I can't see why the new Regex didn't match it first here.

jamesmeneghello commented 9 years ago

The segments are cleared when they're turned into a release, so I suspect that's unrelated.

Yeah, that regex has a really high ordinal. It'll be spammy as hell, but you can try printing out the regex order. In pynab/binaries.py at line 117 (before if reg.group_name...), add log.debug('{}: {}'.format(part.subject, reg.id)). That'll tell us what order they're being processed in.

Enverex commented 9 years ago

Just found one matching against 591 as well (ordinal 2)...

/^(RE: |)(?P<name>\w.*?)( \- |)\[(?P<parts>\d{1,3}\/\d{1,3})\] \- \"/i

Basically it looks like they're matching against everything other than our new Regex. Are we sure that new Regex actually works?

jamesmeneghello commented 9 years ago

I just manually tested it and it seems to match against all of the parts. Check that it exists? select * from regexes where id > 100000.

Enverex commented 9 years ago

Yeah, I can see it in the DB (I've been browsing around using phpPgAdmin to make it easier to see things at a glance).

It appears to be the 513th Regex to be processed. Regex ID 591 in comparison is the 548th to be checked, so the order seems to be right at least, but still our new Regex is not the one that ends up matching.

jamesmeneghello commented 9 years ago

Oh, the regex matches the segment number as well, which gets stripped prior to binary processing. So no, the regex won't match.
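To illustrate the point about the stripped segment counter: judging by the part subjects shown later in this thread, the stored subject ends at `yEnc`, without the trailing `(1/1)`. The sketch below assumes that removing this counter is the only change made before matching, which the thread does not confirm in detail.

```python
import re

# The regex as committed, which still expects the trailing "(n/m)" segment counter.
WITH_COUNTER = re.compile(
    r'^trtk\d{4,8} - \[\d{1,5}\/\d{1,5}\] - "(?P<name>.+?)'
    r'\.(?:nzb|vol\d+(?:\+\d+){1,}?\.par2|part\d+\.rar|par2|r\d{1,})'
    r'" yEnc \(\d{1,5}\/\d{1,5}\)$',
    re.IGNORECASE,
)

raw_subject = 'trtk09833 - [00/12] - "AC-DC - \'74 Jailbreak (1974)(flac).nzb" yEnc (1/1)'

# Assumption: stripping removes the final " (n/m)" and nothing else.
stripped_subject = re.sub(r' \(\d+/\d+\)$', '', raw_subject)

print(bool(WITH_COUNTER.match(raw_subject)))       # raw header: the pattern matches
print(bool(WITH_COUNTER.match(stripped_subject)))  # counter removed: no match
```

Because the pattern is anchored with `$` after the counter, the stripped subject can never match it, so a later, more generic regex ends up claiming these parts.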

Enverex commented 9 years ago

Haha, that's why I said I wasn't sure if it was correct as I wasn't 100% sure what to match against in the first place.

Enverex commented 9 years ago

Does it strip and trim? If so I guess it should be...

/^trtk\d{4,8} - \[\d{1,5}\/\d{1,5}\] - "(?P<name>.+?)\.(?:nzb|vol\d+(?:\+\d+){1,}?\.par2|part\d+\.rar|par2|r\d{1,})" yEnc$/i

Enverex commented 9 years ago

I just noticed why it isn't processing binaries or releases.

2015-04-12 12:55:15,686 DEBG 'scan' stderr output:
Traceback (most recent call last):
  File "/opt/pynab/scan.py", line 208, in <module>
    main(mode=mode, group=arguments['<group>'], date=arguments['--date'])
  File "/opt/pynab/scan.py", line 116, in main
    process()
  File "/opt/pynab/scan.py", line 78, in process
    pynab.releases.process()
  File "/opt/pynab/pynab/releases.py", line 217, in process
    binary.parts[int(binary.total_parts / 2)].segments[0].size)
IndexError: list index out of range

jamesmeneghello commented 9 years ago

Can you print the binary ID for the one that's breaking and pull the number of total_parts in the list?

Enverex commented 9 years ago

I added...

log.debug('Checking: {} (name: {}) (parts: {})'.format(binary.id, binary.name, binary.total_parts))

... to pynab/releases.py on line 215 so I assume that should give me the correct read out. That in turn gave me...

2015-04-12 13:59:02 DEBUG Checking: 1865949 (name: Detective.Comics) (parts: 7)

jamesmeneghello commented 9 years ago

Ok, now the parts:

SELECT * FROM parts WHERE binary_id = 1865949

Enverex commented 9 years ago

35 rows but clearly not the same post.

"subject"   "total_segments"
"Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF - [2/7] - ""Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF.rar.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF - [5/7] - ""Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF.rar.vol03+4.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF - [6/7] - ""Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF.rar.vol07+8.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF - [7/7] - ""Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF.rar.vol15+5.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF - [3/7] - ""Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF.rar.vol00+1.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF - [4/7] - ""Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF.rar.vol01+2.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF - [1/7] - ""Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF.rar"" yEnc"  "29"
"Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF - [6/7] - ""Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF.rar.vol07+8.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF - [5/7] - ""Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF.rar.vol03+4.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF - [3/7] - ""Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF.rar.vol00+1.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF - [4/7] - ""Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF.rar.vol01+2.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF - [7/7] - ""Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF.rar.vol15+5.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF - [1/7] - ""Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF.rar"" yEnc"  "37"
"Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF - [2/7] - ""Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF.rar.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY - [2/7] - ""Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY.rar.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY - [5/7] - ""Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY.rar.vol03+4.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY - [6/7] - ""Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY.rar.vol07+8.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY - [7/7] - ""Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY.rar.vol15+5.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY - [3/7] - ""Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY.rar.vol00+1.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY - [4/7] - ""Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY.rar.vol01+2.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY - [1/7] - ""Detective.Comics.Vol.1.No.833.Aug.2007.Comic.eBook-iNTENSiTY.rar"" yEnc"  "27"
"Detective.Comics.Vol.1.No.830.May.2007.Comic.eBook-aAF - [4/7] - ""Detective.Comics.Vol.1.No.830.May.2007.Comic.eBook-aAF.rar.vol01+2.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.830.May.2007.Comic.eBook-aAF - [6/7] - ""Detective.Comics.Vol.1.No.830.May.2007.Comic.eBook-aAF.rar.vol07+8.par2"" yEnc" "1"
"Detective.Comics.Vol.2.No.9.Jul.2012.SCAN.Comic.eBook-iNTENSiTY - [3/7] - ""Detective.Comics.Vol.2.No.9.Jul.2012.SCAN.Comic.eBook-iNTENSiTY.rar.vol00+1.par2"" yEnc"   "1"
"Detective.Comics.Vol.2.No.8.Jun.2012.SCAN.Comic.eBook-iNTENSiTY - [2/7] - ""Detective.Comics.Vol.2.No.8.Jun.2012.SCAN.Comic.eBook-iNTENSiTY.rar.par2"" yEnc"   "1"
"Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY - [6/7] - ""Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY.rar.vol07+8.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY - [5/7] - ""Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY.rar.vol03+4.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY - [3/7] - ""Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY.rar.vol00+1.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY - [4/7] - ""Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY.rar.vol01+2.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY - [7/7] - ""Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY.rar.vol15+5.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY - [1/7] - ""Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY.rar"" yEnc"  "25"
"Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY - [2/7] - ""Detective.Comics.Vol.1.No.849.Dec.2008.Comic.eBook-iNTENSiTY.rar.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.847.Oct.2008.Comic.eBook-aAF - [1/7] - ""Detective.Comics.Vol.1.No.847.Oct.2008.Comic.eBook-aAF.rar"" yEnc"  "27"
"Detective.Comics.Vol.1.No.847.Oct.2008.Comic.eBook-aAF - [2/7] - ""Detective.Comics.Vol.1.No.847.Oct.2008.Comic.eBook-aAF.rar.par2"" yEnc" "1"
"Detective.Comics.Vol.1.No.847.Oct.2008.Comic.eBook-aAF - [3/7] - ""Detective.Comics.Vol.1.No.847.Oct.2008.Comic.eBook-aAF.rar.vol00+1.par2"" yEnc" "1"

jamesmeneghello commented 9 years ago

I'm pretty sure that's one broke-ass regex. Can you pull the regex id from that binary?

Enverex commented 9 years ago

Regex ID 679...

/^.*?\"(?P<name>.*?)\.(pdb|htm|prc|lit|epub|lrf|txt|pdf|rtf|doc|chf|chn|mobi|chm|doc|sample|mkv|Avi|mp4|vol|ogm|par|rar|sfv|nfo|nzb|srt|ass|mpg|txt|zip|wmv|ssa|r\d{1,3}|7z|tar|mov|divx|m2ts|rmvb|iso|dmg|sub|idx|rm|ac3|t\d{1,2}|u\d{1,3})/iS

jamesmeneghello commented 9 years ago

Yeah, the regex is matching badly. I'll rewrite it.

Enverex commented 9 years ago

How would that cause the scans to fail though? Perhaps I'm being daft but I'm not seeing the link.

jamesmeneghello commented 9 years ago

I was halfway through a post on it and my PC hardlocked and I lost it :c I'll explain briefly:

The regex is causing binaries to be made improperly (it only matches Detective.Comics without the volume numbers etc, meaning that we get the wrong parts put together). Because the wrong parts are together, it breaks stuff further down the line. A few of NN's regex are broken like this, which is why I made the facility to replace them as they're dragged down.
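The mis-grouping is easy to reproduce outside pynab. The sketch below applies regex 679 to three subjects taken from the parts listing above (with the doubled quotes of the table export collapsed) and groups them by captured name. That grouping is a simplification of how binaries are assembled, not pynab's actual code. The PCRE `S` flag is dropped here because Python's `re.S` means something different (DOTALL); only `i` is kept.

```python
import re
from collections import defaultdict

REGEX_679 = re.compile(
    r'^.*?\"(?P<name>.*?)\.(pdb|htm|prc|lit|epub|lrf|txt|pdf|rtf|doc|chf|chn|mobi|chm|doc'
    r'|sample|mkv|Avi|mp4|vol|ogm|par|rar|sfv|nfo|nzb|srt|ass|mpg|txt|zip|wmv|ssa'
    r'|r\d{1,3}|7z|tar|mov|divx|m2ts|rmvb|iso|dmg|sub|idx|rm|ac3|t\d{1,2}|u\d{1,3})',
    re.IGNORECASE,
)

subjects = [
    'Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF - [1/7] - '
    '"Detective.Comics.Vol.1.No.815.Mar.2006.Comic.eBook-aAF.rar" yEnc',
    'Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF - [1/7] - '
    '"Detective.Comics.Vol.1.No.817.May.2006.Comic.eBook-aAF.rar" yEnc',
    'Detective.Comics.Vol.2.No.9.Jul.2012.SCAN.Comic.eBook-iNTENSiTY - [3/7] - '
    '"Detective.Comics.Vol.2.No.9.Jul.2012.SCAN.Comic.eBook-iNTENSiTY.rar.vol00+1.par2" yEnc',
]

# Group parts by captured name, roughly as a binary would be keyed.
binaries = defaultdict(list)
for subject in subjects:
    match = REGEX_679.match(subject)
    if match:
        binaries[match.group('name')].append(subject)

for name, parts in binaries.items():
    print(name, len(parts))
```

The lazy name group stops at the first dot followed by one of the listed extensions, and with case-insensitive matching `.Vol` satisfies the `vol` alternative. Three different issues therefore collapse into one name.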

Enverex commented 9 years ago

Fair enough, I had a feeling it was something along those lines but wasn't 100%.

jamesmeneghello commented 9 years ago

I'd delete any binaries generated by that regex id and rescan them, by the way. Luckily they're in ebooks, so it's about 10 minutes to scan the whole goddamned group.

Those regex were broken for 4 different groups, though: a.b.ebook, a.b.e-book, a.b.e-book.technical, a.b.ebook.flood.

Enverex commented 9 years ago

Yeah, already did that and the scan has kicked off successfully. I'll stop backfilling for now too to make sure the parts and binaries tables actually clear down.

jamesmeneghello commented 9 years ago

I'll catch that exception anyway and make a log note to check the regex anytime it comes up.
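One way such a guard could look, as a sketch only: the function name, the plain-dict data layout and the log wording are all hypothetical, and pynab's real models are SQLAlchemy objects rather than dicts.

```python
import logging

log = logging.getLogger('releases')


def middle_segment_size(binary):
    """Return the size of the first segment of the middle part, or None.

    Mirrors the indexing seen in the traceback earlier in the thread, but
    logs a warning instead of crashing when the parts do not line up.
    """
    try:
        middle = binary['parts'][int(binary['total_parts'] / 2)]
        return middle['segments'][0]['size']
    except IndexError:
        # Hypothetical log message pointing at the regex that built the binary.
        log.warning('binary %s: parts do not match total_parts (%s), check regex %s',
                    binary['id'], binary['total_parts'], binary['regex_id'])
        return None
```

Returning `None` lets the caller skip the binary and carry on with the rest of the batch, rather than letting one badly grouped binary stop the whole scan.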