topfs2 / heimdall

Metadata extraction engine
GNU General Public License v2.0

Remove regexp in require #3

Open topfs2 opened 11 years ago

topfs2 commented 11 years ago

Currently the system allows quite advanced demand.require expressions through regexps. So far these have only been used to check whether a) the object is exactly X, or b) the object begins with X.

So, simply remove regexps and use those two mechanics instead. This will simplify the scheduling a lot.
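For illustration, a minimal sketch of what the two remaining mechanics could look like in Python. The names `Exactly` and `BeginsWith` are hypothetical and are not part of heimdall's API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exactly:
    """Hypothetical requirement: the object must equal `value`."""
    value: str

    def matches(self, obj: str) -> bool:
        return obj == self.value


@dataclass(frozen=True)
class BeginsWith:
    """Hypothetical requirement: the object must start with `prefix`."""
    prefix: str

    def matches(self, obj: str) -> bool:
        return obj.startswith(self.prefix)
```

Both checks are plain string comparisons, so no regex engine is involved when a requirement is evaluated.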

garbear commented 11 years ago

What about demand disjunction? When you consider that regexes add propositional logic for free, it seems that you're actually getting a pretty sweet deal.

As a concrete example, a scraper parses Game Boy / GBC / Super Game Boy ROM headers. They all use address $134 for the ROM title, so you would probably want a single function for this. demand = ["^(Super )?Game Boy( Color)?$"] or ["^(Super Game Boy|Game Boy|Game Boy Color)$"] is more succinct for both the user and the engine than

demandAny = [
    demand = ["Game Boy"]
    demand = ["Super Game Boy"]
    demand = ["Game Boy Color"]
]
demand = ...

Btw, that brings something to my attention: demand.require("Game Boy") will match "Super Game Boy", no? It would have to be "^Game Boy$"?
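Whether this happens depends on how the engine applies the pattern, which the thread does not state. As a sketch using Python's standard `re` module (the helper names below are made up for illustration), the three common ways of applying an unanchored pattern behave differently:

```python
import re


def found_anywhere(pattern: str, value: str) -> bool:
    # re.search scans the whole string, so the pattern may match in the middle.
    return re.search(pattern, value) is not None


def found_at_start(pattern: str, value: str) -> bool:
    # re.match anchors only at the beginning of the string.
    return re.match(pattern, value) is not None


def found_exactly(pattern: str, value: str) -> bool:
    # re.fullmatch requires the pattern to cover the entire string.
    return re.fullmatch(pattern, value) is not None
```

With `re.search` semantics, "Game Boy" does match "Super Game Boy"; with `re.match` it does not, but it still matches "Game Boy Color"; only `re.fullmatch` or an explicit "^Game Boy$" restricts it to the exact title.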

topfs2 commented 11 years ago

The biggest reason I'd want to remove it is that it's incredibly hard to schedule. For example, suppose another module has supply = [ "(([Ss]uper)? [Gg]ame [Bb]oy|[Ww]ii(\s[Uu])?)" ]. This is obviously a rather stupid supply, but it's incredibly hard to know that your demand is connected to this supply.

With begins/ends, exactly, and perhaps case insensitivity, the connections between these are much, much easier to find!

The only way to find the connection with regexps, AFAIK, is to build the two state machines and see if we can find an intersection between them. Both of these algorithms are far from trivial. I found some libs able to do it, but IIRC they didn't support anywhere near the entire Perl regexp standard.
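To show why the connection becomes easy to find, here is a rough sketch covering only the "exactly" and "begins" cases. The `(kind, text)` representation and the function name `may_connect` are assumptions made for this example, not heimdall code:

```python
def may_connect(a, b):
    """Return True if some string satisfies both requirements.

    Each requirement is a hypothetical (kind, text) pair, where kind is
    either "exactly" or "begins".
    """
    (kind_a, text_a), (kind_b, text_b) = a, b
    if kind_a == "exactly" and kind_b == "exactly":
        return text_a == text_b
    if kind_a == "exactly":
        # The single string text_a must also satisfy the prefix rule b.
        return text_a.startswith(text_b)
    if kind_b == "exactly":
        return text_b.startswith(text_a)
    # Two prefix rules overlap when one prefix extends the other.
    return text_a.startswith(text_b) or text_b.startswith(text_a)
```

Every case reduces to a string comparison, so a scheduler could test each demand/supply pair cheaply. Ends-with and case-insensitive variants would need extra cases that are not shown here.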
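For reference, the intersection step on its own can be sketched as a breadth-first search over pairs of states (the standard product construction). This assumes both patterns have already been compiled into deterministic automata, which is the hard part for Perl-style regexps and is not shown. The DFA representation and the toy machines below are invented for this example:

```python
from collections import deque


def languages_intersect(dfa_a, dfa_b, alphabet):
    """Return True if some string is accepted by both DFAs.

    Each DFA is a hypothetical (start, accepting, transitions) triple, where
    transitions maps (state, symbol) -> state; a missing entry means reject.
    """
    start_a, accept_a, delta_a = dfa_a
    start_b, accept_b, delta_b = dfa_b
    start = (start_a, start_b)
    seen = {start}
    queue = deque([start])
    while queue:
        state_a, state_b = queue.popleft()
        if state_a in accept_a and state_b in accept_b:
            return True
        for symbol in alphabet:
            next_a = delta_a.get((state_a, symbol))
            next_b = delta_b.get((state_b, symbol))
            if next_a is None or next_b is None:
                continue  # At least one machine rejects on this symbol.
            pair = (next_a, next_b)
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return False


# Toy machines over the alphabet {"a", "b"}.
STARTS_WITH_A = (0, {1}, {(0, "a"): 1, (1, "a"): 1, (1, "b"): 1})
ENDS_WITH_B = (0, {1}, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1})
ONLY_B = (0, {0}, {(0, "b"): 0})
```

The search visits at most one pair per combination of states, so its cost grows with the product of the two machine sizes, on top of whatever it costs to build the machines in the first place.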

topfs2 commented 11 years ago

Oh, and right now the scheduling is actually only done on the edge, not on the object, with the exception of class, which is what makes it work for now. But to be able to actually purge tasks based on general properties, this would be a much-needed feature.