tboothman / imdbphp

PHP library for retrieving film and tv information from IMDb

Empty release info #149

Closed alr2413 closed 6 years ago

alr2413 commented 6 years ago

Hi, it seems that the method "releaseInfo()" always returns an empty array. Please check it. Thanks.

tboothman commented 6 years ago

Ah nice, I was part of the way through converting this to use DOMDocument, because that regular expression is pretty disgusting.

jreklund commented 6 years ago

Oh... my bad. That would be much nicer indeed. @tboothman Should we start utilizing "Assignees" when we are grabbing the problem from now on?

tboothman commented 6 years ago

Not a problem really. I wanted to make it better but making it work was more important. I just incorporated what you'd done into my rewrite - which was really just the bit that extracts the table cells from the DOM.
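For anyone reading along, here is a rough sketch of what "extracting the table cells from the DOM" can look like with DOMDocument and DOMXPath. It is not the code from the commit: the function name `extractTableRows`, the table id `release_dates`, and the sample HTML are all made up for illustration.

```php
<?php
// Hypothetical helper, NOT the library's actual code: returns the trimmed
// text of every <td> in each row matched by an XPath query.
function extractTableRows(string $html, string $rowQuery): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect libxml warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query($rowQuery) as $row) {
        $cells = [];
        // './td' is evaluated relative to the current row node.
        foreach ($xpath->query('./td', $row) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Assumed sample markup; the real IMDb page structure may differ.
$html = '<table id="release_dates">'
      . '<tr><td>USA</td><td>11 June 1993</td></tr>'
      . '<tr><td>UK</td><td>16 July 1993</td></tr>'
      . '</table>';

$rows = extractTableRows($html, '//table[@id="release_dates"]//tr');
```

Each row comes back as a plain array of cell strings, which is a lot easier to read and adjust than a single regular expression covering the whole table.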

2d31db7029b4f7f7d83b05db258200a5a21b36fa

I really wanted to add the helper as a function, but you can't autoload functions in PHP :( I'd spotted 'functional php', which seemed to let you use functions without including them, but what it actually does is get Composer to load all of its files in advance (https://github.com/lstrojny/functional-php/blob/master/composer.json), which is pretty sad. So... I distracted myself agonising over functions vs static methods for a while.
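A small sketch of the limitation, in case it helps: PHP consults registered autoloaders for unknown classes but never for unknown functions. The class and function names below are hypothetical and do not exist in the library; the autoloader only records what it was asked for.

```php
<?php
// Record every name PHP asks the autoloader to resolve.
$autoloadRequests = [];
spl_autoload_register(function (string $name) use (&$autoloadRequests): void {
    // A real autoloader would require the matching class file here.
    $autoloadRequests[] = $name;
});

// Looking up an unknown class triggers the autoloader...
$classFound = class_exists('Imdb\\HypotheticalTableHelper');

// ...but looking up an unknown function never does.
$functionFound = function_exists('Imdb\\hypothetical_table_cells');
```

After this runs, `$autoloadRequests` contains only the class name. That is why a static method on a helper class can be loaded lazily, while a plain function has to be included up front, for example through the `files` entry in Composer's `autoload` section.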

duck7000 commented 6 years ago

Those regular expressions are really disgusting in almost all of the methods, so converting to DOMDocument is not a bad idea. It produces much more readable code. I've tried it once, but I got the feeling that it is slower than regular expressions?

tboothman commented 6 years ago

It is slower. The one in the tests, the AKAs for Jurassic Park, is 200KB of HTML, which takes a while to parse. The old code took about 0.2ms and the new code takes about 5ms. Most of that is spent creating the HTML DOM (~4ms), with the rest on the XPath.

While making it roughly 25x slower sounds terrible, in the context of it taking ~1s to download that HTML in the first place it's not that bad. Take the case where someone's using this library as a database: even if they're making 10 requests for data and every method takes 5ms, that's still only 50ms, which isn't too bad. (In practice not every method takes 5ms, and it wouldn't: if the title page started using this I'd cache the DOMDocument.)
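To illustrate the caching idea, here is a minimal sketch, assuming a page object that holds the downloaded HTML. The class `CachedDomPage` and its `parseCount` counter are invented for this example and are not part of the library. The point is just that the expensive parse (~4ms of the ~5ms above) happens once, however many methods query the page.

```php
<?php
// Hypothetical class, NOT part of imdbphp: parses the HTML lazily on first
// use and reuses the same DOMDocument/DOMXPath for every later query.
final class CachedDomPage
{
    /** @var string */
    private $html;
    /** @var DOMDocument|null */
    private $doc = null;
    /** @var DOMXPath|null */
    private $xpath = null;
    /** @var int Number of times the HTML was actually parsed. */
    public $parseCount = 0;

    public function __construct(string $html)
    {
        $this->html = $html;
    }

    public function xpath(): DOMXPath
    {
        if ($this->xpath === null) {
            $this->doc = new DOMDocument();
            // Keep libxml warnings about imperfect HTML out of the output.
            $previous = libxml_use_internal_errors(true);
            $this->doc->loadHTML($this->html);
            libxml_clear_errors();
            libxml_use_internal_errors($previous);

            $this->xpath = new DOMXPath($this->doc);
            $this->parseCount++;
        }
        return $this->xpath;
    }
}

// Assumed sample markup standing in for a downloaded title page.
$page = new CachedDomPage(
    '<html><head><title>Jurassic Park</title></head>'
    . '<body><h1>Jurassic Park (1993)</h1></body></html>'
);

// Two separate lookups share a single parse.
$title = $page->xpath()->query('//title')->item(0)->textContent;
$heading = $page->xpath()->query('//h1')->item(0)->textContent;
```

With something like this, ten methods reading the same page would pay the parse cost once plus ten cheap XPath queries, rather than ten full parses.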