Apparently, DOMDocument::loadHTML() needs to be told the input encoding, and str_word_count() doesn't handle Unicode (multibyte) text. I fixed the first and added a replacement for str_word_count(). The replacement isn't necessarily the correct way to count words, just one that works reasonably well (see links).
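
For reference, a minimal sketch of both fixes, not necessarily what this change actually does. The helper names loadUtf8Html() and countWordsUnicode() are hypothetical. Prepending an XML encoding declaration is one common way to make loadHTML() treat its input as UTF-8, and the regex is an assumed approximation of what counts as a "word".

```php
<?php
// Hypothetical helper: parse an HTML string as UTF-8.
// Assumption: without an encoding hint, loadHTML() falls back to ISO-8859-1.
function loadUtf8Html(string $html): DOMDocument
{
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // The leading declaration tells libxml to read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

// Hypothetical replacement for str_word_count().
// Assumption: a word starts with a Unicode letter and continues with
// letters, combining marks, apostrophes, or hyphens.
function countWordsUnicode(string $text): int
{
    $count = preg_match_all('/\p{L}[\p{L}\p{M}\'-]*/u', $text);
    // preg_match_all() returns false on error, e.g. invalid UTF-8 input.
    return $count === false ? 0 : $count;
}

$doc = loadUtf8Html('<p>héllo wörld</p>');
$text = $doc->getElementsByTagName('body')->item(0)->textContent;
$words = countWordsUnicode($text);
```

This only handles scripts that separate words with spaces or punctuation. Text in languages such as Chinese or Japanese would need a different approach.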
Also added a @todo: load only the actual page content. Right now this also loads things like the "protectedpagewarning" message, which might trigger some of the scorers in the future.