matthias-samwald / find-me-evidence

An open-source medical search engine
GNU Affero General Public License v3.0

faulty regex in wikipedia_create_list_of_relevant_articles.php #18

Closed: gpetz closed this issue 10 years ago

gpetz commented 10 years ago

The regex in wikipedia_create_list_of_relevant_articles.php isn't extracting all Wikipedia articles. For example, it extracts only 737 of the 1000 articles listed at http://tools.wmflabs.org/enwp10/cgi-bin/list2.fcgi?run=yes&projecta=Pharmacology&namespace=&pagename=&quality=&importance=&score=&limit=1000&offset=4001&sorta=Importance&sortb=Quality. I recommend using http://simplehtmldom.sourceforge.net/ instead. I will close this issue immediately; I just want to document it.

gpetz commented 10 years ago

OK, the regex is not faulty after all. The missing articles are caused by the function filter_urls($url), which I had overlooked.