molybdenum-99 / infoboxer

Wikipedia information extraction library
MIT License
174 stars, 16 forks

Fails to parse these specific search results #89

Closed Nakilon closed 3 years ago

Nakilon commented 3 years ago

I would love to replace mediawiki-butt with infoboxer but in this test it is unable to parse search results:

query = "created by Stack Exchange users"
require "mediawiki-butt"
butt = MediaWiki::Butt.new "https://esolangs.org/w/api.php"
result = butt.get_search_results query
result = butt.get_search_text_results query if result.empty?
p result
# => ["???", "List of ideas"]
p "https:" + URI.escape(URI.escape(butt.get_article_path result.first), "?") unless result.empty?
# => "https://esolangs.org/wiki/%3F%3F%3F"

require "infoboxer"
p Infoboxer.wiki("https://esolangs.org/w/api.php").search(query, limit: 1).first
# => nil

(also here it's 2-3 times slower than butt, maybe making more requests, idk)

zverok commented 3 years ago

By default, Infoboxer's search looks only in titles (I am not sure that is sensible behavior, actually, but that's how it is now). You can work around it by adjusting the request:

Infoboxer.wiki("https://esolangs.org/w/api.php").search(query, limit: 1) { |req| req.what(:text) }
# => [#<Page(title: "???", url: "https://esolangs.org/wiki/%3F%3F%3F"): ??? is an esoteric programming ...>]

(req inside the block is MediaWiktory::Wikipedia::Actions::Query, and on search, this module is available for additional tweaking)
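
The two calls above can be combined to mimic the mediawiki-butt fallback from the original report (titles first, then full text). This is a minimal sketch that assumes only the `search(query, limit:)` call and the `req.what(:text)` tweak shown in this thread, and that the result responds to `empty?` like an array (the output above suggests it does). `search_with_fallback` is a hypothetical helper name, not part of Infoboxer:

```ruby
# Hypothetical helper, not part of Infoboxer: try the default (title) search
# first and fall back to a full-text search only when nothing is found.
# `wiki` is expected to respond to `search(query, limit:)` and to yield a
# request object that responds to `what`, as in the calls shown above.
def search_with_fallback(wiki, query, limit: 1)
  pages = wiki.search(query, limit: limit)
  return pages unless pages.empty?

  wiki.search(query, limit: limit) { |req| req.what(:text) }
end

# Usage (assumes the infoboxer gem and network access):
#   wiki = Infoboxer.wiki("https://esolangs.org/w/api.php")
#   p search_with_fallback(wiki, "created by Stack Exchange users").first
```

The full-text request is only sent when the title search comes back empty, so queries that match a title cost one search instead of two.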

> also here it's 2-3 times slower than butt, maybe making more requests

It fetches some meta-info about the MediaWiki instance when the wiki object is instantiated, so if you do this:

wiki = Infoboxer.wiki("https://esolangs.org/w/api.php") # meta-info fetching

wiki.search(...) # reusing the object

...it'll probably help.
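
Reusing the object can also be made automatic by memoizing one client per API URL. This is a minimal sketch: `WikiClients` is a hypothetical helper, and the factory block is where `Infoboxer.wiki(url)` would go:

```ruby
# Hypothetical helper: builds a client at most once per API URL, so any
# per-instantiation work (such as the meta-info fetch) is not repeated.
class WikiClients
  # The factory block receives an API URL and returns a client object,
  # e.g. { |url| Infoboxer.wiki(url) }.
  def initialize(&factory)
    @factory = factory
    @clients = {}
  end

  # Returns the cached client for this URL, creating it on first use.
  def [](api_url)
    @clients[api_url] ||= @factory.call(api_url)
  end
end

# Usage (assumes the infoboxer gem and network access):
#   clients = WikiClients.new { |url| Infoboxer.wiki(url) }
#   clients["https://esolangs.org/w/api.php"].search("nakilon", limit: 1)
#   clients["https://esolangs.org/w/api.php"].search("RASEL", limit: 1) # same object reused
```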

Nakilon commented 3 years ago

The order is weird though.

> wiki.search("nakilon", limit: 1){ |req| req.what :text }.first.title
=> "Velik"
> wiki.search("nakilon", limit: 2){ |req| req.what :text }.first.title
=> "RASEL"
> wiki.search("nakilon", limit: 3){ |req| req.what :text }.first.title
=> "RASEL"
> wiki.search("nakilon", limit: 2){ |req| req.what :text }.map &:title
=> ["RASEL", "Velik"]

P.S.: I already have the separate "meta-info fetching" line so there is probably something else that takes time.
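
One way to find where the time goes is to time each step separately. This is a rough sketch using Ruby's standard `Benchmark.realtime`. `time_steps` is a hypothetical helper, and the steps in the usage comment are only examples:

```ruby
require "benchmark"

# Hypothetical helper: runs each named step once, in order, and returns a
# hash of step name => elapsed wall-clock seconds.
def time_steps(steps)
  steps.to_h { |name, step| [name, Benchmark.realtime { step.call }] }
end

# Usage (assumes the infoboxer gem and network access):
#   wiki = nil
#   timings = time_steps(
#     "instantiate" => -> { wiki = Infoboxer.wiki("https://esolangs.org/w/api.php") },
#     "search"      => -> { wiki.search("nakilon", limit: 1) { |req| req.what(:text) } }
#   )
#   timings.each { |name, seconds| puts format("%-12s %.3fs", name, seconds) }
```

A single run over the network is noisy, so repeating the measurement a few times gives a more reliable comparison.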

zverok commented 3 years ago

> The order is weird though.

That's what their API does :shrug:

Nakilon commented 3 years ago

Alright, thanks for the help. Now I fully switched to your gem: https://github.com/Nakilon/nakiircbot/commit/e35ce0783fbc27072f1c829cc2d80835a00c010f