recommeddit / labs

ML/data experiments for Recommeddit
MIT License

Entity deduping #9

Closed SwiftWinds closed 2 years ago

SwiftWinds commented 2 years ago

Research methods for handling recommendations that refer to the same thing but are written slightly differently (e.g., a hard case: "vscode" and "Visual Studio Code"; an easier one: "huion kamvas" and "huion kamvas 13"). Then discuss the candidate methods with the team and decide which might be best to use (e.g., the pros and cons of each one).
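As a starting point for that comparison, here is a minimal sketch of a purely string-based baseline, using only Python's standard-library `difflib`. The helper names (`normalize`, `similarity`, `is_token_prefix`) are made up for illustration and are not part of the repo. It is meant to show why the two example pairs differ in difficulty, not to be the method we should adopt.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences vanish."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] using difflib's ratio()."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_token_prefix(a: str, b: str) -> bool:
    """True if the shorter name's tokens are a leading prefix of the longer's."""
    tokens_a, tokens_b = normalize(a).split(), normalize(b).split()
    shorter, longer = sorted((tokens_a, tokens_b), key=len)
    return longer[: len(shorter)] == shorter


if __name__ == "__main__":
    for pair in [("huion kamvas", "huion kamvas 13"),
                 ("vscode", "Visual Studio Code")]:
        print(pair, round(similarity(*pair), 2), is_token_prefix(*pair))
```

The easy pair scores high on both checks, while "vscode" vs. "Visual Studio Code" scores low on both. String similarity alone is therefore unlikely to catch abbreviations, which is one argument for looking names up in an external source.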

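Gopu2001 commented 2 years ago

#!/usr/bin/env python3
"""Interactively search Wikipedia and print the top matches."""

from pprint import pprint

import requests

API_URL = "https://en.wikipedia.org/w/api.php"

# Query parameters for the MediaWiki search API (list=search).
params = {
    "action": "query",
    "format": "json",
    "prop": "info",
    "list": "search",
    "srsearch": "wikipedia",  # placeholder, overwritten on each prompt
    "srlimit": 3,             # keep only the top three hits
    "srprop": "sectiontitle",
}

# Prompt for ten search terms and print the raw JSON response for each.
for _ in range(10):
    params["srsearch"] = input("Search Wikipedia: ")
    # Let requests URL-encode the parameters instead of building the URL by hand.
    response = requests.get(API_URL, params=params, timeout=10)
    response.raise_for_status()
    pprint(response.json())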

Here is some example code I came up with; we could try something like this. It should come in handy as long as the products we are looking for are at least somewhat well known, which should cover most cases.
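To make the idea concrete, here is a rough sketch of how search results like the ones above could be turned into a dedup step: treat the top hit's title as the canonical name and group the raw strings by it. `top_title` and `group_by_canonical` are hypothetical helpers, not the function from the commit. The lookup is passed in as a callable so the grouping can be tried without network access, and the sketch assumes the usual `query` → `search` → `title` shape of a `list=search` response.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional


def top_title(response: dict) -> Optional[str]:
    """Return the first hit's title from a list=search JSON response, if any."""
    hits = response.get("query", {}).get("search", [])
    return hits[0]["title"] if hits else None


def group_by_canonical(
    names: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> Dict[str, List[str]]:
    """Group raw names by the canonical title that `lookup` returns.

    Names with no match (lookup returns None) stay in a group of their own.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        title = lookup(name)
        groups[title if title is not None else name].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Stubbed lookup table with made-up results, standing in for real API calls.
    fake_results = {
        "vscode": "Visual Studio Code",
        "Visual Studio Code": "Visual Studio Code",
    }
    print(group_by_canonical(
        ["vscode", "Visual Studio Code", "some obscure tool"],
        fake_results.get,
    ))
```

In real use, `lookup` would wrap the search request and `top_title`. Whether the top hit is reliable enough to act as a canonical name is something we would still need to check on real recommendation data.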

Gopu2001 commented 2 years ago

Just committed a function defined using Wikipedia's API. Check the following commit for the additions: d01312a