sumwmer / INF558Project


Break down project into smaller tasks #2

Open sumwmer opened 4 years ago

sumwmer commented 4 years ago
  1. Data extraction
  2. Build the KG (entity linking, entity resolution, etc.)
  3. Build similarity edges
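A minimal sketch of step 3, assuming each job posting has already been reduced to a set of skill keywords. The field layout, the `similarity_edges` helper, Jaccard similarity as the measure, and the 0.5 threshold are all hypothetical choices for illustration, not a decided design:

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(postings: dict[str, set[str]],
                     threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Return (id1, id2, score) for every pair of postings whose skill sets
    have Jaccard similarity >= threshold (naive O(n^2) pairwise comparison)."""
    edges = []
    for (id1, skills1), (id2, skills2) in combinations(postings.items(), 2):
        score = jaccard(skills1, skills2)
        if score >= threshold:
            edges.append((id1, id2, score))
    return edges


# Hypothetical postings keyed by posting ID.
postings = {
    "job1": {"python", "sql", "spark"},
    "job2": {"python", "sql", "tableau"},
    "job3": {"java", "kubernetes"},
}
print(similarity_edges(postings))  # [('job1', 'job2', 0.5)]
```

With about 2,000 Glassdoor postings the pairwise loop is still cheap (roughly 2 million comparisons); a larger corpus would likely need blocking or approximate nearest-neighbor search instead.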
sumwmer commented 4 years ago

POC for Glassdoor data: @Myunghee13
POC for LinkedIn data: @summer7xinting
Due: 02/26

Myunghee13 commented 4 years ago

  1. Data (job postings) extraction & cleaning: March 11
  2. Ontology: during spring break
  3. Entity resolution (positions & companies): April 1
  4. Generate triples (clean up the graph): April 8
  5. Build similarity edges: April 23

Myunghee13 commented 4 years ago

Three datasets:

  1. Glassdoor: about 2,000 job records, covering all data scientist and software engineer jobs in the US provided by Glassdoor.
  2. LinkedIn
  3. Wikidata: about 19,000 US company records, each with attributes such as founder, owner, and parent organization (e.g. https://www.wikidata.org/wiki/Q27415).
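For the "Generate triples" step, a Wikidata company record could be flattened into (subject, predicate, object) triples roughly as below. This is a sketch only: the record layout, the predicate names (`founder`, `owner`, `parent_organization`), and the `record_to_triples` helper are hypothetical stand-ins, not Wikidata property IDs or an agreed ontology:

```python
def record_to_triples(company_id: str,
                      record: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Flatten {predicate: [values]} into one triple per value,
    preserving the record's insertion order."""
    triples = []
    for predicate, values in record.items():
        for value in values:
            triples.append((company_id, predicate, value))
    return triples


# Hypothetical company record; multi-valued attributes (e.g. several founders) are lists.
record = {
    "founder": ["Alice Example", "Bob Example"],
    "parent_organization": ["Example Holdings"],
}
for triple in record_to_triples("ExampleCorp", record):
    print(triple)
```

Keeping attributes as lists makes multi-valued properties (several founders or owners) produce one triple each instead of one concatenated string.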

Myunghee13 commented 4 years ago

Entity resolution plan:

  1. Link job positions between Glassdoor and LinkedIn (main key: company).
  2. Link the companies from step 1 with companies in Wikidata.
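A minimal sketch of this two-step plan, assuming postings carry `id` and `company` fields and Wikidata companies are given as a `{QID: name}` map. The field names, the suffix list, and the helpers `normalize_company` / `link_by_company` are hypothetical. Matching here is exact on a normalized company name, which is a simple baseline; real data would likely need fuzzy matching on top:

```python
import re
from collections import defaultdict

# Hypothetical legal suffixes stripped from the end of company names.
SUFFIXES = {"inc", "llc", "corp", "corporation", "co", "ltd", "company"}


def normalize_company(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def link_by_company(glassdoor: list[dict], linkedin: list[dict],
                    wikidata: dict[str, str]) -> dict[str, dict]:
    """Step 1: group Glassdoor and LinkedIn posting IDs under a normalized company key.
    Step 2: attach the Wikidata QID whose normalized name matches (or None).
    If two QIDs normalize to the same key, the later one wins."""
    wd_index = {normalize_company(name): qid for qid, name in wikidata.items()}
    linked = defaultdict(lambda: {"glassdoor": [], "linkedin": [], "wikidata": None})
    for source, postings in (("glassdoor", glassdoor), ("linkedin", linkedin)):
        for job in postings:
            linked[normalize_company(job["company"])][source].append(job["id"])
    for key, entry in linked.items():
        entry["wikidata"] = wd_index.get(key)
    return dict(linked)


# Hypothetical sample data (IDs and QIDs are placeholders).
glassdoor = [{"id": "g1", "company": "Acme, Inc."}]
linkedin = [{"id": "l1", "company": "ACME Inc"}, {"id": "l2", "company": "Globex LLC"}]
wikidata = {"Q_ACME": "Acme Inc."}
print(link_by_company(glassdoor, linkedin, wikidata))
```

Here "Acme, Inc." and "ACME Inc" collapse to the same key `acme` and pick up `Q_ACME`, while `globex` stays unlinked to Wikidata.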