Open sumwmer opened 4 years ago
POC Glassdoor data: @Myunghee13
POC LinkedIn data: @summer7xinting
Due 02/26
Data (job postings) extraction & cleaning: March 11
Ontology: during spring break
Entity resolution (position & company): April 1st
Generate triples (clean up the graph): April 8
Build similarity edges: April 23
3 data sets
1) Glassdoor (about 2,000 job records): all data scientist and software engineer jobs in the US provided by Glassdoor.
2) LinkedIn
3) Wikidata (US company records: about 19,000): Wikidata has over 19,000 company records, each with information such as founder, owner, and parent organization. Example: https://www.wikidata.org/wiki/Q27415
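To make the Wikidata fields above concrete, here is a minimal sketch of pulling founder, owner, and parent organization out of one entity record. It assumes the record is the JSON served at `Special:EntityData/<QID>.json` and that the three fields map to properties P112 (founded by), P127 (owned by), and P749 (parent organization). The helper names and the `sample` record are hypothetical and only illustrate the shape of the data.

```python
# Wikidata property IDs assumed for the fields mentioned above.
PROPS = {"founder": "P112", "owner": "P127", "parent_organization": "P749"}


def entity_url(qid):
    """URL of the JSON export for one Wikidata item."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def extract_company_links(entity):
    """Return the linked item IDs for each field in PROPS.

    `entity` is the dict for a single item, i.e. data["entities"][qid]
    from the JSON export. Statements without a concrete value
    (snaktype "novalue" / "somevalue") are skipped.
    """
    claims = entity.get("claims", {})
    result = {}
    for field, pid in PROPS.items():
        ids = []
        for statement in claims.get(pid, []):
            value = statement.get("mainsnak", {}).get("datavalue", {}).get("value")
            if isinstance(value, dict) and "id" in value:
                ids.append(value["id"])
        result[field] = ids
    return result


# Hypothetical, hand-written record (not real Wikidata content).
sample = {
    "id": "Q0",
    "claims": {
        "P112": [{"mainsnak": {"snaktype": "value",
                               "datavalue": {"type": "wikibase-entityid",
                                             "value": {"entity-type": "item", "id": "Q1"}}}}],
        "P749": [{"mainsnak": {"snaktype": "novalue"}}],
    },
}

print(extract_company_links(sample))
```

The returned IDs are themselves Wikidata items, so they can later become edges (company to founder, company to parent) when the triples are generated.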
Entity Resolution Plan
1) Link job positions between Glassdoor and LinkedIn (main key: company)
2) Link the companies from 1) with the companies in Wikidata
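The two-step plan above could be sketched roughly as follows. This is only an illustration under assumptions: company names are normalized (lowercased, punctuation and trailing legal suffixes dropped) and used as the blocking key, job titles within the same company are compared with `difflib.SequenceMatcher`, and Wikidata companies are matched on their normalized label. The record fields (`id`, `company`, `title`), the suffix list, and the 0.8 threshold are hypothetical choices, not something decided in this issue.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# Assumed list of legal suffixes to ignore when comparing company names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co", "company"}


def normalize_company(name):
    """Lowercase, strip punctuation, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def link_jobs(glassdoor, linkedin, threshold=0.8):
    """Step 1: pair job postings that share a company and have similar titles."""
    by_company = defaultdict(list)
    for job in linkedin:
        by_company[normalize_company(job["company"])].append(job)

    links = []
    for g in glassdoor:
        # Only compare against LinkedIn jobs from the same (normalized) company.
        for l in by_company.get(normalize_company(g["company"]), []):
            score = SequenceMatcher(None, g["title"].lower(), l["title"].lower()).ratio()
            if score >= threshold:
                links.append((g["id"], l["id"], round(score, 2)))
    return links


def link_companies(company_names, wikidata_labels):
    """Step 2: map each company name to a Wikidata QID, or None if no match.

    `wikidata_labels` maps QID -> company label.
    """
    index = {normalize_company(label): qid for qid, label in wikidata_labels.items()}
    return {name: index.get(normalize_company(name)) for name in company_names}
```

Exact matching on the normalized name is the simplest option for step 2; a fuzzy comparison like the one used for titles could replace it if too many companies are left unmatched.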