sgrieve / mp-hack

CW24 hackday project to identify educational background of MPs
MIT License

Find a method to extract degree names from sentences #4

Open sgrieve opened 5 months ago

sgrieve commented 5 months ago

Current data is structured as sentence fragments:

"He then studied modern history at Magdalen College, Oxford.Following graduation, Chalk obtained a Graduate Diploma in Law with distinction from the City University London, and qualified as a barrister from the Inns of Court School of Law."

We need to identify the subjects mentioned, e.g. Law and Modern History, in order to populate rows in our dataset.
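
As a rough starting point, here is a minimal sketch of one possible approach: matching each sentence against a fixed vocabulary of subject names. The `SUBJECTS` list and the `extract_subjects` helper are hypothetical names made up for illustration; the real vocabulary would have to come from somewhere else (a list of degree subjects, for instance), and this is not necessarily the method the project ends up using.

```python
import re

# Hypothetical subject vocabulary (an assumption, not project data).
SUBJECTS = ["Modern History", "History", "Law", "Economics"]

# Try longer names first so "Modern History" is preferred over "History".
_ALTERNATIVES = "|".join(
    re.escape(s) for s in sorted(SUBJECTS, key=len, reverse=True)
)
_PATTERN = re.compile(r"\b(" + _ALTERNATIVES + r")\b", re.IGNORECASE)

# Map lower-cased matches back to their canonical spelling.
_CANONICAL = {s.lower(): s for s in SUBJECTS}


def extract_subjects(text):
    """Return the known subjects mentioned in text, in order of first appearance."""
    found = []
    for match in _PATTERN.finditer(text):
        name = _CANONICAL[match.group(1).lower()]
        if name not in found:  # skip repeated mentions within one fragment
            found.append(name)
    return found


sentence = (
    "He then studied modern history at Magdalen College, Oxford."
    "Following graduation, Chalk obtained a Graduate Diploma in Law with "
    "distinction from the City University London, and qualified as a "
    "barrister from the Inns of Court School of Law."
)
print(extract_subjects(sentence))
```

Note that the second "Law" in the example comes from an institution name ("Inns of Court School of Law") rather than a degree, so a plain vocabulary match will over-count unless institution names are handled separately.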

sgrieve commented 5 months ago

Made some progress on this, and have added relevant data in 3d8add8.

There are lots of duplicate matches that need to be filtered out, and it is hard to tell true negatives from false negatives without a lot of manual checking.
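
For the duplicate matches, a minimal sketch of one way to filter them, assuming each match can be reduced to an `(mp_name, subject)` pair. That shape is an assumption for illustration, as is the `dedupe_matches` name; the data added in the commit above may be structured differently.

```python
def dedupe_matches(matches):
    """Drop repeated (mp_name, subject) pairs, keeping the first occurrence.

    Comparison ignores case and surrounding whitespace, so
    ("Chalk", "Law") and ("chalk ", " law") count as the same match.
    """
    seen = set()
    unique = []
    for mp_name, subject in matches:
        key = (mp_name.strip().lower(), subject.strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append((mp_name, subject))
    return unique


raw = [
    ("Chalk", "Law"),
    ("chalk ", " law"),
    ("Chalk", "Modern History"),
]
print(dedupe_matches(raw))
```

This only removes exact repeats after normalisation. It does not help with judging which non-matches are genuine, so some manual checking would still be needed there.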