This PR adds a notebook that demonstrates using truecase and spaCy for natural language processing on committee names. spaCy can extract named entities, including the names of people. The notebook first applies truecase to each committee name (because spaCy will not work on the all-caps names), then uses spaCy to extract the names. Some errors do occur, but it works nicely for many of the committees.
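For readers who want the gist without opening the notebook, the two steps above could be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the helper names `extract_people` and `load_default_pipeline` are hypothetical, and the choice of the `en_core_web_sm` model is an assumption.

```python
from typing import Callable, List


def extract_people(committee_name: str,
                   restore_case: Callable[[str], str],
                   nlp: Callable) -> List[str]:
    """Return the text of every PERSON entity found in a committee name.

    restore_case and nlp are passed in so that the truecaser and the
    spaCy pipeline can be swapped out or stubbed (hypothetical design).
    """
    # Step 1: restore capitalization, since the raw names are all caps.
    cased = restore_case(committee_name)
    # Step 2: run named entity recognition and keep only people.
    doc = nlp(cased)
    return [ent.text for ent in doc.ents if ent.label_ == "PERSON"]


def load_default_pipeline():
    """Load truecase and spaCy lazily (hypothetical helper).

    Assumes the model was installed beforehand with:
        python -m spacy download en_core_web_sm
    """
    import spacy
    import truecase

    nlp = spacy.load("en_core_web_sm")
    return truecase.get_true_case, nlp
```

With both packages installed, `restore_case, nlp = load_default_pipeline()` followed by `extract_people(name, restore_case, nlp)` would run the pipeline on a single committee name. As noted above, the result will not be correct for every committee.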
I wanted to offer this as a potential supplement to @jumptable's regular-expression solution, and also to keep it in mind as an approach for future problems. I've borrowed @jumptable's approach for creating an environment and downloading dependencies (copied Makefile and .txt) from his pull request.