Add phrase counts or parts-of-speech token counts after extracting entities from a sentence

Following PR #13, there appear to be other types of phrase or token worth counting, e.g. pronouns, dates, organisations, etc.; the details can be discussed. The checklists below show what has been achieved so far and what remains to be covered:

Named entity recognition features:

[ ] PERSON | People, including fictional.
[ ] NORP | Nationalities or religious or political groups.
[ ] FAC | Buildings, airports, highways, bridges, etc.
[ ] ORG | Companies, agencies, institutions, etc.
[ ] GPE | Countries, cities, states.
[ ] LOC | Non-GPE locations, mountain ranges, bodies of water.
[ ] PRODUCT | Objects, vehicles, foods, etc. (Not services.)
[ ] EVENT | Named hurricanes, battles, wars, sports events, etc.
[ ] WORK_OF_ART | Titles of books, songs, etc.
[ ] LAW | Named documents made into laws.
[ ] LANGUAGE | Any named language. (related to #4 feature request)
[ ] DATE | Absolute or relative dates or periods.
[ ] TIME | Times smaller than a day.
[ ] PERCENT | Percentage, including “%”.
[ ] MONEY | Monetary values, including unit.
[ ] QUANTITY | Measurements, as of weight or distance.
[ ] ORDINAL | “first”, “second”, etc.
[ ] CARDINAL | Numerals that do not fall under another type.
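A minimal sketch of how these counts could be produced, assuming spaCy with the `en_core_web_sm` English model installed. `entity_labels` and `count_entities` are hypothetical helpers, not existing nlp_profiler functions. Extraction and counting are kept apart so the counting step can return a value for every label in the list above, zero included, which keeps the feature columns stable from one sentence to the next.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Entity labels from the checklist above (spaCy's English NER label scheme).
NER_LABELS: List[str] = [
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT",
    "MONEY", "QUANTITY", "ORDINAL", "CARDINAL",
]


def entity_labels(text: str, model: str = "en_core_web_sm") -> List[str]:
    """Return one label per entity span found by spaCy (assumes the model is installed)."""
    import spacy  # imported lazily so count_entities() works without spaCy

    # Loading per call keeps the sketch short; real code would load the model once.
    nlp = spacy.load(model)
    return [ent.label_ for ent in nlp(text).ents]


def count_entities(labels: Iterable[str]) -> Dict[str, int]:
    """Count labels; every known label gets a key (0 if absent), unknown labels are ignored."""
    counts = Counter(labels)
    return {label: counts[label] for label in NER_LABELS}
```

For a given sentence, `count_entities(entity_labels(text))` would give one dict of 18 counts; the actual numbers depend on the model used.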

Parts of speech features:

[X] NOUN | noun | girl, cat, tree, air, beauty (noun phrase count via #13 by @ritikjain51 and #47)
[ ] ADJ | adjective | big, old, green, incomprehensible, first
[ ] ADP | adposition | in, to, during
[ ] ADV | adverb | very, tomorrow, down, where, there
[ ] AUX | auxiliary | is, has (done), will (do), should (do)
[ ] CONJ | conjunction | and, or, but
[ ] CCONJ | coordinating conjunction | and, or, but
[ ] DET | determiner | a, an, the
[ ] INTJ | interjection | psst, ouch, bravo, hello
[ ] NUM | numeral | 1, 2017, one, seventy-seven, IV, MMXIV
[ ] PART | particle | ’s, not
[ ] PRON | pronoun | I, you, he, she, myself, themselves, somebody
[ ] PROPN | proper noun | Mary, John, London, NATO, HBO
[ ] PUNCT | punctuation | ., (, ), ?
[ ] SCONJ | subordinating conjunction | if, while, that
[ ] SYM | symbol | $, %, §, ©, +, −, ×, ÷, =, :), 😝
[ ] VERB | verb | run, runs, running, eat, ate, eating
[ ] SPACE | space
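Parts-of-speech token counts could follow the same pattern. Again this is a sketch assuming spaCy and `en_core_web_sm`; `pos_tags_and_noun_chunks` and `count_pos` are hypothetical names. A phrase count (such as the noun phrase count above) and a NOUN token count measure different things: spaCy's `doc.noun_chunks` yields one span per base noun phrase however many tokens it covers, whereas `token.pos_` tags each token individually.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Universal POS tags from the checklist above, as reported by spaCy's token.pos_.
POS_TAGS: List[str] = [
    "NOUN", "ADJ", "ADP", "ADV", "AUX", "CONJ", "CCONJ", "DET", "INTJ",
    "NUM", "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "SPACE",
]


def pos_tags_and_noun_chunks(text: str, model: str = "en_core_web_sm") -> Tuple[List[str], int]:
    """Return per-token POS tags and the number of base noun phrases (needs spaCy with a parser)."""
    import spacy  # imported lazily so count_pos() works without spaCy

    doc = spacy.load(model)(text)
    return [token.pos_ for token in doc], sum(1 for _ in doc.noun_chunks)


def count_pos(tags: Iterable[str]) -> Dict[str, int]:
    """Count POS tags, with a 0 entry for every tag in POS_TAGS that does not occur."""
    counts = Counter(tags)
    return {tag: counts[tag] for tag in POS_TAGS}
```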

See https://spacy.io/api/annotation#section-named-entities and http://www.nltk.org/book/ for details on the above items.

We will replace one or more existing functionalities in the library with the above, on a case-by-case basis. It would be best to group these features and give each group a unique name, such as name-entity-recognition-features and parts-of-speech-features respectively, and bundle the granular features under each group.
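One possible way to organise the groups named above, sketched as a plain mapping from group name to granular features. The column names (such as `person_count` or `noun_count`) and the helper `expand_feature_groups` are illustrative assumptions, and both lists are abbreviated.

```python
from typing import Dict, Iterable, List


def _count_columns(labels: Iterable[str]) -> List[str]:
    """Derive hypothetical column names like 'person_count' from tag labels."""
    return [f"{label.lower()}_count" for label in labels]


# Group names follow the proposal; the label lists are abbreviated for the sketch.
FEATURE_GROUPS: Dict[str, List[str]] = {
    "name-entity-recognition-features": _count_columns(["PERSON", "ORG", "DATE"]),
    "parts-of-speech-features": _count_columns(["NOUN", "VERB", "PRON"]),
}


def expand_feature_groups(groups: Iterable[str]) -> List[str]:
    """Resolve group names to granular features, keeping order and skipping duplicates."""
    selected: List[str] = []
    for group in groups:
        if group not in FEATURE_GROUPS:
            raise ValueError(f"unknown feature group: {group!r}")
        for feature in FEATURE_GROUPS[group]:
            if feature not in selected:
                selected.append(feature)
    return selected
```

Selecting a group then expands to all of its granular features, so a whole category could be switched on or off with one name.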

Both NLTK and spaCy would be used to implement these features.
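If NLTK is used alongside spaCy, its tags would need normalising before they can feed the same columns. A sketch, assuming NLTK's `pos_tag(..., tagset="universal")` output and that the required tokenizer and tagger data packages have been downloaded; `nltk_universal_pos`, `normalise_tags` and `NLTK_TO_SPACY` are hypothetical names. NLTK's universal tagset spells two tags differently (`PRT` and `.`) and has no PROPN, AUX, SCONJ, INTJ or SYM tags, so those columns would stay at zero for NLTK-derived counts.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# NLTK's universal tagset names two tags differently from spaCy; the rest
# (NOUN, VERB, ADJ, ADV, ADP, CONJ, DET, NUM, PRON, X) pass through unchanged.
NLTK_TO_SPACY = {"PRT": "PART", ".": "PUNCT"}


def nltk_universal_pos(text: str) -> List[Tuple[str, str]]:
    """Tokenise and tag with NLTK's universal tagset (assumes the NLTK data is downloaded)."""
    import nltk  # imported lazily so normalise_tags() works without NLTK

    return nltk.pos_tag(nltk.word_tokenize(text), tagset="universal")


def normalise_tags(tagged: Iterable[Tuple[str, str]]) -> Counter:
    """Count (token, tag) pairs by tag, renaming NLTK tags to the checklist names."""
    return Counter(NLTK_TO_SPACY.get(tag, tag) for _, tag in tagged)
```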

Issue #15, neomatrix369/nlp_profiler