jaytimm / corpuslingr

A library of functions enabling complex corpus search in context (KWIC), search aggregation, bag-of-words building & keyphrase extraction.

difference with just plain regex? #1

Closed randomgambit closed 6 years ago

randomgambit commented 6 years ago

Hello Jay,

Thanks for this nice package. Sorry, but I am a bit inexperienced in NLP, and I do not quite get the difference between running part-of-speech tagging with udpipe plus some regex and using your function corpuslingr::clr_search_gramx.

Is there something more that I am missing here?

Thanks!

jaytimm commented 6 years ago

Hi @randomgambit. Thanks for the interest, and sorry for the slow response. The clr_search_gramx function streamlines search for lexical/grammatical patterns that span multiple annotation features (e.g., lemma, token, and part-of-speech). It is regex-based, but is supplemented with a simple "corpus querying language" to make search easier. Before using clr_search_gramx, an annotated corpus (e.g., one produced by udpipe) needs to be amended slightly using the clr_set_corpus function.

Some_annotated_corpus %>% clr_set_corpus() %>% clr_search_gramx(search = "ADJ (like a)? NOUN")

Additional search examples can be viewed here. Let me know.
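
For comparison, the "udpipe + plain regex" route from the question can be sketched in base R. This is only an illustration on a made-up annotation table, not how corpuslingr is implemented: the helper name `search_tagged`, the `token/UPOS` string encoding, and the toy rows are all assumptions for the example. Only the column names (`token`, `lemma`, `upos`) follow what udpipe returns.

```r
# Toy annotation table shaped like udpipe output (rows are invented).
ann <- data.frame(
  token = c("He", "was", "sharp", "like", "a", "hawk", "with", "a", "quick", "mind"),
  lemma = c("he", "be", "sharp", "like", "a", "hawk", "with", "a", "quick", "mind"),
  upos  = c("PRON", "AUX", "ADJ", "ADP", "DET", "NOUN", "ADP", "DET", "ADJ", "NOUN"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flatten the table to one "token/UPOS" string,
# run an ordinary regex over it, then strip the tags from each hit.
search_tagged <- function(ann, pattern) {
  tagged <- paste(paste0(ann$token, "/", ann$upos), collapse = " ")
  hits <- regmatches(tagged, gregexpr(pattern, tagged, perl = TRUE))[[1]]
  gsub("/[A-Z]+", "", hits)
}

# Hand-written equivalent of the query "ADJ (like a)? NOUN":
# every slot has to spell out the token/TAG encoding explicitly.
pattern <- "\\S+/ADJ (like/\\S+ a/\\S+ )?\\S+/NOUN\\b"

hits_plain <- search_tagged(ann, pattern)
print(hits_plain)
# [1] "sharp like a hawk" "quick mind"
```

The regex itself does the matching in both cases. The practical difference is that here the tag encoding has to be written out by hand for each slot, whereas the query string passed to clr_search_gramx stays at the level of `ADJ (like a)? NOUN`.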