NLP Profiler is a simple NLP library for profiling datasets with one or more text columns. Given a dataset and the name of a column containing text data, NLP Profiler returns either high-level insights or low-level (granular) statistical information about the text in that column.
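To make "granular statistical information" concrete, here is a minimal stdlib-only sketch of per-row counts for a single text column. It assumes the dataset is a list of row dicts rather than a DataFrame. The names `granular_profile` and `profile_column`, and the particular counts chosen, are hypothetical illustrations, not NLP Profiler's actual API or output columns.

```python
import re
from typing import Any, Dict, List


def granular_profile(text: str) -> Dict[str, int]:
    """Hypothetical per-text counts in the spirit of granular statistics."""
    words = re.findall(r"\w+", text)  # runs of letters, digits or underscores
    return {
        "chars_count": len(text),
        "words_count": len(words),
        "spaces_count": text.count(" "),
        "digits_count": sum(ch.isdigit() for ch in text),
    }


def profile_column(rows: List[Dict[str, Any]], column: str) -> List[Dict[str, int]]:
    """Profile one text column of a dataset given as a list of row dicts."""
    return [granular_profile(str(row[column])) for row in rows]


rows = [{"id": 1, "text": "Hello world 42"}, {"id": 2, "text": ""}]
print(profile_column(rows, "text"))
```

Each row yields one dictionary of counts, so the result lines up index-for-index with the input rows and could be joined back onto the original dataset.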
Related to the performance issue raised in #2 (comments and other metrics can also be found on #2)
Refactored tests
Performance test added (more to come), including profiling of a slow function (the spell-checker feature)
Refactored the core into individual modules
Extracted constants into their own module
Two parallelisation methods now available via the params flag, namely 'default' or 'using_swifter' (usage example provided in the notebooks folder)
Additional notebook provided with examples of how to use the library with large datasets
Caching applied to several parts of the parsing process
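The performance-test item mentions profiling a slow function such as the spell checker. A minimal sketch of that kind of profiling uses Python's standard `cProfile` and `pstats` modules. `toy_spell_check` is a hypothetical, deliberately slow stand-in (a linear scan over a word list), and `profile_call` is a hypothetical helper; neither is NLP Profiler's actual spell checker or test code.

```python
import cProfile
import io
import pstats
from typing import Any, Callable, Tuple

# Hypothetical vocabulary; a list (not a set) so lookups are a slow linear scan.
VOCABULARY = ["hello", "world", "profile", "text"]


def toy_spell_check(text: str) -> float:
    """Hypothetical spell-check score: fraction of words found in VOCABULARY."""
    words = text.lower().split()
    if not words:
        return 1.0
    known = sum(1 for word in words if word in VOCABULARY)
    return known / len(words)


def profile_call(fn: Callable[..., Any], *args: Any) -> Tuple[Any, str]:
    """Run fn(*args) under cProfile; return (result, formatted stats report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)  # top 10 by cumulative time
    return result, buffer.getvalue()


score, report = profile_call(toy_spell_check, "hello wrld text")
print(score)
print(report)
```

Sorting by cumulative time puts the function under test and its most expensive callees at the top of the report, which is usually the quickest way to spot where a slow feature spends its time.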
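The parallelisation and caching items can be illustrated together: a per-text feature wrapped in `functools.lru_cache`, applied through a strategy chosen by a params flag. This is a sketch under stated assumptions. The params key `parallelisation_method`, the helpers `apply_feature`, `_apply_default` and `_apply_using_swifter`, and the feature `count_words` are hypothetical. The 'default' strategy here uses a thread pool, which may differ from what NLP Profiler does internally. The 'using_swifter' branch needs `pandas` and `swifter` installed and relies only on swifter's `Series.swifter.apply` accessor.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import Any, Callable, Dict, Iterable, List, Optional


@lru_cache(maxsize=10_000)
def count_words(text: str) -> int:
    """Hypothetical per-text feature; repeated texts are served from the cache."""
    return len(text.split())


def _apply_default(fn: Callable[[str], Any], values: List[str], max_workers: int = 4) -> List[Any]:
    # Stand-in 'default' strategy: a thread pool that preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, values))


def _apply_using_swifter(fn: Callable[[str], Any], values: List[str]) -> List[Any]:
    # Optional dependencies, imported lazily so the default path needs neither.
    import pandas as pd
    import swifter  # noqa: F401  (importing registers the .swifter accessor)

    return pd.Series(values).swifter.apply(fn).tolist()


STRATEGIES: Dict[str, Callable[..., List[Any]]] = {
    "default": _apply_default,
    "using_swifter": _apply_using_swifter,
}


def apply_feature(
    fn: Callable[[str], Any],
    values: Iterable[str],
    params: Optional[Dict[str, str]] = None,
) -> List[Any]:
    """Apply fn to every text using the strategy named in params (hypothetical key)."""
    method = (params or {}).get("parallelisation_method", "default")
    if method not in STRATEGIES:
        raise ValueError(f"unknown parallelisation method: {method!r}")
    return STRATEGIES[method](fn, list(values))


print(apply_feature(count_words, ["a b", "a b", "c"]))
print(count_words.cache_info())
```

Keeping the strategies in a dictionary means an unknown flag value fails loudly instead of silently falling back, and caching at the per-text level pays off whenever a column contains duplicate strings.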