A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and the name of a column containing text data, NLP Profiler returns either high-level insights or low-level (granular) statistical information about the text in that column.
[BUG] Not all granular features are getting generated #78
Describe the bug
After running the notebook(s) on Kaggle or on a local machine, not all granular features are generated. For example, the fields 'repeated_letters_count', 'repeated_digits_count', 'repeated_spaces_count', 'repeated_whitespaces_count', 'repeated_punctuations_count', 'english_characters_count' and 'non_english_characters_count', among others, are missing from the resulting dataframe. Either they are not being detected or something else is amiss.
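A quick way to confirm which of the fields named in this report are absent is to compare the dataframe's column names against the expected ones. The following is only a minimal sketch: the helper name missing_features and the example column list are hypothetical and are not part of NLP Profiler, and the expected list contains only the fields quoted in this report, not the library's full feature set.

```python
# Granular feature names quoted in this report. The library produces more
# features than these; the full list is not reproduced here.
EXPECTED_FEATURES = [
    "repeated_letters_count",
    "repeated_digits_count",
    "repeated_spaces_count",
    "repeated_whitespaces_count",
    "repeated_punctuations_count",
    "english_characters_count",
    "non_english_characters_count",
]


def missing_features(columns, expected=EXPECTED_FEATURES):
    """Return the expected feature names that are absent from `columns`.

    `columns` is any iterable of column names, e.g. list(df.columns).
    The order of `expected` is preserved in the result.
    """
    present = set(columns)
    return [name for name in expected if name not in present]


# Hypothetical column list standing in for list(profiled_df.columns).
example_columns = ["text", "repeated_letters_count"]

for name in missing_features(example_columns):
    print("missing:", name)
```

With a real profiled dataframe, passing list(profiled_df.columns) in place of example_columns would list the fields that were not generated; an empty result would mean all of the fields named in this report are present.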
To Reproduce
Run the notebook on Kaggle (https://www.kaggle.com/code/neomatrix369/nlp-profiler-simple-dataset). It fails at the cell that looks for repeated characters and the related fields.
Version information:
NLP Profiler version 0.0.3. The issue does not depend on the environment or any other technical parameter. The version on the master branch behaves in the same way.
Additional context
According to the logs at https://www.kaggle.com/code/neomatrix369/nlp-profiler-simple-dataset#Installation-and-import-libraries/packages, version 0.0.3 on PyPI worked in the past but has not been working for some time.