Open aliforgetti opened 3 years ago
Hi, could you paste the actual data you're using? (Just one of the texts would help probably).
For me with the beginning of your first text, the punctuation is removed successfully:
>>> import texthero as hero
>>> import pandas as pd
>>> s = pd.Series(["Honestly people don't know about the fact ..."])
>>> hero.clean(s)
0 honestly people know fact
dtype: object
The issue is probably that some punctuation in your text is not "standard" punctuation. texthero internally uses string.punctuation (from import string), so any character that is not in there won't be removed.
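To check whether this is what's happening, here is a minimal sketch using only the standard library. It lists the characters in a text that Unicode classifies as punctuation but that are missing from string.punctuation, and strips both kinds. The helper names and the sample text (with a curly apostrophe and an ellipsis character substituted in) are assumptions for illustration, not part of texthero:

```python
import string
import unicodedata


def nonstandard_punctuation(text):
    """Return punctuation characters (Unicode category P*) not in string.punctuation."""
    return {
        ch
        for ch in text
        if unicodedata.category(ch).startswith("P") and ch not in string.punctuation
    }


def strip_all_punctuation(text):
    """Drop ASCII punctuation and any other Unicode punctuation character."""
    return "".join(
        ch
        for ch in text
        if ch not in string.punctuation
        and not unicodedata.category(ch).startswith("P")
    )


# Hypothetical sample: U+2019 (curly apostrophe) and U+2026 (ellipsis character)
sample = "Honestly people don\u2019t know about the fact \u2026"

leftover = nonstandard_punctuation(sample)
print(leftover)
print(strip_all_punctuation(sample))
```

If leftover is non-empty for your data, those are the characters a filter based on string.punctuation will not touch. On a pandas Series you could apply the same helper with s.apply(strip_all_punctuation) before calling hero.clean(s).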
Thank you @henrifroese. @aliforgetti do you have any updates?
This is my code and I was trying to clean a large dataset
According to the documentation, this is the default pipeline for the clean functionality:
But my output does not reflect this, as some of the punctuation remained in the text.
Original text column
Preprocessed text column