jcalifornia opened this issue 3 years ago
Hi Josh,
Interesting question. A short-term solution is indeed to call replace many times. However, I feel there should be a better and faster way to do this. Would it work for you to tokenize/split the strings first?
Cheers,
Maarten
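The tokenize-first idea can be sketched in plain Python. This is not vaex code, and `map_words` is a hypothetical helper name. Splitting first means each word costs a single dictionary lookup, rather than one full pass over the string per dictionary key, and only whole words are matched.

```python
from typing import Dict


def map_words(text: str, mapping: Dict[str, str], sep: str = " ") -> str:
    """Split `text` on `sep`, look each token up in `mapping`, and rejoin.

    Tokens missing from `mapping` are kept unchanged (an assumption; they
    could just as well be dropped or replaced by a placeholder).
    """
    tokens = text.split(sep)
    # One dict lookup per token instead of one replace pass per key.
    return sep.join(mapping.get(tok, tok) for tok in tokens)


mapping = {"apple": "fruit", "dog": "animal", "purple": "color"}
print(map_words("purple dog apple", mapping))  # color animal fruit
```

Because lookups are per token, a key such as `cat` will not accidentally rewrite part of a longer word like `category`.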
Hi there, I was wondering if there is a preferred way of 1) mapping words within phrases to other words defined in a potentially large dictionary, and 2) count-encoding the mapped words.
For instance, suppose I have the following in a data column:
and the following mapping:
{'apple': 'fruit', 'banana': 'fruit', 'potato': 'vegetable', 'dog': 'animal', 'cat': 'animal', 'panda': 'animal', 'purple': 'color', 'green': 'color'}
I would want the following as the result of 1):
In my actual application, however, the words are separated by commas rather than spaces.
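For comma-separated words, a minimal plain-Python sketch of step 1) might look like this. It assumes there may be stray spaces around the commas and that empty entries (for example from a trailing comma) should be skipped; `map_csv_words` is a hypothetical name. It returns a list so the result can feed directly into a counting step; `",".join(...)` turns it back into a string if that is preferred.

```python
from typing import Dict, List


def map_csv_words(text: str, mapping: Dict[str, str]) -> List[str]:
    """Map each comma-separated word in `text` through `mapping`.

    Assumes words may carry stray spaces ("apple, dog") and that empty
    entries should be skipped. Unknown words are kept as-is.
    """
    words = (w.strip() for w in text.split(","))
    return [mapping.get(w, w) for w in words if w]


mapping = {"apple": "fruit", "banana": "fruit", "dog": "animal"}
print(map_csv_words("apple, banana,dog,", mapping))
# ['fruit', 'fruit', 'animal']
```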
Then, for the count encoding, I would expect the following as the result of 2):
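The expected output of 2) is not shown here, so the following sketch assumes that "count encoding" means one column per category, holding how often that category occurs in each row (a bag-of-words over the mapped categories). It is plain Python using `collections.Counter`; `count_encode` is a hypothetical name, and the input is assumed to be lists of already-mapped words per row.

```python
from collections import Counter
from typing import Dict, List


def count_encode(rows: List[List[str]], categories: List[str]) -> Dict[str, List[int]]:
    """Return one count column per category: result[c][i] is the number
    of times category c occurs in row i (0 if it is absent)."""
    counts = [Counter(row) for row in rows]
    # Counter returns 0 for missing keys, so absent categories count as 0.
    return {c: [cnt[c] for cnt in counts] for c in categories}


rows = [["fruit", "fruit", "animal"], ["color", "animal"]]
categories = ["fruit", "vegetable", "animal", "color"]
print(count_encode(rows, categories))
# {'fruit': [2, 0], 'vegetable': [0, 0], 'animal': [1, 1], 'color': [0, 1]}
```

With the mapping from the question, the category list could be derived as `sorted(set(mapping.values()))`, which guarantees a column even for categories that never occur in the data.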
Is there an elegant way of doing this, or should I loop over the dictionary and call https://vaex.io/docs/api.html#vaex.expression.StringOperations.replace once per entry? Thanks!
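Looping over the dictionary with one replace per entry has a pitfall beyond speed: plain substring replacement also rewrites parts of longer words. The sketch below shows this with Python's built-in `str.replace`, then a single-pass alternative using the standard `re` module with a callback. This is plain Python, not vaex; the linked vaex `replace` takes a fixed replacement string, so it is not assumed here that it can do per-key callbacks. `make_replacer` is a hypothetical name, and the `\b` word boundaries assume keys are plain alphanumeric words.

```python
import re
from typing import Callable, Dict

mapping = {"cat": "animal", "dog": "animal"}
text = "cat,category,dog"

# Chained replace: one pass per key, matching substrings anywhere.
chained = text
for key, value in mapping.items():
    chained = chained.replace(key, value)
print(chained)  # animal,animalegory,animal


def make_replacer(mapping: Dict[str, str]) -> Callable[[str], str]:
    """Build a function that replaces whole-word keys in a single pass."""
    if not mapping:
        # An empty alternation would match the empty string; avoid it.
        return lambda s: s
    # Longest keys first so longer alternatives are tried before shorter ones.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
    # The callback looks up the matched word; replaced text is never rescanned.
    return lambda s: pattern.sub(lambda m: mapping[m.group(0)], s)


replace = make_replacer(mapping)
print(replace(text))  # animal,category,animal
```

Compiling the pattern once and reusing it keeps the per-row cost to a single regex scan, regardless of how many keys the dictionary has.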