pytorch / torcharrow

High performance model preprocessing library on PyTorch
https://pytorch.org/torcharrow/beta/index.html
BSD 3-Clause "New" or "Revised" License
649 stars 79 forks

Add vocab UDF from TorchText #287

Closed: parmeet closed this 2 years ago

parmeet commented 2 years ago

This PR adds the Vocab UDF from TorchText to TorchArrow.

Usage example:

import torcharrow as ta
import torcharrow._torcharrow as _ta
from torcharrow import functional as F

tokens = ["<unk>", "Hello", "world", "How", "are", "you!"]
# The second argument (0) is the default index, returned when an
# out-of-vocabulary (OOV) token is queried.
vocab = _ta.Vocab(tokens, 0)

df = ta.dataframe(
    {
        "text": [["Hello", "world"], ["How", "are", "you!", "OOV"]]
    }
)
# Look up the index of every token in each row of the "text" column.
df["indices"] = F.lookup_indices(vocab, df["text"])
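For readers without a TorchArrow build at hand, the intended lookup semantics can be sketched in plain Python. This is only an illustration, not the UDF's implementation: it assumes each token maps to its position in the `tokens` list and that any token missing from the vocabulary falls back to the default index. The helper names `build_vocab` and `lookup_indices_sketch` are hypothetical and are not part of TorchArrow or TorchText.

```python
from typing import Dict, List


def build_vocab(tokens: List[str]) -> Dict[str, int]:
    # Assumption: a token's index is its position in the input list.
    return {token: index for index, token in enumerate(tokens)}


def lookup_indices_sketch(
    vocab: Dict[str, int], rows: List[List[str]], default_index: int
) -> List[List[int]]:
    # Map every token in every row to its index; tokens that are not in the
    # vocabulary (OOV) get default_index instead.
    return [[vocab.get(token, default_index) for token in row] for row in rows]


tokens = ["<unk>", "Hello", "world", "How", "are", "you!"]
vocab = build_vocab(tokens)
rows = [["Hello", "world"], ["How", "are", "you!", "OOV"]]

indices = lookup_indices_sketch(vocab, rows, default_index=0)
print(indices)  # [[1, 2], [3, 4, 5, 0]]
```

Under these assumptions, "OOV" resolves to 0, the same index as "<unk>", which is why the default index in the usage example points at the unknown token.
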
facebook-github-bot commented 2 years ago

@parmeet has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 2 years ago

This pull request was exported from Phabricator. Differential Revision: D35726386