pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text
BSD 3-Clause "New" or "Revised" License

Confusing docs for build_vocab_from_iterator #2216

Open rmalouf opened 9 months ago

rmalouf commented 9 months ago

📚 Documentation

Description

In the docs for build_vocab_from_iterator:

https://pytorch.org/text/stable/vocab.html#build-vocab-from-iterator

the first arg is typed as an Iterable and the description says "Must yield list or iterator of tokens." What it actually wants is an iterable whose items are themselves lists or iterators of tokens, e.g. a list of lists of strings. I guess it's expecting text divided into sentences or something. You can figure out what it's doing easily enough from either the example or the source, but the wording is unnecessarily confusing.
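To make the difference concrete, here is a minimal sketch that does not call torchtext at all. It assumes the counting step behaves like a `Counter.update` per yielded item, which is my reading of the source; `count_tokens` is a hypothetical stand-in, not the library function.

```python
from collections import Counter


def count_tokens(iterator):
    """Hypothetical stand-in for the counting step (an assumption, not torchtext code).

    Each item yielded by `iterator` is treated as a sequence of tokens.
    """
    counter = Counter()
    for tokens in iterator:
        counter.update(tokens)
    return counter


# An iterable of token lists (one list per sentence): tokens are counted as words.
nested = count_tokens([["the", "cat", "sat"], ["the", "dog"]])

# A flat list of tokens: each string is itself iterated, so characters get counted.
flat = count_tokens(["the", "cat"])

print(nested)
print(flat)
```

If that assumption holds, passing a flat list of tokens does not raise an error. It quietly produces a character-level vocabulary, which is why the wording of the docs matters here.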