pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text
BSD 3-Clause "New" or "Revised" License

How to use fastText embeddings in the torchtext nightly Vocab #1264

Open StephennFernandes opened 3 years ago

StephennFernandes commented 3 years ago

I have a custom-trained Facebook fastText embedding which I want to use in my RNN.

I use the nightly version of torchtext, so the Vocab is kind of new. How do I use a fastText embedding there? A simple, clear example would be great.

zhangguanheng66 commented 3 years ago

We have fastText embeddings in torchtext.experimental.vectors.

StephennFernandes commented 3 years ago

And can I use that to load my custom embedding file?
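For readers with the same question, here is a minimal sketch that does not depend on the experimental torchtext API at all. It assumes the custom embedding was exported in fastText's text format (a `.vec` file whose first line is `count dim`, followed by one `token v1 ... vdim` line per token). The helper names `read_vec` and `build_matrix`, the `itos` list, and the in-memory sample are hypothetical and only for illustration; they are not part of torchtext.

```python
import io

def read_vec(handle):
    """Parse fastText text-format vectors: header 'count dim', then 'token v1 ... vdim'."""
    header = handle.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines (e.g. tokens containing spaces)
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim

def build_matrix(itos, vectors, dim):
    """One row per vocab index; tokens missing from the file get a zero row."""
    return [vectors.get(tok, [0.0] * dim) for tok in itos]

# Tiny in-memory stand-in for a custom .vec file (assumption: real data
# would come from open("your_vectors.vec", encoding="utf-8")).
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
vectors, dim = read_vec(sample)

# Hypothetical index-to-token list taken from whatever vocab the model uses.
itos = ["<unk>", "hello", "world"]
matrix = build_matrix(itos, vectors, dim)
```

With PyTorch available, `torch.nn.Embedding.from_pretrained(torch.tensor(matrix), freeze=False)` turns this matrix into an embedding layer whose row indices match `itos`, so it can sit in front of an RNN regardless of which Vocab class produced the indices.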

StephennFernandes commented 3 years ago

@zhangguanheng66 Hey man, by the way, I still wasn't able to implement the language modeling pipeline for my large .txt corpus. I tried all the methods you suggested.

Is there any clear and detailed explanation of how to implement an LM using torchtext nightly with the latest techniques? Currently everything feels clunky and unpolished in torchtext.

When is torchtext getting a big rewrite?

On that note, please help me out with a clear and detailed explanation of how to use torchtext to build an LM DataLoader.

parmeet commented 3 years ago

@StephennFernandes Thank you for your feedback! It would be great if you could provide more details on your pain points. Please do not hesitate to create issues or feature requests, and we can take it forward from there. Here are the release notes for the latest torchtext, which might provide some guidance as well: https://github.com/pytorch/text/releases/tag/v0.9.0-rc5

Regarding LM, we do have a tutorial https://pytorch.org/tutorials/beginner/transformer_tutorial.html. We would be happy to hear your feedback and can try to provide improvements or recommendations based on that.
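As a rough illustration of the data preparation behind a language modeling pipeline, the sketch below turns a text corpus into a flat stream of token ids and cuts it into fixed-length windows whose targets are the inputs shifted by one token. It is written in plain Python under stated assumptions: the `encode` and `lm_windows` helpers, the whitespace tokenizer, the `stoi` mapping, and the `bptt` window length are all hypothetical choices for this example, not torchtext APIs.

```python
def encode(lines, stoi, unk=0):
    """Whitespace-tokenize each line and map tokens to ids (unknown tokens -> unk)."""
    ids = []
    for line in lines:
        ids.extend(stoi.get(tok, unk) for tok in line.split())
    return ids

def lm_windows(token_ids, bptt):
    """Yield (inputs, targets) pairs; targets are inputs shifted by one token."""
    for start in range(0, len(token_ids) - 1, bptt):
        end = min(start + bptt, len(token_ids) - 1)
        yield token_ids[start:end], token_ids[start + 1:end + 1]

# Assumption: in practice `corpus` would be lines read from the large .txt file.
corpus = ["the cat sat", "the dog sat down"]
stoi = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, "dog": 4}

ids = encode(corpus, stoi)
pairs = list(lm_windows(ids, bptt=3))
```

Each pair can be converted to tensors and fed to the model one window at a time. For a corpus too large to hold in memory, the same windowing logic could be wrapped in a generator that reads the file lazily.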