pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text
BSD 3-Clause "New" or "Revised" License

A question about STS (Semantic Text Similarity) data loading #687

Open jwc19890114 opened 4 years ago

jwc19890114 commented 4 years ago

❓ Questions and Help

Description

I have a question about using torchtext to train DSSM (an STS model). I don't know how to construct the data. We have two lists (a query list and a doc list) and a label list to feed into the model. How should I use torchtext for this?

zhangguanheng66 commented 4 years ago

If you want to use DataLoader from torch.utils.data, you could write the dataset pipeline with the new abstraction. Copying most of the code from the text classification datasets (link) should be enough. Depending on the format, you need to load the text data into memory first. The text classification datasets come with the label and text on a single line. Other than that, your data should be very similar.
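
For illustration, here is a minimal sketch of what such a pipeline could look like for query/doc/label triples. It uses only torch.utils.data and plain Python rather than any torchtext-specific API, and it assumes whitespace tokenization and that all three lists already fit in memory. The names `build_vocab`, `PairDataset`, and `collate_pairs` are hypothetical helpers made up for this example, and the sample sentences are placeholders.

```python
from collections import Counter

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset

PAD, UNK = "<pad>", "<unk>"


def build_vocab(texts):
    """Map each lowercased whitespace token to an id; <pad> is 0, <unk> is 1."""
    counter = Counter(tok for text in texts for tok in text.lower().split())
    itos = [PAD, UNK] + sorted(counter)
    return {tok: idx for idx, tok in enumerate(itos)}


class PairDataset(Dataset):
    """Hypothetical dataset yielding (query_ids, doc_ids, label) triples."""

    def __init__(self, queries, docs, labels, vocab):
        assert len(queries) == len(docs) == len(labels)
        self.queries, self.docs, self.labels = queries, docs, labels
        self.vocab = vocab

    def _encode(self, text):
        # Unknown tokens fall back to the <unk> id.
        unk = self.vocab[UNK]
        ids = [self.vocab.get(tok, unk) for tok in text.lower().split()]
        return torch.tensor(ids, dtype=torch.long)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return (
            self._encode(self.queries[idx]),
            self._encode(self.docs[idx]),
            self.labels[idx],
        )


def collate_pairs(batch):
    """Pad queries and docs separately so each becomes a rectangular tensor."""
    queries, docs, labels = zip(*batch)
    q = pad_sequence(list(queries), batch_first=True, padding_value=0)
    d = pad_sequence(list(docs), batch_first=True, padding_value=0)
    return q, d, torch.tensor(labels, dtype=torch.float)


# Placeholder data standing in for the query list, doc list and label list.
queries = ["how to load data", "pytorch text"]
docs = ["loading data with a dataloader", "cooking recipes"]
labels = [1, 0]

vocab = build_vocab(queries + docs)
dataset = PairDataset(queries, docs, labels, vocab)
loader = DataLoader(dataset, batch_size=2, collate_fn=collate_pairs)

q_batch, d_batch, y = next(iter(loader))
```

Each batch then holds a padded query tensor, a padded doc tensor, and a label tensor, which can be passed to the two towers of a DSSM-style model and its loss. Padding the two sides separately keeps queries from being padded out to the length of the longest doc.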