pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text
BSD 3-Clause "New" or "Revised" License

load dialogue datasets with torchtext #673

Open mmsamiei opened 4 years ago

mmsamiei commented 4 years ago

❓ How to load Dialogue Datasets?

Description: I have a dialogue-task dataset in the form below:

    [{
        "chosen_topic": "Science fiction",
        "dialog": [
            {
                "retrieved_passages": [
                    { "Hyperspace (science fiction)": [ ...., .... ] },
                    { ... }
                ],
                "speaker": "0_Wizard",
                "text": "I think science fiction is an amazing genre for anything. Future science, technology, time travel, FTL travel, they're all such interesting concepts."
            },
            {

            }
        ]
    }]

We have a number of dialogues; each dialogue has several turns, and each turn has several facts (sentences). I tried hard to find a way to build a data loader for this with torchtext, but because torchtext doesn't let us nest one nested field inside another, I couldn't make it work. So is there any way to load dialogue datasets (and other multiply nested datasets) with torchtext?

zhangguanheng66 commented 4 years ago

For the text items, you could use the tokenizers and vocab in torchtext to process the raw text data.
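
As a rough illustration of that suggestion, here is a minimal sketch that walks the nested structure by hand, tokenizes each turn's text and facts, and maps tokens to ids. It assumes the JSON layout from the question. The sample passages, the `1_Apprentice` speaker label, and the helper names (`tokenize`, `iter_turns`, `build_vocab`, `numericalize`, `pad_facts`) are all made up for the example. The whitespace tokenizer and the `Counter`-based vocabulary are plain-Python stand-ins, not torchtext's API; torchtext's own tokenizer and vocab utilities could be swapped in at those two points.

```python
from collections import Counter

# Hypothetical sample mirroring the layout in the question (passages are invented).
dialogues = [{
    "chosen_topic": "Science fiction",
    "dialog": [
        {
            "retrieved_passages": [
                {"Hyperspace (science fiction)": [
                    "hyperspace is a faster than light method",
                    "it appears in science fiction",
                ]},
            ],
            "speaker": "0_Wizard",
            "text": "I think science fiction is an amazing genre",
        },
        {
            "retrieved_passages": [],
            "speaker": "1_Apprentice",
            "text": "I agree",
        },
    ],
}]


def tokenize(text):
    # Stand-in tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def iter_turns(dialogues):
    """Yield (dialogue_index, speaker, text_tokens, facts_tokens) for every turn."""
    for d_idx, dialogue in enumerate(dialogues):
        for turn in dialogue["dialog"]:
            # Flatten {title: [sentence, ...]} passages into one list of tokenized facts.
            facts = [
                tokenize(sentence)
                for passage in turn.get("retrieved_passages", [])
                for sentences in passage.values()
                for sentence in sentences
            ]
            yield d_idx, turn["speaker"], tokenize(turn["text"]), facts


def build_vocab(dialogues, specials=("<pad>", "<unk>")):
    """Build a token -> id mapping; specials come first, then tokens by frequency."""
    counter = Counter()
    for _, _, text, facts in iter_turns(dialogues):
        counter.update(text)
        for fact in facts:
            counter.update(fact)
    itos = list(specials) + sorted(counter, key=lambda t: (-counter[t], t))
    return {tok: i for i, tok in enumerate(itos)}


def numericalize(tokens, stoi):
    # Unknown tokens fall back to the <unk> id.
    return [stoi.get(tok, stoi["<unk>"]) for tok in tokens]


def pad_facts(facts_ids, pad_id):
    # Pad every fact of one turn to the length of that turn's longest fact.
    width = max((len(f) for f in facts_ids), default=0)
    return [f + [pad_id] * (width - len(f)) for f in facts_ids]


stoi = build_vocab(dialogues)
examples = [
    {
        "dialogue": d_idx,
        "speaker": speaker,
        "text": numericalize(text, stoi),
        "facts": pad_facts([numericalize(f, stoi) for f in facts], stoi["<pad>"]),
    }
    for d_idx, speaker, text, facts in iter_turns(dialogues)
]
```

Each entry in `examples` is one turn, with `text` as a flat list of ids and `facts` as a rectangular list of lists. Keeping the dialogue index on every turn means turns can be regrouped per dialogue later, for instance inside a custom `collate_fn` for a PyTorch `DataLoader`, where the lists would be converted to tensors.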