Open kolloldas opened 6 years ago
Yes, I think this works. The other way to do it would be to pass `fromlist` a list where the relevant column appears twice, something that I believe is supported by the TSV/CSV and/or JSON loaders (I think we did that for the SNLI dataset class?). But adding something like this would make things more convenient. If you make a PR before I get to it, can you also update `TabularDataset` to use this code path?
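The "column appears twice" alternative mentioned above can be sketched in plain Python. This is only an illustration: `duplicate_column` and the field names `inputs_word`, `inputs_char` and `labels` are hypothetical and not part of torchtext. It shows a row being expanded so that each field gets its own copy of the shared column.

```python
def duplicate_column(row, index):
    """Return a copy of `row` with the value at `index` repeated right after it."""
    return row[: index + 1] + [row[index]] + row[index + 1:]


# One example row: a token column and a tag column (made-up data).
row = [["EU", "rejects"], ["B-ORG", "O"]]

# Repeat the token column so two different fields can each consume one copy.
expanded = duplicate_column(row, 0)

# Hypothetical field names, one per column of the expanded row.
names = ["inputs_word", "inputs_char", "labels"]
paired = dict(zip(names, expanded))
```

The cost of this approach is that the duplication has to happen in the data (or in a loading step) instead of in the field specification.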
Sure, will update `TabularDataset` as required when I make the PR. I think any calls that end in `Example.fromdict` will support multiple fields, so I'll check CSV/TSV files without headers, which use `Example.fromlist`.
Isn't this already merged? See https://github.com/pytorch/text/pull/222
First of all, thanks for this great time saver of a library!
I tried using `SequenceTaggingDataset` for the CoNLL-2003 NER task, which needs both word and character embeddings for high accuracy. The newly added `NestedField` works very well, but there doesn't seem to be a way currently to apply multiple fields to a single data column. I needed something like:
So I made a few modifications. Specifically in example.py:
And in dataset.py:
It seems to work fine, but is there a better way to do this? Did I miss something?
Thanks!
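The idea described in this post, applying several fields to one data column, can be sketched without torchtext. This is not the code originally posted. `ToyField`, `ToyExample` and `unpack_fields` are hypothetical stand-ins, and the tuple-of-names / tuple-of-fields convention is an assumption about how such a specification might look. The first part mirrors the kind of change one would make where examples are built from a list, the second the kind of change needed where a dataset stores its name-to-field mapping.

```python
class ToyField:
    """Hypothetical stand-in for a field: wraps one preprocessing function."""

    def __init__(self, preprocess):
        self.preprocess = preprocess


class ToyExample:
    """Hypothetical stand-in for an example: one attribute per field name."""

    @classmethod
    def fromlist(cls, data, fields):
        ex = cls()
        for (name, field), val in zip(fields, data):
            if field is None:
                continue  # this column is ignored
            if isinstance(name, tuple):
                # Several fields share this column: apply each to the same value.
                for n, f in zip(name, field):
                    setattr(ex, n, f.preprocess(val))
            else:
                setattr(ex, name, field.preprocess(val))
        return ex


def unpack_fields(fields):
    """Flatten (names_tuple, fields_tuple) pairs into a name -> field dict."""
    flat = {}
    for name, field in fields:
        if isinstance(name, tuple):
            flat.update(zip(name, field))
        else:
            flat[name] = field
    return flat


# Word-level and character-level views of the same token column.
word_field = ToyField(lambda tokens: [t.lower() for t in tokens])
char_field = ToyField(lambda tokens: [list(t) for t in tokens])
label_field = ToyField(lambda tags: list(tags))

fields = [
    (("inputs_word", "inputs_char"), (word_field, char_field)),
    ("labels", label_field),
]

example = ToyExample.fromlist([["EU", "rejects"], ["B-ORG", "O"]], fields)
field_map = unpack_fields(fields)
```

Here `example.inputs_word` and `example.inputs_char` are both derived from the first column, while `field_map` has one flat entry per field name, which is the shape later processing steps would typically expect.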