pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text
BSD 3-Clause "New" or "Revised" License

error super(type, obj) with imdb.py after solving encoding 'utf-8' issue #226

Closed iamyihwa closed 4 years ago

iamyihwa commented 6 years ago

Hello, when loading the IMDB dataset under Python 3, I replaced open(file) with open(file, encoding='utf8'). After that change, the error below appears, and I have no idea how to solve it.


TypeError                                 Traceback (most recent call last)
in ()
      1 IMDB_LABEL = data.Field(sequential=False)
----> 2 splits = torchtext.datasets.IMDB.splits(TEXT, IMDB_LABEL, 'data/')

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torchtext/datasets/imdb.py in splits(cls, text_field, label_field, root, train, test, **kwargs)
     50         Dataset.
     51         """
---> 52         return super(IMDB, cls).splits(
     53             root=root, text_field=text_field, label_field=label_field,
     54             train=train, validation=None, test=test, **kwargs)

TypeError: super(type, obj): obj must be an instance or subtype of type

keon commented 6 years ago

Did you change any environment or code besides encoding='utf8'? If not, could you share your environment? I am having trouble reproducing this error.

iamyihwa commented 6 years ago

Hi, I didn't change anything other than that. Yesterday I tried replacing super(IMDB, self) with super() (in two places, __init__ and splits), and it seems to work. Is this the right way to do it? I have no idea.

I am using anaconda3 and Python 3.6. I run torchtext in an environment called fastai (a deep learning course that ships its own tools). However, the place where the error arises is fairly independent of the rest of the code.

keitakurita commented 6 years ago

@iamyihwa Are you running the code in a Jupyter notebook without having restarted the kernel? If so, there's a chance that your kernel is referencing the wrong IMDB dataset class when super(IMDB, self) is called, causing the error.
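The stale-class scenario described above can be reproduced in isolation. The following is a minimal sketch, not torchtext's actual code: Base and both IMDB classes are hypothetical stand-ins, and re-defining the class in the same module stands in for a reload or a second copy of the package. It illustrates one known way this TypeError occurs, and why the zero-argument super() workaround mentioned earlier sidesteps it; it does not establish that this was the cause in this thread.

```python
class Base:
    """Hypothetical stand-in for the parent dataset class."""

    @classmethod
    def splits(cls):
        return "ok"


class IMDB(Base):
    @classmethod
    def splits(cls):
        # The name IMDB is looked up in the module globals at call time.
        return super(IMDB, cls).splits()

    @classmethod
    def splits_zero_arg(cls):
        # Zero-argument super() uses the class this method was defined in.
        return super().splits()


# A reference obtained before the class was defined a second time.
stale_imdb = IMDB


class IMDB(Base):  # same name, new class object (simulates a reload)
    @classmethod
    def splits(cls):
        return super(IMDB, cls).splits()


try:
    # cls is the old class, but the global name IMDB now points to the new one.
    stale_imdb.splits()
    raised = False
except TypeError:
    raised = True

zero_arg_result = stale_imdb.splits_zero_arg()  # unaffected by the rebinding
fresh_result = IMDB.splits()                    # the new class works normally
```

Here the old class is not a subclass of the new one, so the explicit two-argument form fails, while the zero-argument form keeps working because it never consults the rebound global name.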

shirishr commented 6 years ago

I am experiencing exactly the same errors. In a fastai notebook, a cell executes splits = torchtext.datasets.IMDB.splits(TEXT, IMDB_LABEL, 'data/'). This caused the encoding issue at line 32 of imdb.py, which was:

with open(fname, 'r') as f:

I changed this line to:

with open(fname, 'r', encoding="utf-8") as f:

After that I got the same error that @iamyihwa got:

TypeError: super(type, obj): obj must be an instance or subtype of type

I confirm that I had restarted the kernel.

Are we doing something improper by invoking a class method like splits = torchtext.datasets.IMDB.splits(TEXT, IMDB_LABEL, 'data/')?
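For reference, the encoding change discussed in this comment can be tried on its own. This is a small sketch under stated assumptions: the file name and the review text are made up for illustration, and it only shows that passing encoding="utf-8" makes the read independent of the platform's default encoding.

```python
import os
import tempfile

# Hypothetical sample text containing a non-ASCII character.
review = "A café scene worth watching"

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "review.txt")

    # Write raw UTF-8 bytes, as a downloaded dataset file would contain.
    with open(fname, "wb") as f:
        f.write(review.encode("utf-8"))

    # Read the file back with an explicit encoding instead of the locale default.
    with open(fname, "r", encoding="utf-8") as f:
        text = f.readline()
```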

shirishr commented 6 years ago

@iamyihwa and all others who may read this.

To fix this error, clone this repository, i.e. https://github.com/pytorch/text

and install torchtext from there (python setup.py install --force).

It has updates that release 2.0.1 does not cover (the encoding updates are not limited to imdb.py but also involve dataset.py, field.py, etc.).

I can confirm that after this install all my errors, both the encoding error and TypeError: super(type, obj): obj must be an instance or subtype of type, went away.

pauloneves commented 6 years ago

I cloned the repository (master branch), forced the install, and restarted the kernel, but I still have the same errors. Changing the encoding got me to the super() error.

How do I guarantee that I'm running the updated version?

I'm on a Windows 10 machine, also in a Jupyter Notebook.

UPDATE: I've just managed to fix it. It was necessary to uninstall the older version with pip uninstall torchtext. The older version was installed directly in site-packages and was taking precedence over the newer one installed with python setup.py install. SOLVED!
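One way to check which copy of a package the interpreter would pick up is to ask the import system directly. The sketch below assumes Python 3.8 or newer (importlib.metadata is not available on the Python 3.6 used in this thread), and describe_install is a hypothetical helper name, not part of torchtext or pip.

```python
import importlib.util
from importlib import metadata  # available from Python 3.8


def describe_install(package):
    """Return where `package` would be imported from, or None if not found."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return None
    try:
        # Version as recorded in the installed distribution's metadata.
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return {"origin": spec.origin, "version": version}


# Prints None if torchtext is not installed in the current environment.
print(describe_install("torchtext"))
```

If the reported origin points at an old copy in site-packages rather than the freshly installed one, that matches the situation described in the update above.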

zhangguanheng66 commented 4 years ago

Feel free to re-open the issue if you still have questions.