graph4ai / graph4nlp

Graph4nlp is the library for the easy use of Graph Neural Networks for NLP. Welcome to visit our DLG4NLP website (https://dlg4nlp.github.io/index.html) for various learning resources!
Apache License 2.0

Error while running NER #321

Closed yogeshhk closed 3 years ago

yogeshhk commented 3 years ago

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. Run python examples/pytorch/name_entity_recognition/main.py --graph_type dependency_graph --gpu 0 --init_hidden_size 400 --hidden_size 128 --lr 0.01 --batch_size 100 --gnn_type graphsage --direction_option undirected

  2. Getting TypeError: Can't instantiate abstract class ConllDataset with abstract methods download

AlanSwift commented 3 years ago

Thank you for the feedback. We will check this issue asap. @xguo7 will follow up on it.

AlanSwift commented 3 years ago

Could you please check whether the raw data exists on your computer (please refer to https://github.com/graph4ai/graph4nlp/tree/master/examples/pytorch/name_entity_recognition/conll/raw)? Currently, the download function is not implemented, so the raw data has to come from the repo itself. (The download function will be implemented in a future version.) We are sorry for the inconvenience.

yogeshhk commented 3 years ago

I see 3 files there: eng.train eng.testa eng.testb

Files and content look fine with IOB data

AlanSwift commented 3 years ago

We have run several tests on different computers running Windows 10 and can't reproduce this problem. May I ask from which path you executed this command?

yogeshhk commented 3 years ago

At the root of the graph4nlp folder, which is a clone of my fork. From that path, the text classification examples work. Here is the call stack:

(graph4nlp) graph4nlp>python examples/pytorch/name_entity_recognition/main.py --graph_type dependency_graph --gpu 0 --init_hidden_size 400 --hidden_size 128 --lr 0.01 --batch_size 100 --gnn_type graphsage --direction_option undirected
Using backend: pytorch
starting build the dataset
Traceback (most recent call last):
  File "examples/pytorch/name_entity_recognition/main.py", line 547, in <module>
    runner = Conll()
  File "examples/pytorch/name_entity_recognition/main.py", line 319, in __init__
    self._build_dataloader()
  File "examples/pytorch/name_entity_recognition/main.py", line 342, in _build_dataloader
    tag_types=self.tag_types)
  File "C:\Users\yogesh.kulkarni\AppData\Local\Continuum\anaconda3\envs\graph4nlp\lib\typing.py", line 1231, in __new__
    return _generic_new(cls.__next_in_mro__, cls, *args, **kwds)
  File "C:\Users\yogesh.kulkarni\AppData\Local\Continuum\anaconda3\envs\graph4nlp\lib\typing.py", line 1186, in _generic_new
    return base_cls.__new__(cls)
TypeError: Can't instantiate abstract class ConllDataset with abstract methods download

AlanSwift commented 3 years ago

This looks weird. Could you please add the following code

    import os
    print("The raw data's path is", self.raw_dir)
    print(os.path.exists(self.raw_dir))

after https://github.com/graph4ai/graph4nlp/blob/9e1e3b5b83362ab4d8f14b06f7d8dcccc4662cc6/graph4nlp/pytorch/data/dataset.py#L393 and see whether the raw data exists?
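The same check can also be run on its own, without editing the library. The sketch below is only an illustration: the directory is assumed from the repo URL quoted earlier in this thread and is resolved relative to the current working directory, the file names are the three reported above, and `missing_raw_files` is a hypothetical helper, not part of graph4nlp.

```python
from pathlib import Path

# Assumed location, taken from the repo URL quoted earlier in this thread.
# It is relative, so the result depends on where the script is started.
RAW_DIR = Path("examples/pytorch/name_entity_recognition/conll/raw")

# File names as reported earlier in this thread.
EXPECTED_FILES = ("eng.train", "eng.testa", "eng.testb")


def missing_raw_files(raw_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in raw_dir."""
    raw_dir = Path(raw_dir)
    return [name for name in expected if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    print("Working directory:", Path.cwd())
    print("Raw data directory:", RAW_DIR.resolve())
    print("Missing files:", missing_raw_files(RAW_DIR))
```

Running it from the directory where the training command is launched shows which absolute path is actually being checked. Whether the library resolves its raw directory the same way is not confirmed in this thread.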

yogeshhk commented 3 years ago

It's not hitting that line. Let me debug further and I will keep you posted.

yogeshhk commented 3 years ago

in conll.py


    def download(self):
        print("The raw data's path is", self.raw_dir)
        print(os.path.exists(self.raw_dir))
        # raise NotImplementedError(
        #     'This dataset is now under test and cannot be downloaded. Please prepare the raw data yourself.')

This made it work, but I am still not sure whether this is a good change. I will debug this more.

AlanSwift commented 3 years ago

Actually, the download function will not be executed, so I guess something else must be at fault. Since we can't reproduce this problem, I suggest you debug it further. Thank you!

SaizhuoWang commented 3 years ago

To make it clearer: when a Dataset (in this case the ConllDataset) is instantiated, the library checks whether the raw data is present in the environment, that is, the raw directory and the files listed in the raw_file_names property. If the raw data is not present, the download method is called to fetch it. ConllDataset does not implement the download method, which means the raw data must already be present locally, as it is in the GitHub repo. Otherwise, a NotImplementedError is raised owing to the abstract method call.
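The flow described here can be sketched in plain Python. This is a minimal illustration rather than the library's actual code: `SketchDataset`, `NoDownload` and `WithDownload` are hypothetical names, and the base class is assumed to declare `download` with `abc.abstractmethod`. Under that assumption there are two distinct failure modes. A subclass that does not define `download` at all fails with a TypeError at instantiation, before any file check runs, while a subclass whose `download` raises NotImplementedError only fails when raw files are missing.

```python
import os
from abc import ABC, abstractmethod


class SketchDataset(ABC):
    """Hypothetical stand-in for a dataset base class (not graph4nlp's code)."""

    def __init__(self, root):
        self.raw_dir = os.path.join(root, "raw")
        # Call download() only if at least one expected raw file is missing.
        paths = [os.path.join(self.raw_dir, name) for name in self.raw_file_names]
        if not all(os.path.exists(p) for p in paths):
            self.download()

    @property
    @abstractmethod
    def raw_file_names(self):
        """Names of the files expected inside raw_dir."""

    @abstractmethod
    def download(self):
        """Fetch the raw files into raw_dir."""


class NoDownload(SketchDataset):
    # download() is not overridden, so this class is still abstract.
    @property
    def raw_file_names(self):
        return ["eng.train", "eng.testa", "eng.testb"]


class WithDownload(NoDownload):
    # download() is overridden, so instantiation is allowed.
    def download(self):
        raise NotImplementedError("Please prepare the raw data yourself.")


if __name__ == "__main__":
    for cls in (NoDownload, WithDownload):
        try:
            cls("some/dir/without/raw/data")
        except (TypeError, NotImplementedError) as err:
            print(cls.__name__, "->", type(err).__name__)
```

In this sketch the TypeError does not depend on whether the raw files exist, because Python rejects the abstract class before `__init__` is called. The traceback in this issue ends inside `__new__`, which matches that first failure mode in plain Python, though the thread does not establish why the reporter's environment behaves this way.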

AlanSwift commented 3 years ago

I will close this issue.

vinven7 commented 2 years ago

@AlanSwift @SaizhuoWang @yogeshhk I am having this exact issue. I raised a new issue before I saw this one. Could you please help?