mandarjoshi90 / coref

BERT for Coreference Resolution
Apache License 2.0

evaluate.py is broken #55

Closed LifeIsStrange closed 4 years ago

LifeIsStrange commented 4 years ago

I simply want to use state-of-the-art coreference resolution. It should take arbitrary valid English text as input and output the resolved pronouns. How can I achieve this basic usage?

I tried both evaluate.py and predict.py.

GPU=0 python evaluate.py spanbert_base gives:

... UNKNOWN ERROR (303)
2020-06-02 15:17:49.112473: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (stephane): /proc/driver/nvidia/version does not exist
2020-06-02 15:17:49.112826: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-06-02 15:17:49.143598: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2901210000 Hz
2020-06-02 15:17:49.143965: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x565419b6ea50 executing computations on platform Host. Devices:
2020-06-02 15:17:49.144000: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
Restoring from /home/stephane/b_data/spanbert_base/model.max.ckpt
2020-06-02 15:17:55.723471: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
W0602 15:17:58.632378 140016442066752 deprecation.py:323] From /home/stephane/miniconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating: Use standard file APIs to check for files with this prefix.
Traceback (most recent call last):
  File "evaluate.py", line 26, in <module>
    model.evaluate(session, official_stdout=True, eval_mode=True)
  File "/home/stephane/coref/independent.py", line 538, in evaluate
    self.load_eval_data()
  File "/home/stephane/coref/independent.py", line 532, in load_eval_data
    with open(self.config["eval_path"]) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/stephane/b_data/dev.english.384.jsonlines'
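For what it's worth, the traceback shows evaluate.py opening the preprocessed OntoNotes dev set at the config's eval_path. A hypothetical pre-flight check (the file names below are taken from the traceback; `missing_files` is an illustrative helper, not part of the repo) could catch this before launching TensorFlow:

```python
import os

def missing_files(data_dir, required):
    """Return the required data files that are not present under data_dir.
    Illustrative pre-flight check; not part of the coref repo."""
    return [f for f in required
            if not os.path.exists(os.path.join(data_dir, f))]

# evaluate.py needs the preprocessed OntoNotes dev set and the model checkpoint.
print(missing_files(os.path.expanduser("~/b_data"),
                    ["dev.english.384.jsonlines", "spanbert_base/model.max.ckpt"]))
```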

LifeIsStrange commented 4 years ago

@mandarjoshi90

LifeIsStrange commented 4 years ago

I fail to understand basic usage:

I want to use the already fine-tuned spanbert_base. Is evaluate.py a prompt where I can input sentences and get resolved sentences (no coreference) as output? Or is evaluate.py just for running the OntoNotes benchmark?

Should I use predict.py instead, as a user? Where can I find an example usage?

LifeIsStrange commented 4 years ago

GPU=0 python predict.py "bob is a cat, he is kind." trial.jsonlines

Output:

Traceback (most recent call last):
  File "predict.py", line 12, in <module>
    config = util.initialize_from_env()
  File "/home/stephane/coref/util.py", line 37, in initialize_from_env
    config = pyhocon.ConfigFactory.parse_file("experiments.conf")[name]
  File "/home/stephane/miniconda3/envs/py36/lib/python3.6/site-packages/pyhocon/config_tree.py", line 366, in __getitem__
    val = self.get(item)
  File "/home/stephane/miniconda3/envs/py36/lib/python3.6/site-packages/pyhocon/config_tree.py", line 209, in get
    return self._get(ConfigTree.parse_key(key), 0, default)
  File "/home/stephane/miniconda3/envs/py36/lib/python3.6/site-packages/pyhocon/config_tree.py", line 151, in _get
    raise ConfigMissingException(u"No configuration setting found for key {key}".format(key='.'.join(key_path[:key_index + 1])))
pyhocon.exceptions.ConfigMissingException: 'No configuration setting found for key bob is a cat, he is kind'
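The traceback suggests why this fails: predict.py treats its first command-line argument as an experiment name to look up in experiments.conf, not as input text, so the sentence is looked up as a config key. A minimal sketch of that lookup (the config values here are illustrative stand-ins for the parsed experiments.conf, not the real file):

```python
# Stand-in for the parsed experiments.conf; keys are experiment names.
config = {"spanbert_base": {"max_segment_len": 384}}

def initialize_from_env(name):
    """Sketch of the lookup in util.initialize_from_env: the first CLI
    argument must be an experiment name, so passing a sentence fails."""
    if name not in config:
        raise KeyError("No configuration setting found for key %s" % name)
    return config[name]

initialize_from_env("spanbert_base")               # works
# initialize_from_env("bob is a cat, he is kind")  # raises, as in the traceback
```

So the experiment name (e.g. spanbert_base) has to come first, with the text supplied in an input file rather than on the command line.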

LifeIsStrange commented 4 years ago

I do not understand how to use this library. I have no experience with BERT, but it should just work. Where should I put the sentences (the input text)? Should it be a file? (edit: that doesn't work either)

LifeIsStrange commented 4 years ago

From the second most recent issue I found GPU=0 python predict.py spanbert_base input_data.jsonlines output_data.txt, but I need some clarification: should input_data.jsonlines have the same format as

{
  "clusters": [],  # leave this blank
  "doc_key": "nw",  # key closest to your domain; "nw" is newswire. See the OntoNotes documentation.
  "sentences": [["[CLS]", "subword1", "##subword1", ".", "[SEP]"]],  # list of BERT-tokenized segments. Each segment should be less than the max_segment_len in your config
  "speakers": [["[SPL]", "-", "-", "-", "[SPL]"]],  # speaker information for each subword in sentences
  "sentence_map": [0, 0, 0, 0, 0],  # flat list where each element is the sentence index of the subword
  "subtoken_map": [0, 0, 0, 1, 1]  # flat list containing the original word index for each subword. [CLS] and the first word share the same index
}

If so, where do I put my basic sentences? I don't know how to fill in the keys of this unfamiliar JSON format.
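For reference, here is a minimal sketch of one record in that format for the single sentence "Bob is kind ." — the subword split is illustrative only (a real run needs the actual BERT wordpiece tokenizer), but the field shapes follow the comments above:

```python
import json

# One document in the repo's jsonlines input format (illustrative values).
record = {
    "clusters": [],                # left blank for prediction
    "doc_key": "nw",               # newswire genre key
    "sentences": [[ "[CLS]", "Bob", "is", "kind", ".", "[SEP]" ]],
    "speakers":  [[ "[SPL]", "-",   "-",  "-",    "-", "[SPL]" ]],
    "sentence_map": [0, 0, 0, 0, 0, 0],   # every subword belongs to sentence 0
    "subtoken_map": [0, 0, 1, 2, 3, 3],   # [CLS]/[SEP] share the adjacent word's index
}

# Each line of the .jsonlines file is one JSON document.
with open("trial.jsonlines", "w") as f:
    f.write(json.dumps(record) + "\n")
```

The four per-token lists must all have the same length as the subword sequence.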

LifeIsStrange commented 4 years ago

EDIT: By looking at the README of the second state-of-the-art repo, https://github.com/kkjawz/coref-ee, I understand the usage:

"sentences": [["This", "is", "the", "first", "sentence", "."], ["This", "is", "the", "second", "."]],
"speakers": [["spk1", "spk1", "spk1", "spk1", "spk1", "spk1"], ["spk2", "spk2", "spk2", "spk2", "spk2"]]

But unlike your competitor, you don't seem to support that word-level format; instead you expect

"sentences": [["[CLS]", "subword1", "##subword1", ".", "[SEP]"]],  # list of BERT-tokenized segments. Each segment should be less than the max_segment_len in your config
"speakers": [["[SPL]", "-", "-", "-", "[SPL]"]]  # speaker information for each subword in sentences

How can I generate valid input JSON from a normal list of sentences??
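One way to bridge the two formats might look like the sketch below. It converts word-tokenized sentences into a single segment of the subword format described above, under a simplifying assumption: each word is kept as one "subword" for illustration, whereas a real run needs the actual BERT wordpiece tokenizer and splitting into segments of at most max_segment_len. The helper `to_record` is hypothetical, not part of the repo.

```python
import json

def to_record(sentences, doc_key="nw"):
    """Convert word-tokenized sentences into one segment in the jsonlines
    format described above. Illustrative sketch: no real wordpiece split,
    no max_segment_len handling."""
    subwords, speakers = ["[CLS]"], ["[SPL]"]
    sentence_map, subtoken_map = [0], [0]   # [CLS] shares the first word's index
    word_idx = -1
    for sent_idx, sent in enumerate(sentences):
        for word in sent:
            word_idx += 1
            subwords.append(word)
            speakers.append("-")
            sentence_map.append(sent_idx)
            subtoken_map.append(word_idx)
    subwords.append("[SEP]")                 # [SEP] shares the last word's index
    speakers.append("[SPL]")
    sentence_map.append(sentence_map[-1])
    subtoken_map.append(word_idx)
    return {"clusters": [], "doc_key": doc_key, "sentences": [subwords],
            "speakers": [speakers], "sentence_map": sentence_map,
            "subtoken_map": subtoken_map}

rec = to_record([["Bob", "is", "a", "cat", "."], ["He", "is", "kind", "."]])
print(json.dumps(rec))
```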

Hafsa-Masroor commented 4 years ago

Hi @LifeIsStrange, this notebook might help you use this project. It pre-processes input data into the required jsonlines format.

LifeIsStrange commented 4 years ago

I will try this next week; until then, I consider this issue fixed. Thanks for the help!