EricLe-dev closed this issue 2 years ago.
Hey there, I think I can help you with some of your questions, or at least I can tell you what I did.
# Put <directory of this script>/scorer/lib on @INC so the scorer's Perl modules can be found
use File::Basename;
use File::Spec;
use lib File::Spec->catdir(File::Basename::dirname(File::Spec->rel2abs($0)), 'scorer', 'lib');
I hope this helps. I also found that using the model after training requires some further work, as the model loaded from the neuralcoref cache folder is in a different format. I'm currently investigating how to get this to work. You should also keep in mind that your results depend heavily on your mention extraction (as noted in the training instructions).
@chieter Thank you so much for sharing the details of your work. I followed your advice and it fixed the Perl problem. I am also adding blank vectors for the MISSING and UNK tokens.
You are right that using the model after training requires further work; I have already seen people post that question on StackOverflow without getting a proper answer. I will share what I find here when I get to that stage.
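A minimal sketch of adding blank (all-zero) vectors for the MISSING and UNK tokens, assuming the vocabulary is a list of token strings whose i-th entry corresponds to row i of the embedding matrix (as with a static_word_vocabulary.txt / static_word_embeddings.npy pair). The token strings "<missing>" and "<unk>" and the helper name add_blank_vectors are hypothetical; check the constants your neuralcoref version actually uses. Plain lists keep the sketch dependency-free; on the real .npy matrix the same step would be np.vstack of the existing matrix with np.zeros((n_new, dim)).

```python
from typing import Iterable, List, Tuple

# Hypothetical special-token strings; use whatever your neuralcoref version defines.
MISSING_WORD = "<missing>"
UNKNOWN_WORD = "<unk>"


def add_blank_vectors(
    vocab: List[str],
    vectors: List[List[float]],
    tokens: Iterable[str] = (MISSING_WORD, UNKNOWN_WORD),
) -> Tuple[List[str], List[List[float]]]:
    """Append an all-zero embedding row for each special token missing from vocab.

    Row i of `vectors` belongs to vocab[i], so both lists grow together.
    The inputs are copied, not modified.
    """
    if len(vocab) != len(vectors):
        raise ValueError("vocab and vectors must have the same number of rows")
    dim = len(vectors[0]) if vectors else 0
    new_vocab = list(vocab)
    new_vectors = [list(row) for row in vectors]
    for token in tokens:
        if token not in new_vocab:
            new_vocab.append(token)
            new_vectors.append([0.0] * dim)
    return new_vocab, new_vectors


vocab, vectors = add_blank_vectors(["de", "het"], [[0.1, 0.2], [0.3, 0.4]])
print(vocab)  # ['de', 'het', '<missing>', '<unk>']
```

Calling the function again on its own output is a no-op, so it is safe to run on a vocabulary that already contains the special tokens.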
@chieter Thank you so much for your detailed work. In point 2 you mentioned that you initialized the model with:
model = Model(len(voc), SIZE_EMBEDDING, Args.h1, Args.h2, Args.h3, SIZE_PAIR_IN, SIZE_SINGLE_IN)
I think that makes sense. Thanks to the fix that you proposed, I was able to get the Perl scorer working; however, I got this error:
I guess the initialized model is not properly aligned with my input data. Here is my configuration:
args.h1 = 1000
args.h2 = 500
args.h3 = 500
SIZE_EMBEDDING = 320
The other SIZE_* parameters were left at their defaults in utils.py. Am I doing anything wrong here? I guess I miscalculated those values. How did you calculate SIZE_PAIR_IN and SIZE_SINGLE_IN? In addition, I wrote an email to @thomwolf and luckily got his reply; as he stated, @svlandeg is the main maintainer of this project.
Quoting @chieter's instructions:
To get your work started, here is what I've been doing:
1. Install the alpha version of thinc, as this version includes the PyTorchWrapper.
2. Create an instance of the model in a Python shell. I basically used the same instantiation that is used in train/dataset.py:
    from neuralcoref.train.model import Model
    model = Model(len(voc), SIZE_EMBEDDING, Args.h1, Args.h2, Args.h3, SIZE_PAIR_IN, SIZE_SINGLE_IN)
   Adjust the parameters to the ones you used during training.
3. Load the model checkpoint you want to use (you can find this in train/learn.py):
    import torch
    model.load_state_dict(torch.load(path_to_checkpoint_file))
4. If everything has worked so far, you should see a message that all keys could be matched, and if you display the model in the shell you should see the different parts of the net labeled (word_embeds), (drop), (pair_top) and (single_top).
5. As far as I can tell, the (pair_top) and (single_top) parts are the ones used in the model that is loaded from the neuralcoref cache. I think that to avoid exporting the dropout layers used in training, you need to switch the model to evaluation mode:
    model.eval()
6. You can now use the PyTorchWrapper from the thinc alpha:
    from thinc.api import PyTorchWrapper
    pair_top_model = PyTorchWrapper(model.pair_top)
    single_top_model = PyTorchWrapper(model.single_top)
7. These models can now be written to disk:
    with open('trained_pair_top_model', 'wb') as f:
        f.write(pair_top_model.to_bytes())
   You can save the single_top model in the same manner.
I hope this is helpful to you @EricLe-dev. There are some unsolved problems, however. The model loaded from the cache includes two folders, static_vectors and tuned_vectors. I'm not yet sure whether these can be extracted from the model loaded in step 3. Maybe @thomwolf or @svlandeg could elaborate on that. It would also be useful to know whether this is even the right approach. Take care everyone, and thanks for your work <3
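The saving step above writes only pair_top and leaves single_top to be done "in the same manner". A minimal sketch of a helper that writes both in one go; it relies only on each object having a to_bytes() method (which thinc models, including those returned by PyTorchWrapper, provide). The helper name save_top_models and the trained_<name>_model file naming are hypothetical choices for illustration.

```python
from pathlib import Path
from typing import List, Mapping, Protocol


class SupportsToBytes(Protocol):
    """Anything with a to_bytes() method, e.g. a thinc model returned by PyTorchWrapper."""

    def to_bytes(self) -> bytes:
        ...


def save_top_models(models: Mapping[str, SupportsToBytes], out_dir: str) -> List[Path]:
    """Write each model to <out_dir>/trained_<name>_model and return the paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, model in models.items():
        path = target / f"trained_{name}_model"
        # The with-block closes the file even if serialization fails halfway
        with open(path, "wb") as f:
            f.write(model.to_bytes())
        written.append(path)
    return written
```

Usage, with the two wrapped models from the steps above: save_top_models({"pair_top": pair_top_model, "single_top": single_top_model}, "exported_model"), where "exported_model" is just an example folder name.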
The parameters SIZE_PAIR_IN and SIZE_SINGLE_IN are calculated according to the formulas in utils.py. They also depend on SIZE_EMBEDDING, so if you changed it, you have to recalculate them for your embedding size.
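A minimal sketch of those formulas, assuming utils.py defines SIZE_MENTION_EMBEDDING = SIZE_SPAN + SIZE_WORD * SIZE_EMBEDDING, SIZE_PAIR_IN = 2 * SIZE_MENTION_EMBEDDING + SIZE_FP and SIZE_SINGLE_IN = SIZE_MENTION_EMBEDDING + SIZE_FS, with defaults SIZE_SPAN = 250, SIZE_WORD = 8, SIZE_FP = 70 and SIZE_FS = 24. These definitions are an assumption to check against your copy of utils.py, but they do reproduce the numbers reported in this thread (5690 and 2834 for an embedding size of 320, and 674 for the default size of 50). The function name input_sizes is hypothetical.

```python
from typing import Tuple

# Defaults assumed from neuralcoref/train/utils.py; compare them with your own copy.
SIZE_SPAN = 250  # length of the span vector (5 averaged embeddings of size 50)
SIZE_WORD = 8    # number of individual word embeddings in a mention representation
SIZE_FP = 70     # number of mention-pair features
SIZE_FS = 24     # number of single-mention features


def input_sizes(size_embedding: int, size_span: int = SIZE_SPAN) -> Tuple[int, int]:
    """Return (SIZE_PAIR_IN, SIZE_SINGLE_IN) for a given embedding size."""
    # A mention is represented by its span vector plus SIZE_WORD word embeddings
    size_mention_embedding = size_span + SIZE_WORD * size_embedding
    # The pair network sees both mentions plus the pair features
    size_pair_in = 2 * size_mention_embedding + SIZE_FP
    # The single-mention network sees one mention plus its own features
    size_single_in = size_mention_embedding + SIZE_FS
    return size_pair_in, size_single_in


print(input_sizes(320))                     # (5690, 2834)
print(input_sizes(320, size_span=5 * 320))  # (8390, 4184)
```

Note that SIZE_SPAN enters the formulas too: if the span vectors in your data were built from 320-dimensional embeddings, SIZE_SINGLE_IN comes out as 4184 rather than 2834.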
My SIZE_EMBEDDING was 320, so I modified it. For the other parameters, like SIZE_SPAN, SIZE_WORD, SIZE_FP, etc., I just used the default values. Should they be modified accordingly?
You can leave them as they are, but every parameter used to instantiate the model must have the same value it had during training.
According to the screenshot of the size mismatch error, my input data size was [860x4184], while the model expects something like [674x1000]. After some debugging, I realized that the first number (860) was the SIZE_SINGLE_IN, so I guessed that the second number (4184) was the SIZE_PAIR_IN and directly initialized my model like this:
model = Model(len(voc), SIZE_EMBEDDING, Args.h1, Args.h2, Args.h3, 860, 4184)
I'm running the model again to see if it works.
@chieter it does not work; I guess I still calculated something wrong somewhere else:
This is my configuration. As you can see, I modified SIZE_EMBEDDING = 320 and left the rest at their defaults. With that configuration, SIZE_PAIR_IN is 5690 and SIZE_SINGLE_IN is 2834.
My vocab length is 626711.
But I still got this exception:
Can you please help me find where I went wrong? Thank you so much!
During which step did you get this error? I'm asking because my error message looks a bit different when the sizes of the imported weights and the instantiated model do not match.
Yes, I got it during training. I run python learn.py --train ./data, it runs for a few minutes and then raises that exception. Here is the full stack trace (including some debugging prints that I added manually):
..\torch\csrc\utils\tensor_numpy.cpp:141: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program.
SCORING_SCRIPT: c:\users\administrator\desktop\neural_coref\neuralcoref\neuralcoref\train\scorer_wrapper.pl
Namespace(all_pairs_epoch=200, all_pairs_l2=1e-06, all_pairs_lr=0.0002, batchsize=10000, checkpoint_file=None, conll_eval_interval=10, conll_train_interval=20, costfl=0.4, costfn=0.8, costs={'FN': 0.8, 'FL': 0.4, 'WL': 1.0}, costwl=1.0, cuda=False, eval='C:\\Users\\Administrator\\Desktop\\neural_coref\\neuralcoref\\neuralcoref\\train/data//numpy/', evalkey='C:\\Users\\Administrator\\Desktop\\neural_coref\\neuralcoref\\neuralcoref\\train/data//key.txt', h1=1000, h2=500, h3=500, lazy=True, log_interval=10, min_lr=2e-08, numworkers=0, on_eval_decrease='nothing', patience=3, ranking_epoch=200, ranking_l2=1e-05, ranking_lr=2e-06, save_path='C:\\Users\\Administrator\\Desktop\\neural_coref\\neuralcoref\\neuralcoref\\train\\checkpoints\\Jun02_17-43-36_vm05_', seed=1111, startstage=None, startstep=None, top_pairs_epoch=200, top_pairs_l2=1e-05, top_pairs_lr=0.0002, train='./data/numpy/', trainkey='./data/key.txt', weights=None)
Training for 200 200 200 epochs
./data/numpy/
loading ./data/numpy/tuned_word_embeddings.npy
torch.Size([626711, 320])
loading ./data/numpy/tuned_word_vocabulary.txt
Loading Dataset at ./data/numpy/
Reading mentions_features.npy, mentions_labels.npy, mentions_pairs_length.npy, mentions_pairs_start_index.npy, mentions_spans.npy, mentions_words.npy, pairs_ant_index.npy, pairs_features.npy, pairs_labels.npy, static_word_embeddings.npy, tuned_word_embeddings.npy,
Loading Dataset at C:\Users\Administrator\Desktop\neural_coref\neuralcoref\neuralcoref\train/data//numpy/
Reading mentions_features.npy, mentions_labels.npy, mentions_pairs_length.npy, mentions_pairs_start_index.npy, mentions_spans.npy, mentions_words.npy, pairs_ant_index.npy, pairs_features.npy, pairs_labels.npy, static_word_embeddings.npy, tuned_word_embeddings.npy,
Vocabulary LENGTH HERE: 626711
Build model
SIZE_EMBEDDING: 320
SIZE_PAIR_IN: 5690
SIZE_SINGLE_IN: 2834
Loading conll evaluator
Preparing batches
Dataset has: 14205 batches, 289061 mentions, 150790638 pairs
Reading conll_tokens.bin, doc.bin, locations.bin, spacy_lookup.bin,
Done
Preparing batches
Dataset has: 14205 batches, 289061 mentions, 150790638 pairs
Reading conll_tokens.bin, doc.bin, locations.bin, spacy_lookup.bin,
Done
Testing evaluator and getting first eval score
Test evaluator / print all mentions
Building test file
Construct test file
Writing in c:\users\administrator\desktop\neural_coref\neuralcoref\neuralcoref\train\test_mentions.txt
Computing score
Mention identification recall 0 <= Detected mentions 0.0 True mentions 0.0
Scores {'muc': (0, 0, 0), 'bcub': (0, 0, 0), 'ceafe': (0, 0, 0)}
F1_conll 0.0
Building test file
Build coreference clusters
LENGTH INPUT FORWARD: 3
SINGLE INPUT SHAPE: torch.Size([860, 4184])
Traceback (most recent call last):
  File "learn.py", line 572, in <module>
    run_model(args)
  File "learn.py", line 181, in run_model
    eval_evaluator.build_test_file()
  File "c:\users\administrator\desktop\neural_coref\neuralcoref\neuralcoref\train\evaluator.py", line 200, in build_test_file
    scores, max_i = self.get_max_score(sample_batched)
  File "c:\users\administrator\desktop\neural_coref\neuralcoref\neuralcoref\train\evaluator.py", line 169, in get_max_score
    scores = self.model(inputs, concat_axis=1)
  File "C:\Users\Administrator\Desktop\neural_coref\venv\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "c:\users\administrator\desktop\neural_coref\neuralcoref\neuralcoref\train\model.py", line 104, in forward
    single_scores = self.single_top(single_input)
  File "C:\Users\Administrator\Desktop\neural_coref\venv\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Administrator\Desktop\neural_coref\venv\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
    input = module(input)
  File "C:\Users\Administrator\Desktop\neural_coref\venv\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Administrator\Desktop\neural_coref\venv\lib\site-packages\torch\nn\modules\linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\Users\Administrator\Desktop\neural_coref\venv\lib\site-packages\torch\nn\functional.py", line 1610, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [860 x 4184], m2: [2834 x 1000] at C:\w\b\windows\pytorch\aten\src\TH/generic/THTensorMath.cpp:41
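How to read the final line of this trace: F.linear computes torch.addmm(bias, input, weight.t()), so m1 is the input batch, [rows in the batch x input features], and m2 is the transposed weight of the failing Linear layer in single_top, [in_features x out_features]. Here the layer was built for 2834 inputs (SIZE_SINGLE_IN in the log) and 1000 outputs (h1), but each row of the batch has 4184 features; 860 is just the number of rows in that batch. A pure-Python sketch of that shape check (it mirrors PyTorch's behaviour for 2-D inputs but is not PyTorch code; the function name is hypothetical):

```python
from typing import Tuple


def linear_output_shape(
    input_shape: Tuple[int, int], weight_shape: Tuple[int, int]
) -> Tuple[int, int]:
    """Shape check done by F.linear for a 2-D input.

    input_shape:  (batch, in_features), reported as "m1" in the error
    weight_shape: (out_features, in_features), as stored by nn.Linear;
                  the error reports its transpose, (in_features, out_features), as "m2"
    """
    batch, in_features = input_shape
    out_features, expected_in = weight_shape
    if in_features != expected_in:
        raise ValueError(
            f"size mismatch, m1: [{batch} x {in_features}], "
            f"m2: [{expected_in} x {out_features}]"
        )
    return batch, out_features


print(linear_output_shape((860, 4184), (1000, 4184)))  # (860, 1000)
```

With the weight shape from the trace, (1000, 2834), the same call raises the mismatch, which is why the fix is to make the model's SIZE_SINGLE_IN (and SIZE_PAIR_IN) match the feature width of the data rather than the batch size.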
OK, maybe I should have made this clearer: all my instructions were meant to be executed after training, once you already have a checkpoint file.
For training my model, the only values I changed were SIZE_EMBEDDING and SIZE_SPAN in utils.py. According to this post, the span vector consists of 5 averaged word embeddings, so SIZE_SPAN should be 5 times the size of your embeddings. You may also need to change this assertion in dataset.py.
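A minimal sketch of why SIZE_SPAN has to scale with the embedding size: if the span vector is the concatenation of five averaged d-dimensional embeddings, it is 5 * d long (1600 for d = 320). Which five windows are averaged (here: the mention itself, up to 5 preceding words, up to 5 following words, the sentence, and the whole document) is an assumption for illustration, not taken from neuralcoref's code, and the function names are hypothetical. Only the 5 * d length matters for the size calculation.

```python
from typing import List, Sequence

Vector = List[float]


def average(vectors: Sequence[Vector], dim: int) -> Vector:
    """Element-wise mean; an empty window yields a zero vector."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def span_vector(
    doc: Sequence[Vector],
    sent_start: int,
    sent_end: int,
    start: int,
    end: int,
    window: int = 5,
) -> Vector:
    """Concatenate five averaged embeddings into one span vector.

    doc holds one embedding per word; the mention covers doc[start:end] and
    its sentence covers doc[sent_start:sent_end]. The windows are assumed.
    """
    dim = len(doc[0])
    parts = [
        doc[start:end],                      # the mention's own words
        doc[max(0, start - window):start],   # up to `window` preceding words
        doc[end:end + window],               # up to `window` following words
        doc[sent_start:sent_end],            # the whole sentence
        doc,                                 # the whole document
    ]
    result: Vector = []
    for part in parts:
        result.extend(average(part, dim))
    return result


doc = [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0], [5.0, 2.0, 2.0], [1.0, 1.0, 1.0]]
print(len(span_vector(doc, 0, 4, 1, 2)))  # 15, i.e. 5 * 3
```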
@chieter thanks to your information, I have been able to run the training for my model. It just seems to take an extremely long time to finish one epoch.
Also, at this line I can see that the add_to_pipe function loads a file named vocab.txt; I guess it is our static_word_vocabulary.txt.
In addition, this line specifies the path the model is loaded from. I guess we can modify it manually and then run pip install -e . to build NeuralCoref from source.
@chieter Hi mate,
Have you ever encountered this error during training?
I reduced the number of workers to 3 and used a smaller batch size, but I still got this error. I also enabled the --lazy flag during training.
Here is my environment:
Windows 10
Python 3.8
RAM 64GB
GPU 8GB
Thank you so much!
I fixed it by turning off pin_memory.
@chieter can you please tell me how to construct the key2row file in tuned_vectors?
Thank you so much <3
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Dear guys,
Firstly, thank you guys so much for this interesting work. I'm training the neuralcoref model for Dutch using the SoNar corpus. First, I used this script to convert the MMAX format to the CoNLL format. After that, I trained a w2v model to prepare the static_word_embedding files. I have a few questions that I could not answer myself and could not find answers to anywhere else.
I have gone through many topics and posted questions on many threads, but I still got no help or guidance. Thank you so much for any help that any of you can provide.
With best regards, Eric