Closed: pratikchhapolika closed this issue 2 years ago.
Models trained on Few-NERD cannot directly make predictions on a custom dataset. In the spirit of few-shot learning, you should also provide some training data (e.g., a few labeled instances of the types `dispute_amount` and `dispute_date`), since you are requesting predictions for new entity types. You may structure your test data like `test_5_5.json` and replace it (as if you had 5 labeled instances for each entity type); running the pipeline will then give you results on your dataset.

To be more specific, when modifying `test_5_5.json`, put your labeled data in `support` and your test data for prediction in `query`.
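For concreteness, here is a rough sketch of what one episode line in a `test_5_5.json`-style file might look like, using the `dispute_amount`/`dispute_date` types from the question. The field names (`support`, `query`, `word`, `label`, `types`) follow the episode files the data loader reads, but the exact schema (and the file name `my_test_5_5.json`) is an assumption to verify against the files shipped with the repo:

```python
import json

# Hypothetical sketch of one episode (one line of a test_5_5.json-style
# file); verify the field layout against the repo's actual episode data.
episode = {
    "support": {  # labeled examples the model adapts to the new types from
        "word":  [["refund", "of", "$", "50", "requested"]],
        "label": [["O", "O", "dispute_amount", "dispute_amount", "O"]],
    },
    "query": {    # the sentences you actually want predictions for
        "word":  [["charge", "disputed", "on", "March", "3"]],
        "label": [["O", "O", "O", "dispute_date", "dispute_date"]],
    },
    "types": ["dispute_amount", "dispute_date"],
}

# One JSON object per line, mirroring the episode-data files.
with open("my_test_5_5.json", "w") as f:
    f.write(json.dumps(episode) + "\n")
```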
@cyl628 , I get your point, but how do I get the output tags rather than just the precision, recall, and accuracy metrics?
I have created test data like the 5_5 test file. But how do I get the output for each example? In the test data we have `support` and `query` as well... so what is the purpose of keeping both `support` and `query`? So the actual test data is the `query`, right?
Adding to it: in a real scenario we get only one instance of test data, say a sentence like "I want to return this damaged product." How should I pass that to the trained model in this case?
Both `support` and `query` in `test_5_5.json` have labels, right? So I have to keep the labels as well?
> I get your point, but how do I get the output tags rather than just the precision, recall, and accuracy metrics?
There is a `label2tag` key in each query dictionary from the dataloader; the `label2tag` dictionary maps index labels to tags: https://github.com/thunlp/Few-NERD/blob/d6d5e5182cdd89954515719bd190b44848c1a03f/util/data_loader.py#L294

Also, in the `eval` method there is `pred` (the predicted index labels): https://github.com/thunlp/Few-NERD/blob/d6d5e5182cdd89954515719bd190b44848c1a03f/util/framework.py#L536

You may need to rewrite some of the eval code to print the tag output.
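Something along these lines might work inside the eval loop. `pred` and `query['label2tag']` are the names used in the linked code, but the batch layout assumed here is a guess, so check it against the actual tensors:

```python
# Hypothetical sketch, not the repo's code: map predicted index labels
# back to tag strings inside the eval loop. Assumes query['label2tag']
# holds one index->tag dict per episode and pred is a tensor of indices.
label2tag = query['label2tag'][0]            # index label -> tag string
pred_ids = pred.view(-1).cpu().tolist()      # flatten predicted indices
pred_tags = [label2tag[i] for i in pred_ids]
print(pred_tags)                             # e.g. ['O', 'dispute_amount', ...]
```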
Yes, the actual test data is the query. You have to provide support because this is a few-shot learning model, not a zero-shot one, so the model has to learn from some labeled examples.
> In a real scenario we get only one instance of test data, say a sentence like "I want to return this damaged product." How should I pass that to the trained model?
It doesn't matter. You can just put one sentence in the query.
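In that case the `query` block of the episode would hold just the one sentence. The shape below is an illustrative guess, with placeholder `"O"` labels only so the unmodified pipeline (which expects labels) can still run; the reported metrics are then meaningless, but the predicted tags are what you want:

```python
# Hypothetical single-sentence query block; the "O" labels are placeholders
# so the unmodified pipeline, which expects labels, can still run.
query = {
    "word":  [["I", "want", "to", "return", "this", "damaged", "product", "."]],
    "label": [["O"] * 8],
}
```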
I have trained the model using `train_5_1.jsonl`. To just run the test on `test_5_1.jsonl` with the saved model `checkpoint/proto-inter-5-1-seed0.pth.tar`, is this the command I should fire?

```
!python3 train_demo.py --mode inter --lr 1e-4 --batch_size 4 --only_test --load_ckpt 'checkpoint/proto-inter-5-1-seed0.pth.tar' --test_iter 5000 --max_length 64 --model proto --tau 0.32
```
The above command doesn't seem to work: it is taking a lot of time to do inference on just the 2 samples I provided in `test_5_1.jsonl`.
> Both `support` and `query` in `test_5_5.json` have labels, right? So I have to keep the labels as well?
To run the current pipeline, yes, because the metrics need to be calculated. Of course, you can read through the evaluation code and modify it; that way you can get the predicted tags (or labels) without the metrics, and there will be no need to provide labels for the query.
> The above command doesn't seem to work: it is taking a lot of time to do inference on just the 2 samples I provided in `test_5_1.jsonl`.
The command looks good to me. Currently I don't have time to check what is wrong here. As you are running your custom dataset, I suggest inspecting the code to see where it gets stuck. I believe reading through the evaluation code will not be a big burden 😄
How do I make use of this Few-NERD model to do inference on my dataset after training on Few-NERD data? The idea is to see how it performs on my custom data.

Step 1: I train the `proto` model using Few-NERD data as mentioned:

```
python3 train_demo.py --mode inter --lr 1e-4 --batch_size 8 --trainN 5 --N 5 --K 1 --Q 1 --train_iter 10000 --val_iter 500 --test_iter 5000 --val_step 1000 --max_length 64 --model proto --tau 0.32
```

Once training is complete: I have a dataset which is in this format, with the entity types `['dispute_amount', 'dispute_date', 'competitors']`. My test data is in this format. How can I test and print outputs on my test data?
@cyl628 @ningding97