Closed ChloeJKim closed 3 years ago
Hi,
you mean 'conll04_prediction_example.json'? Please rerun 'bash ./scripts/fetch_datasets.sh'. The example file is then saved to 'data/datasets/conll04/conll04_prediction_example.json'.
So I see this as prediction_example.json:
[{"tokens": ["In", "1822", ",", "the", "18th", "president", "of", "the", "United", "States", ",", "Ulysses", "S.", "Grant", ",", "was", "born", "in", "Point", "Pleasant", ",", "Ohio", "."]}, ["In", "1822", ",", "the", "18th", "president", "of", "the", "United", "States", ",", "Ulysses", "S.", "Grant", ",", "was", "born", "in", "Point", "Pleasant", ",", "Ohio", "."], "In 1822, the 18th president of the United States, Ulysses S. Grant, was born in Point Pleasant, Ohio."]
Do I need to include the untokenized sentence as well (the bolded one)?
This is just an example of supported data formats. You have three options to specify your sentences:
Option 1 (mostly for compatibility with our CoNLL04/SciERC/ADE dataset format):
{"tokens": ["In", "1822", ",", "the", "18th", "president", "of", "the", "United", "States", ",", "Ulysses", "S.", "Grant", ",", "was", "born", "in", "Point", "Pleasant", ",", "Ohio", "."]}
Option 2 (in case your sentences are already tokenized):
["In", "1822", ",", "the", "18th", "president", "of", "the", "United", "States", ",", "Ulysses", "S.", "Grant", ",", "was", "born", "in", "Point", "Pleasant", ",", "Ohio", "."]
Option 3 (in case your sentences are not tokenized):
"In 1822, the 18th president of the United States, Ulysses S. Grant, was born in Point Pleasant, Ohio."
So in case your sentences are already tokenized, your input data would look as follows:
[["This", "is", "sentence", "1", "."], ["This", "is", "sentence", "2", "."], ["This", "is", "sentence", "3", "."], ...]
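The three formats above can also be mixed in a single input file. A minimal sketch of building such a file with the standard library (the output filename here is just an example, not one the repo prescribes):

```python
import json

# The three supported sentence formats, mixed in one JSON array:
sentences = [
    # Option 1: dict with a "tokens" key (CoNLL04/SciERC/ADE-style)
    {"tokens": ["This", "is", "sentence", "1", "."]},
    # Option 2: plain token list (already tokenized)
    ["This", "is", "sentence", "2", "."],
    # Option 3: raw string (not tokenized)
    "This is sentence 3.",
]

with open("my_predict_input.json", "w") as f:
    json.dump(sentences, f)
```

The resulting file can then be passed to the prediction script in place of conll04_prediction_example.json.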
Ah, I see. So we can choose any one of the three options and run the prediction.
In example_predict.conf there is max_pairs = 1000. Does this refer to a maximum of 1000 sentences we can feed to the model for prediction? And can we change this number for a bigger dataset?
This option is a bit misleading. It just restricts the number of entity pairs per sentence that are processed at once, to lower memory consumption. If you do not run into any memory (CPU or GPU) problems, just leave it at 1000. The code always processes your whole dataset.
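To illustrate what max_pairs bounds (a simplified sketch, not SpERT's actual implementation): with n candidate entity spans, the relation classifier considers up to n·(n−1) ordered span pairs per sentence, and max_pairs only controls how many of those pairs are scored in one batch:

```python
from itertools import permutations

def chunked_pairs(spans, max_pairs):
    """Yield ordered span-pair index chunks of at most max_pairs each.

    Simplified illustration: max_pairs bounds how many entity pairs are
    processed at once (to limit memory), not how many sentences or how
    much of the dataset gets processed overall.
    """
    pairs = list(permutations(range(len(spans)), 2))
    for i in range(0, len(pairs), max_pairs):
        yield pairs[i:i + max_pairs]

# 5 candidate spans -> 5 * 4 = 20 ordered pairs, split into chunks of 8
chunks = list(chunked_pairs(["span"] * 5, max_pairs=8))
print([len(c) for c in chunks])  # -> [8, 8, 4]
```

Every pair is still processed eventually; a smaller max_pairs just means more, smaller batches.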
I see, thanks for clarifying :)
Hi @markus-eberts
Thanks for making a prediction mode.
I was just wondering where I can find the conll04_predictions.json file?
Thanks! Chloe