willferreira / mscproject


Stanford CoreNLP #7

Open federicoruggeri opened 6 years ago

federicoruggeri commented 6 years ago

Dear Mr. Ferreira, I kindly ask you if it's possible to know the Stanford CoreNLP version that was used in order to parse sentences. I'm currently using version "2014-08-27", but parsed dependencies are missing the "-stanford_idx" number.

Example taken from Untitle1.ipynb: nlp.parse("She didn't see the elephant")

Expected output:

{u'sentences': [{u'dependencies': [[u'root', u'ROOT-0', u'see-4'],
    [u'nsubj', u'see-4', u'She-1'],
    [u'aux', u'see-4', u'did-2'],
    [u'neg', u'see-4', u"n't-3"],
    [u'det', u'elephant-6', u'the-5'],
    [u'dobj', u'see-4', u'elephant-6']],
   u'parsetree': u"(ROOT (S (NP (PRP She)) (VP (VBD did) (RB n't) (VP (VB see) (NP (DT the) (NN elephant))))))",
   u'text': u"She didn't see the elephant",
   u'words': [[u'She',
     {u'CharacterOffsetBegin': u'0',
      u'CharacterOffsetEnd': u'3',
      u'Lemma': u'she',
      u'NamedEntityTag': u'O',
      u'PartOfSpeech': u'PRP'}],
    [u'did',
     {u'CharacterOffsetBegin': u'4',
      u'CharacterOffsetEnd': u'7',
      u'Lemma': u'do',
      u'NamedEntityTag': u'O',
      u'PartOfSpeech': u'VBD'}],
    [u"n't",
     {u'CharacterOffsetBegin': u'7',
      u'CharacterOffsetEnd': u'10',
      u'Lemma': u'not',
      u'NamedEntityTag': u'O',
      u'PartOfSpeech': u'RB'}],
    [u'see',
     {u'CharacterOffsetBegin': u'11',
      u'CharacterOffsetEnd': u'14',
      u'Lemma': u'see',
      u'NamedEntityTag': u'O',
      u'PartOfSpeech': u'VB'}],
    [u'the',
     {u'CharacterOffsetBegin': u'15',
      u'CharacterOffsetEnd': u'18',
      u'Lemma': u'the',
      u'NamedEntityTag': u'O',
      u'PartOfSpeech': u'DT'}],
    [u'elephant',
     {u'CharacterOffsetBegin': u'19',
      u'CharacterOffsetEnd': u'27',
      u'Lemma': u'elephant',
      u'NamedEntityTag': u'O',
      u'PartOfSpeech': u'NN'}]]}]}

Stanford CoreNLP 2014-08-27 (used) output:

{u'sentences': [{u'dependencies': [[u'root', u'ROOT', u'see'],
                                   [u'nsubj', u'see', u'She'],
                                   [u'aux', u'see', u'did'],
                                   [u'neg', u'see', u"n't"],
                                   [u'det', u'elephant', u'the'],
                                   [u'dobj', u'see', u'elephant']],
                 u'parsetree': u"(ROOT (S (NP (PRP She)) (VP (VBD did) (RB n't) (VP (VB see) (NP (DT the) (NN elephant))))))",
                 u'text': u"She didn't see the elephant",
                 u'words': [[u'She',
                             {u'CharacterOffsetBegin': u'0',
                              u'CharacterOffsetEnd': u'3',
                              u'Lemma': u'she',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'PRP'}],
                            [u'did',
                             {u'CharacterOffsetBegin': u'4',
                              u'CharacterOffsetEnd': u'7',
                              u'Lemma': u'do',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'VBD'}],
                            [u"n't",
                             {u'CharacterOffsetBegin': u'7',
                              u'CharacterOffsetEnd': u'10',
                              u'Lemma': u'not',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'RB'}],
                            [u'see',
                             {u'CharacterOffsetBegin': u'11',
                              u'CharacterOffsetEnd': u'14',
                              u'Lemma': u'see',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'VB'}],
                            [u'the',
                             {u'CharacterOffsetBegin': u'15',
                              u'CharacterOffsetEnd': u'18',
                              u'Lemma': u'the',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'DT'}],
                            [u'elephant',
                             {u'CharacterOffsetBegin': u'19',
                              u'CharacterOffsetEnd': u'27',
                              u'Lemma': u'elephant',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'NN'}]]}]}

Is it just a version issue or is it something else?

Best Wishes, Federico Ruggeri

willferreira commented 6 years ago

Hello

I will check tonight, but it was a while ago now and I may have uninstalled the Stanford parser from my mac. What exactly is the problem?

Will


federicoruggeri commented 6 years ago

Dear Mr. Ferreira, The problem concerns the 'dependencies' list. For example: [u'root', u'ROOT-0', u'see-4'] differs from [u'root', u'ROOT', u'see'] (my output)

As a result, the construction of the file 'stanparse-depths.pickle' fails. More precisely, in "src/model/utils.py" the function get_stanford_idx(x) fails since the number it is looking for is missing (in the example above: '0' of 'ROOT-0'). This problem can be verified by running the run_calc_stan_parse_depths.py script inside the "bin" folder. Since the repository is missing the Stanford CoreNLP Python wrapper, I don't know exactly which implementation was used. I'm currently using the following one: https://github.com/dasmith/stanford-corenlp-python
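
To make the failure concrete, here is a minimal sketch of splitting a token such as 'see-4' into its word and index. The function name split_token_index and its regular expression are hypothetical stand-ins written for illustration; the actual get_stanford_idx in "src/model/utils.py" may be implemented differently.

```python
import re

# Hypothetical stand-in for get_stanford_idx; NOT the repository's code.
# Greedy ".*" keeps hyphens that belong to the word, e.g. "well-known-5".
_TOKEN_IDX = re.compile(r"^(?P<word>.*)-(?P<idx>\d+)$")


def split_token_index(token):
    """Split 'see-4' into ('see', 4); raise ValueError if the index is missing."""
    match = _TOKEN_IDX.match(token)
    if match is None:
        raise ValueError("no '-<index>' suffix in %r" % (token,))
    return match.group("word"), int(match.group("idx"))


if __name__ == "__main__":
    print(split_token_index("ROOT-0"))  # indexed form from the expected output
    try:
        split_token_index("ROOT")       # un-indexed form from the 2014-08-27 output
    except ValueError as err:
        print("fails as described:", err)
```

With the indexed form the split succeeds; with the un-indexed form there is nothing to parse, which matches the failure described above.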

I kindly thank you for your time, Best Wishes, Federico Ruggeri

willferreira commented 6 years ago

I will take a look and get back to you ASAP; however, over 3 years have passed since I looked at this, so I may not be able to help. What is your interest in the work?

Will


federicoruggeri commented 6 years ago

Dear Mr. Ferreira, I'm a student at the University of Bologna (UNIBO). I'm currently studying stance classification for my master's thesis under the guidance of Professor Torroni and researcher Marco Lippi. More precisely, my aim is argument structure prediction by exploiting stance classification techniques. For this reason, I was curious to experiment with known classifiers, such as the one used for Emergent, on other datasets in the same research field.

Yours Sincerely, Federico Ruggeri

willferreira commented 6 years ago

I think you might find better results than mine now. Take a look at the Fake News Challenge.


andreasvlachos commented 6 years ago

Hi Federico,

Yes, if your goal is to run models like ours on other datasets, you will be best served by more recent code. Take a look at the following repos for example: https://github.com/j6mes/fnc-ensemble https://github.com/uclmr/fakenewschallenge

Best wishes, Andreas


federicoruggeri commented 6 years ago

Dear Mr. Ferreira and Mr. Vlachos, My aim is to predict the stance of a piece of evidence towards a given claim. Since the Emergent dataset contains claims with respect to articles, my idea was to use the related classifier with another dataset, i.e. CE-ACL-14 (IBM), which reports claims and evidence extracted from articles. This is just an experiment since, as far as I know, there are no corpora that couple evidence and claims while considering both opposing and supporting links between them. For this reason, a first attempt was to use the Emergent classifier with the CE-ACL-14 dataset and analyse the results. I kindly thank you for your time and for the content you provided (I was surprised to receive an answer in such a short time). I don't want you to spend a long time on my issue if that is what solving it requires. Trying the Emergent classifier was just my first idea.

Yours Sincerely, Federico Ruggeri

sapieneptus commented 5 years ago

@federicoruggeri If this is still relevant, you can adapt the algorithm to the latest Stanford CoreNLP version (2018-10-05) like so:

for dependency in sentence['basicDependencies']:
    relationship = dependency['dep']
    # governor/dependent are 1-based token positions (0 is ROOT); subtract 1 for 0-based indices
    head_idx = int(dependency['governor']) - 1
    head = dependency['governorGloss']
    dependent_idx = int(dependency['dependent']) - 1
    dependent = dependency['dependentGloss']

There is no longer a dependencies field; instead we can choose among basicDependencies, enhancedDependencies, and enhancedPlusPlusDependencies. The fields are already parsed, so we don't need any string manipulation to get the components.
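
If the rest of the pipeline still expects the old [relation, 'head-idx', 'dependent-idx'] triples, one option is to rebuild them from the parsed fields. The sketch below is only an illustration: to_indexed_dependencies is a hypothetical helper, the sample sentence is hand-written from the expected output at the top of this issue rather than produced by CoreNLP, and lower-casing the relation label is an assumption made so that a 'ROOT' label matches the old 'root'. Unlike the loop above, it keeps the 1-based positions so that ROOT stays at index 0.

```python
def to_indexed_dependencies(sentence, key="basicDependencies"):
    """Rebuild old-style [relation, 'head-i', 'dependent-j'] triples.

    Hypothetical helper: field names follow the loop above; positions are
    kept 1-based (ROOT is 0) to mirror the old 'word-index' strings.
    """
    triples = []
    for dep in sentence[key]:
        head = "%s-%d" % (dep["governorGloss"], int(dep["governor"]))
        dependent = "%s-%d" % (dep["dependentGloss"], int(dep["dependent"]))
        # Assumption: lower-case the label so 'ROOT' becomes the old 'root'.
        triples.append([dep["dep"].lower(), head, dependent])
    return triples


# Hand-written sample for illustration only, not real parser output.
sample_sentence = {
    "basicDependencies": [
        {"dep": "ROOT", "governor": 0, "governorGloss": "ROOT",
         "dependent": 4, "dependentGloss": "see"},
        {"dep": "nsubj", "governor": 4, "governorGloss": "see",
         "dependent": 1, "dependentGloss": "She"},
    ]
}

if __name__ == "__main__":
    for triple in to_indexed_dependencies(sample_sentence):
        print(triple)
```

Passing key="enhancedDependencies" or key="enhancedPlusPlusDependencies" would select one of the other representations, assuming their entries carry the same fields.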