stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0
9.61k stars 2.7k forks

Unable to extract triplets #1455

Open DeepankarVyas opened 1 month ago

DeepankarVyas commented 1 month ago

I am trying to extract triplets from an annotated string, but the code returns NULL. Here is the code I used:

# Increase Java heap space (set before loading rJava-based packages so the JVM picks it up)
options(java.parameters = "-Xmx4g")

library(tidyverse)
library(tm)
library(coreNLP)

# Initialize CoreNLP with the path to the unzipped folder
initCoreNLP("/Users/..../stanford-corenlp-4.5.7/")

# Function to extract relations using CoreNLP
extract_relations <- function(text) {
  cat("Text to be annotated:\n", text, "\n\n")

  annotation <- tryCatch({
    annotateString(text)
  }, error = function(e) {
    # conditionMessage() extracts just the error text from the condition object
    message("Error in annotation: ", conditionMessage(e))
    return(NULL)
  })

  if (is.null(annotation)) {
    message("Annotation is NULL")
    return(list())
  }

  print(annotation)

  triples <- tryCatch({
    getOpenIE(annotation)
  }, error = function(e) {
    # conditionMessage() extracts just the error text from the condition object
    message("Error in extracting OpenIE triples: ", conditionMessage(e))
    return(NULL)
  })

  if (is.null(triples) || length(triples) == 0) {
    message("No triples extracted.")
    return(list())
  }

  print(triples)

  # Return the triples explicitly so the caller receives them
  triples
}

# Mock dataset to train the model
mock_data <- data.frame(
  match_id = 1,
  home_team = "Manchester United",
  away_team = "Chelsea",
  match_preview = "Manchester United won their last game convincingly and have a strong home record. Chelsea, on the other hand, are struggling with injuries and have lost three of their last five away games.",
  outcome = "homewin",
  stringsAsFactors = FALSE
)

# Extracting features and assigning scores
match <- mock_data[1, ]
relations <- extract_relations(match$match_preview)

This is the output:

image

Stanford CoreNLP version: stanford-corenlp-4.5.7
R version: 4.3.1

Is it an issue with the way CoreNLP is initialised or something else? Any help is appreciated.

Regards.

AngledLuffa commented 1 month ago

Heads up: you need three backticks (```) to highlight a large code block, not just one.

I don't know anything about the R interface to CoreNLP. As a first pass, I would check the output of the interface to make sure it's actually starting CoreNLP with the OpenIE annotator.

DeepankarVyas commented 1 month ago

Hi @AngledLuffa,

Thanks for the heads up.

I think it is starting CoreNLP, since annotateString(text) annotates the text successfully. Only the triplet extraction fails. Could it be due to some missing annotators?

P.S. I manually downloaded stanford-corenlp-4.5.7 and can't seem to find a .properties file in the package. Not sure if that's the issue.

Regards

AngledLuffa commented 1 month ago

Sounds good. So, I would first check that the OpenIE model is actually among the annotators loaded when R creates the pipeline it interfaces with. It should show up in the pipeline's output, if the R interface lets you see that output.

Personally, I have zero experience with the R interface, so I suggest testing that yourself rather than relying on help from us. You could also find the authors of the R interface and ask them how to check the OpenIE package.
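
For reference, the CoreNLP documentation lists OpenIE as depending on the tokenize, ssplit, pos, lemma, depparse, and natlog annotators, so a pipeline configured without them, or without openie itself, would not be expected to yield triples. A minimal properties file that enables OpenIE might look like the sketch below. The file name openie.properties is hypothetical, and nothing in this thread confirms which annotators the R interface loads by default.

```properties
# openie.properties (hypothetical file name, used here for illustration only)
# OpenIE needs every annotator listed before it on this line.
annotators = tokenize, ssplit, pos, lemma, depparse, natlog, openie
```

As far as I know, the stock CoreNLP defaults are bundled inside the jar rather than shipped as a loose file, which may be why no .properties file is visible in the unzipped folder. If the R package's initCoreNLP accepts a parameterFile argument (an assumption to check against that package's documentation), passing the path to a file like this one would be one way to test whether missing annotators explain the NULL result.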