@cannin @dexterpratt Here is what I got when comparing the results obtained from the acc paper with PMID 13086.
@cannin
Only two statements were obtained from INDRA for this particular paper. They are shown here, formatted to match the structure of my results: https://github.com/ndexbio/gsoc_llm/blob/main/results/pmc333362/conv_indra_results.json
This is why only two interactions are shown in the screenshot.
This is the code I used to find similarities: https://github.com/ndexbio/gsoc_llm/blob/main/python_scripts/sentence_level_comparison.py
@Favourj-bit Same as on Slack: a suggestion to help with the comparison code that builds on what you already have above. Start simple, one sentence at a time. Build a table with one row per sentence and columns like:

sentence|indra_subject|indra_type|indra_object|llm_...|jaccard_similarity
def jaccard_similarity(text1, text2):
    # Tokenize the texts into sets of words
    set1 = set(text1.split())
    set2 = set(text2.split())

    # Calculate the intersection and union of the two sets
    intersection = set1.intersection(set2)
    union = set1.union(set2)

    # Compute the Jaccard similarity
    if not union:
        return 0.0
    return len(intersection) / len(union)

# Example usage
text1 = "boy has cat"
text2 = "girl has cat"
similarity = jaccard_similarity(text1, text2)
print(f"Jaccard Similarity between '{text1}' and '{text2}': {similarity:.2f}")
# Example usage; use the table above to generate text1/text2
text1 = "NS1 binds HPA-1"
text2 = "NS1 activates HPA-1"
print(jaccard_similarity(text1, text2))
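To make that concrete, here is a minimal sketch of building the suggested table, assuming pandas is available and that the INDRA and LLM extractions have already been aligned per sentence. The indra_rows and llm_rows lists below are hypothetical stand-ins for the real parsed JSON, and the column names follow the header suggested above.

import pandas as pd

# Hypothetical per-sentence extractions; in practice these would come from
# conv_indra_results.json and the LLM output.
indra_rows = [
    {"sentence": "NS1 binds HPA-1", "subject": "NS1", "type": "binds", "object": "HPA-1"},
    {"sentence": "A activates C", "subject": "A", "type": "activates", "object": "C"},
]
llm_rows = [
    {"sentence": "NS1 binds HPA-1", "subject": "NS1", "type": "activates", "object": "HPA-1"},
    {"sentence": "A activates C", "subject": "A", "type": "activates", "object": "C"},
]

records = []
for indra, llm in zip(indra_rows, llm_rows):
    # Flatten each triple into a short text so the sentence-level
    # jaccard_similarity function above can compare them
    indra_text = f"{indra['subject']} {indra['type']} {indra['object']}"
    llm_text = f"{llm['subject']} {llm['type']} {llm['object']}"
    records.append({
        "sentence": indra["sentence"],
        "indra_subject": indra["subject"],
        "indra_type": indra["type"],
        "indra_object": indra["object"],
        "llm_subject": llm["subject"],
        "llm_type": llm["type"],
        "llm_object": llm["object"],
        "jaccard_similarity": jaccard_similarity(indra_text, llm_text),
    })

df = pd.DataFrame(records)
print(df)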
To add more columns (precision, recall, F1):
# Inspired by: https://github.com/yuhaozhang/tacred-relation/blob/master/utils/scorer.py
# and https://github.com/hitz-zentroa/GoLLIE/blob/main/notebooks/Relation%20Extraction.ipynb
def score(observations, predictions, pad_smaller=True, verbose=False):
    correct_by_relation = 0
    predicted_by_relation = 0
    observed_by_relation = 0

    # Pad the shorter vector with empty strings
    if pad_smaller:
        # Determine the maximum length
        max_length = max(len(predictions), len(observations))
        if len(predictions) < max_length:
            predictions.extend([''] * (max_length - len(predictions)))
        if len(observations) < max_length:
            observations.extend([''] * (max_length - len(observations)))

    if not pad_smaller and len(predictions) != len(observations):
        print("ERROR: predictions and observations must be the same length.")
        return

    # Loop over the observations to compute a score
    for row in range(len(observations)):
        observed = observations[row]
        predicted = predictions[row]

        if observed == '' and predicted == '':
            pass
        elif observed == '' and predicted != '':
            predicted_by_relation += 1
        elif observed != '' and predicted == '':
            observed_by_relation += 1
        elif observed != '' and predicted != '':
            predicted_by_relation += 1
            observed_by_relation += 1
            if observed == predicted:
                correct_by_relation += 1

    # Compute the aggregate score (micro-averaged, since counts are pooled across all relations)
    prec_micro = 1.0
    if predicted_by_relation > 0:
        prec_micro = correct_by_relation / predicted_by_relation
    recall_micro = 0.0
    if observed_by_relation > 0:
        recall_micro = correct_by_relation / observed_by_relation
    f1_micro = 0.0
    if prec_micro + recall_micro > 0.0:
        f1_micro = 2.0 * prec_micro * recall_micro / (prec_micro + recall_micro)

    if verbose:
        print("Final Score:")
        print("Precision (micro): {:.2%}".format(prec_micro))
        print("   Recall (micro): {:.2%}".format(recall_micro))
        print("       F1 (micro): {:.2%}".format(f1_micro))

    return round(prec_micro, 3), round(recall_micro, 3), round(f1_micro, 3)
predictions = ['A_binds_B', 'A_stimulates_D', 'A_stimulates_C', 'A_activates_C', 'A_binds_E', 'A_binds_F']
observations = ['A_binds_B', 'A_stimulates_D', 'C_binds_D']
score(observations, predictions, pad_smaller=True, verbose=True)
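Tracing through this example: after padding, observations becomes ['A_binds_B', 'A_stimulates_D', 'C_binds_D', '', '', ''], and two of the six predictions match the observation at the same position. So the call should print a precision of 33.33% (2/6), a recall of 66.67% (2/3), and an F1 of 44.44%, and return (0.333, 0.667, 0.444).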
For each sentence, check whether the interaction is the same: if the subject, object, and interaction type all match, record 1 (TRUE); otherwise record 0 (FALSE). Report sum(match), the total number of interactions, and sum(match)/total (i.e., the percent matching).
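A minimal sketch of that last step, reusing the hypothetical indra_rows and llm_rows lists from the table example above:

# Exact match: 1 (TRUE) only if subject, interaction type, and object all agree
matches = []
for indra, llm in zip(indra_rows, llm_rows):
    is_match = int(
        indra["subject"] == llm["subject"]
        and indra["type"] == llm["type"]
        and indra["object"] == llm["object"]
    )
    matches.append(is_match)

total = len(matches)
print(f"sum(match): {sum(matches)}")
print(f"total interactions: {total}")
if total:
    print(f"percent matching: {sum(matches) / total:.2%}")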