
Compare INDRA and LLM Interactions and Report Percent Matching #16

Open · cannin opened this issue 1 month ago

cannin commented 1 month ago

For each sentence, check whether the INDRA and LLM interactions are the same: if the subject, object, and interaction type all match, record 1 (TRUE), otherwise 0 (FALSE). Report sum(match), the total number of interactions, and sum(match)/total (i.e., percent matching).
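
A minimal sketch of this tally, assuming the INDRA and LLM extractions have already been paired per sentence; the pairs list and its field names below are hypothetical placeholders for however the paired results are actually stored:

# Hypothetical paired extractions: one (indra, llm) dict per interaction
pairs = [
    ({"subject": "NS1", "type": "binds", "object": "HPA-1"},
     {"subject": "NS1", "type": "binds", "object": "HPA-1"}),
    ({"subject": "A", "type": "activates", "object": "B"},
     {"subject": "A", "type": "inhibits", "object": "B"}),
]

# 1 (TRUE) only if subject, interaction type, and object all agree
matches = [
    int(indra["subject"] == llm["subject"]
        and indra["type"] == llm["type"]
        and indra["object"] == llm["object"])
    for indra, llm in pairs
]

total = len(matches)
print("sum(match):", sum(matches))
print("total interactions:", total)
print("percent matching:", sum(matches) / total if total else 0.0)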

Favourj-bit commented 1 week ago
[Screenshot: 2024-06-29 at 01:49:24]

@cannin @dexterpratt This is what I got when comparing the results obtained from the acc paper with PMID 13086.

Favourj-bit commented 1 week ago

@cannin

INDRA returned only two statements for this particular paper. They are shown here, after I formatted them to match the layout of my results: https://github.com/ndexbio/gsoc_llm/blob/main/results/pmc333362/conv_indra_results.json

This is why only 2 interactions are shown in the screenshot.

Favourj-bit commented 1 week ago

This is the code I used to find similarities: https://github.com/ndexbio/gsoc_llm/blob/main/python_scripts/sentence_level_comparison.py

cannin commented 1 week ago

@Favourj-bit Same as on Slack. Here is a suggestion to help with the comparison code that builds on what you already have above. Start simple: 1 sentence at a time.

Steps

  1. Get 100 sentences from "text" fields of INDRA results
  2. Send 1 sentence to INDRA REACH (see the call sketch after this list): https://gist.github.com/cannin/949db7c5a0bbce873704fe90cc58de5a#file-use_indra_run_export_reach-py
  3. Send the same sentence to LLM
  4. Compare. Let's start with Jaccard (see below). Fill in 'jaccard_similarity' for each interaction.
  5. Report the average of the jaccard_similarity over the 100 sentences (see the loop sketch after the Example Code below).
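
A minimal sketch of step 2, assuming the indra package is installed and the REACH web service is reachable; the linked gist shows the exact export used here, so treat the calls below as an approximation rather than the project's actual script.

# Approximate per-sentence REACH call via INDRA; see the gist above for the exact usage
from indra.sources import reach

sentence = "NS1 binds HPA-1."                # one sentence taken from an INDRA "text" field
rp = reach.process_text(sentence)            # processor holding INDRA Statements (or None on failure)
if rp is not None:
    for stmt in rp.statements:
        agents = [a.name for a in stmt.agent_list() if a is not None]
        print(type(stmt).__name__, agents)   # interaction type plus the agent (subject/object) names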

Example Table for Comparison:

sentence|indra_subject|indra_type|indra_object|llm_...|jaccard_similarity

Example Code

def jaccard_similarity(text1, text2):
    # Tokenize the texts into sets of words
    set1 = set(text1.split())
    set2 = set(text2.split())

    # Calculate the intersection and union of the two sets
    intersection = set1.intersection(set2)
    union = set1.union(set2)

    # Compute the Jaccard similarity
    if not union:
        return 0.0
    return len(intersection) / len(union)

# Example usage
text1 = "boy has cat"
text2 = "girl has cat"

similarity = jaccard_similarity(text1, text2)
print(f"Jaccard Similarity between '{text1}' and '{text2}': {similarity:.2f}")

# Example usage; use the table above to generate text1/2
text1 = "NS1 binds HPA-1"
text2 = "NS1 activates HPA-1"

similarity = jaccard_similarity(text1, text2)
print(f"Jaccard Similarity between '{text1}' and '{text2}': {similarity:.2f}")  # 2 shared tokens out of 4 -> 0.50
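
To tie steps 2-5 together, the per-sentence loop could look roughly like the sketch below; get_indra_interaction and get_llm_interaction are hypothetical stand-ins for the REACH call and the LLM extraction, each returning a "subject type object" string built from the table columns above.

# Hypothetical helpers: each returns "subject type object" for one sentence
def get_indra_interaction(sentence):
    return "NS1 binds HPA-1"                 # placeholder for the REACH result (step 2)

def get_llm_interaction(sentence):
    return "NS1 activates HPA-1"             # placeholder for the LLM result (step 3)

sentences = ["NS1 binds HPA-1 in infected cells."]   # step 1: 100 sentences in practice

scores = []
for sentence in sentences:
    indra_text = get_indra_interaction(sentence)
    llm_text = get_llm_interaction(sentence)
    scores.append(jaccard_similarity(indra_text, llm_text))      # step 4

average = sum(scores) / len(scores) if scores else 0.0           # step 5
print(f"Average Jaccard similarity over {len(scores)} sentences: {average:.2f}")
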
cannin commented 6 days ago

To get more columns (recall, precision, F1):

# Inspired by: https://github.com/yuhaozhang/tacred-relation/blob/master/utils/scorer.py
# and https://github.com/hitz-zentroa/GoLLIE/blob/main/notebooks/Relation%20Extraction.ipynb 
def score(observations, predictions, pad_smaller=True, verbose=False):
    correct_by_relation = 0
    predicted_by_relation = 0
    observed_by_relation = 0

    # Pad the shorter vector
    if pad_smaller: 
        # Determine the maximum length
        max_length = max(len(predictions), len(observations))

        if len(predictions) < max_length:
            predictions.extend([''] * (max_length - len(predictions)))
        if len(observations) < max_length:
            observations.extend([''] * (max_length - len(observations)))

    if not pad_smaller and len(predictions) != len(observations):
        print("ERROR: predictions and observations must be the same length.")
        return

    # Loop over the observations to compute a score
    for row in range(len(observations)):
        observed = observations[row]
        predicted = predictions[row]

        if observed == '' and predicted == '':
            pass
        elif observed == '' and predicted != '':
            predicted_by_relation += 1
        elif observed != '' and predicted == '':
            observed_by_relation += 1
        elif observed != '' and predicted != '':
            predicted_by_relation += 1
            observed_by_relation += 1
            if observed == predicted:
                correct_by_relation += 1

    # Print the aggregate score (micro-averaged since across all relations)
    prec_micro = 1.0
    if predicted_by_relation > 0:
        prec_micro = correct_by_relation / predicted_by_relation
    recall_micro = 0.0
    if observed_by_relation > 0:
        recall_micro = correct_by_relation / observed_by_relation
    f1_micro = 0.0
    if prec_micro + recall_micro > 0.0:
        f1_micro = 2.0 * prec_micro * recall_micro / (prec_micro + recall_micro)
    if verbose:
        print("Final Score:")
        print("Precision (micro): {:.2%}".format(prec_micro))
        print("   Recall (micro): {:.2%}".format(recall_micro))
        print("       F1 (micro): {:.2%}".format(f1_micro))

    return round(prec_micro, 3), round(recall_micro, 3), round(f1_micro, 3)

predictions = ['A_binds_B', 'A_stimulates_D', 'A_stimulates_C', 'A_activates_C', 'A_binds_E', 'A_binds_F']
observations = ['A_binds_B', 'A_stimulates_D', 'C_binds_D']
score(observations, predictions, pad_smaller=True, verbose=True)
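
With the example vectors above (the 3 observations are padded to length 6), predicted_by_relation is 6, observed_by_relation is 3, and correct_by_relation is 2, so the call should print Precision (micro): 33.33%, Recall (micro): 66.67%, F1 (micro): 44.44% and return (0.333, 0.667, 0.444).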