cannin closed this issue 3 months ago
Another document to try out that came up during the kickoff meeting
@Favourj-bit That article happened to be in the examples; it is not important for this project. If you need more articles, use the ones in the zip file (PMIDs provided). This is related to #7.
indra_adrenocortical_carcinoma_v2.zip
@cannin Alright, noted. Thank you
Hello @cannin, I tested LangChain on the paper and was able to parse the whole document. There is one thing I need clarity on. I tried to get both the number of interactions and the number of unique interactions, but I don't know whether I should be counting unique interactions, since every extracted interaction is supposed to be distinct.
The first result is what I got without specifying anything about uniqueness; it reports 158 interactions. The other is what I got when I tried to specify uniqueness.
I have attached the JSON files that show the different outputs here: https://github.com/ndexbio/gsoc_llm/tree/094a972ab9bb84009db73b1cc84721fc6337a6ff/results/SIRT1_PARP1
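One way to check uniqueness outside the prompt is to count it in Python after extraction. The sketch below is only an illustration: the field names `source`, `interaction`, and `target` and the placeholder gene names are assumptions, not the actual keys or contents of the JSON files linked above.

```python
def interaction_key(record):
    """Build a normalized (source, interaction, target) triple.

    The field names are hypothetical; adjust them to the real JSON keys.
    """
    return (
        record["source"].strip().upper(),
        record["interaction"].strip().lower(),
        record["target"].strip().upper(),
    )


def count_interactions(records):
    """Return (total, unique) counts for a list of extracted interactions."""
    keys = [interaction_key(r) for r in records]
    return len(keys), len(set(keys))


# Placeholder data, not taken from the paper.
example = [
    {"source": "GENE_A", "interaction": "activates", "target": "GENE_B"},
    {"source": "gene_a", "interaction": "Activates", "target": "GENE_B"},
    {"source": "GENE_B", "interaction": "activates", "target": "GENE_A"},
]

total, unique = count_interactions(example)
print(f"total={total} unique={unique}")
```

If the two counts differ, the extraction is repeating interactions, for example the same pair reported from several sentences. Direction is kept here, so A to B and B to A count as different interactions.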
@Favourj-bit As discussed, the review article is very dense with interactions; try working with PMC6044858 first.
@cannin, I have been able to test the other paper you suggested using GPT. However, I am having trouble writing code to compare the two outputs because they are represented a little differently. Do you have any suggestions for writing the comparison code? I have attached some screenshots that show what I am talking about.
There is no easy fix, but that is why I said to use or make a format compatible with both NDEX and INDRA (see #7). Remember, it is important to know which interactions match and which do not, not just overall counts of X for INDRA and Y for GPT.
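As a rough sketch of what a match-level comparison could look like once both outputs have been converted to one shared shape: the `source`, `interaction`, and `target` keys below are hypothetical stand-ins for whatever common format is chosen, and the records are placeholders rather than real INDRA or GPT output.

```python
def normalize(record):
    """Reduce a record to a comparable (source, interaction, target) triple.

    Assumes both tools' outputs were already mapped to these hypothetical keys.
    """
    return (
        record["source"].strip().upper(),
        record["interaction"].strip().lower(),
        record["target"].strip().upper(),
    )


def compare(indra_records, gpt_records):
    """Report which interactions match and which appear in only one set."""
    indra = {normalize(r) for r in indra_records}
    gpt = {normalize(r) for r in gpt_records}
    return {
        "matched": sorted(indra & gpt),
        "indra_only": sorted(indra - gpt),
        "gpt_only": sorted(gpt - indra),
    }


# Placeholder records for illustration only.
indra_example = [
    {"source": "GENE_A", "interaction": "activates", "target": "GENE_B"},
    {"source": "GENE_C", "interaction": "inhibits", "target": "GENE_D"},
]
gpt_example = [
    {"source": "gene_a", "interaction": "Activates", "target": "gene_b"},
    {"source": "GENE_E", "interaction": "binds", "target": "GENE_F"},
]

report = compare(indra_example, gpt_example)
for bucket, triples in report.items():
    print(bucket, len(triples), triples)
```

Exact string matching after case folding is a simplification. Gene synonyms and differing interaction vocabularies would need a mapping step before the set comparison.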
@cannin I have been able to extract interactions from this document sentence by sentence. I noticed that the extraction chain still does not return the sentence as part of the interaction details, even when this is specified directly in the prompt and in the schema. This is the result from the code: https://github.com/ndexbio/gsoc_llm/blob/main/results/pmc6044858/sentence_output.json
Don't trust or ask GPT to get the sentence. Get the result; if it is JSON, load it into a dictionary variable in Python and add the sentence yourself.
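A minimal sketch of that approach, assuming the model's reply arrives as a JSON string holding either one object or a list of objects. The function name `attach_sentence` and the `sentence` key are made up for illustration.

```python
import json


def attach_sentence(raw_response, sentence):
    """Parse the model's JSON reply and add the input sentence to each record.

    Returns an empty list if the reply is not valid JSON or not object-shaped.
    """
    try:
        parsed = json.loads(raw_response)
    except json.JSONDecodeError:
        return []

    if isinstance(parsed, dict):
        parsed = [parsed]
    if not isinstance(parsed, list):
        return []

    records = []
    for item in parsed:
        if isinstance(item, dict):
            record = dict(item)            # copy so the parsed data is untouched
            record["sentence"] = sentence  # the sentence comes from our code, not the model
            records.append(record)
    return records


# Placeholder reply and sentence for illustration only.
reply = '[{"source": "GENE_A", "interaction": "activates", "target": "GENE_B"}]'
rows = attach_sentence(reply, "GENE_A activates GENE_B.")
print(json.dumps(rows, indent=2))
```

Because the sentence is attached by the calling code, it is always the exact text that was sent to the model, never a paraphrase.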
This review is a good test case because it has many interactions (~90).
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3898398/
The "Additional file 2" (at the bottom) has the interactions included in the publication. Work to retrieve as many as possible and report how many you are retrieving. I expect you to try to get at least ~70 interactions before reporting.
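For reporting how many of the reference interactions are retrieved, something like the following could work. It is a sketch under assumptions: the reference list is taken to have already been loaded from the supplementary file as pairs of names (the file's real layout is not assumed here), pairs are treated as undirected, and all names below are placeholders.

```python
def pair_key(a, b):
    """Order-independent key for an interaction between two entities."""
    return frozenset((a.strip().upper(), b.strip().upper()))


def retrieval_report(reference_pairs, extracted_pairs):
    """Count how many reference interactions were recovered by the extraction."""
    reference = {pair_key(a, b) for a, b in reference_pairs}
    extracted = {pair_key(a, b) for a, b in extracted_pairs}
    found = reference & extracted
    return {
        "reference_total": len(reference),
        "retrieved": len(found),
        "missed": len(reference - extracted),
        "recall": len(found) / len(reference) if reference else 0.0,
    }


# Placeholder pairs, not taken from the publication.
reference_example = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")]
extracted_example = [("gene_b", "gene_a"), ("GENE_E", "GENE_F")]

summary = retrieval_report(reference_example, extracted_example)
print(summary)
```

The `retrieved` count is the number to check against the ~70 target, and the `reference - extracted` difference can be listed to see which interactions are still being missed.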