ndexbio / gsoc_llm

GSOC 2024 LLM Project
MIT License

Work with ACC papers #17

Closed Favourj-bit closed 5 hours ago

Favourj-bit commented 2 weeks ago

@cannin

For the paper I am supposed to work with, where can I get the INDRA results to compare against?

Also, when searching, I put "pmid" in front of the number, because without it the number just takes me to a gene page on the NIH website. I wanted to confirm that I am searching correctly. I got the result below.

Screenshot 2024-06-19 at 06 47 53
Favourj-bit commented 2 weeks ago

@cannin @dexterpratt

I tried to extract the XML for some of the PMIDs listed in the text document in the ACC zip file mentioned in #2, and I keep getting the error shown in the attached screenshot.

Screenshot 2024-06-19 at 09 34 24
Favourj-bit commented 2 weeks ago

@cannin @dexterpratt

I have already tried the paper with PMC333362 (PMID: 13086), which I took from the PMIDs txt file. I extracted the text using the read_pdf function I started out with, and I have pushed some results to the repo. I also tried gpt-4, gpt-4-turbo and gpt-4o on the paper and noted the differences in run time for each of these models.
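
A minimal sketch of how run times like these could be recorded per model. The `fake_extraction` function is a hypothetical stand-in for the real extraction call, which is not shown in this thread, so it makes no API request; only the timing pattern is the point here.

```python
import time
from typing import Callable, Dict, List


def time_models(models: List[str], run_extraction: Callable[[str], object]) -> Dict[str, float]:
    """Return the wall-clock seconds run_extraction takes for each model name."""
    timings: Dict[str, float] = {}
    for model in models:
        start = time.perf_counter()
        run_extraction(model)
        timings[model] = time.perf_counter() - start
    return timings


def fake_extraction(model: str) -> str:
    # Hypothetical placeholder for the real LLM call; it does no network I/O.
    return f"result from {model}"


timings = time_models(["gpt-4", "gpt-4-turbo", "gpt-4o"], fake_extraction)
for model, seconds in timings.items():
    print(f"{model}: {seconds:.4f} s")
```

Passing the extraction step in as a callable keeps the timing loop independent of whichever client library makes the request.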

I noticed that gpt-4 hallucinated: it added the examples I showed it in the prompt to the output even when they were not part of the paper. This made me refine my prompt to tell it explicitly to use those examples only as a guide for structuring the results.
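
One way this kind of prompt refinement might look, as a rough sketch. The wording, the `build_prompt` helper, and the example string are all assumptions made for illustration and are not the prompt actually used in the repo.

```python
from typing import List


def build_prompt(paper_text: str, examples: List[str]) -> str:
    """Build a prompt that marks the examples as format-only guidance."""
    example_block = "\n".join(f"- {ex}" for ex in examples)
    return (
        "Extract the molecular interactions stated in the PAPER section below.\n"
        "The EXAMPLES section only illustrates the output format. "
        "Do not include any example in your answer unless the paper itself states it.\n\n"
        f"EXAMPLES (format only):\n{example_block}\n\n"
        f"PAPER:\n{paper_text}\n"
    )


# Placeholder inputs, not taken from any real paper.
prompt = build_prompt("Text of the paper goes here.", ["A activates B"])
print(prompt)
```

Keeping the examples and the paper text in separately labelled sections makes it easier to check afterwards whether an output item came from the paper or was copied from the examples.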

cannin commented 2 weeks ago

Access the INDRA statements for a specific publication:

Option 1

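from indra.sources import indra_db_rest
# Query the INDRA DB REST service for statements extracted from this PMID
ip = indra_db_rest.get_statements_for_papers([('pmid', '27153756')])
statements = ip.statements  # list of INDRA Statement objects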

Option 2

curl -X POST https://db.indra.bio/query/statements \
     -H "Content-Type: application/json" \
     -d '{"query": {"class": "FromPapers", "constraint": {"paper_list": [["pmid", "27153756"]]}, "inverted": false}}'
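
For more than one paper, the JSON body from Option 2 could be built in Python rather than written by hand. This is only a sketch: `build_from_papers_query` is a hypothetical helper, and it simply mirrors the structure of the curl body above without contacting the server.

```python
import json
from typing import Dict, List, Tuple


def build_from_papers_query(paper_ids: List[Tuple[str, str]]) -> Dict:
    """Mirror the JSON body of the curl example for a list of (id_type, id) pairs."""
    return {
        "query": {
            "class": "FromPapers",
            "constraint": {
                "paper_list": [[id_type, paper_id] for id_type, paper_id in paper_ids]
            },
            "inverted": False,
        }
    }


payload = build_from_papers_query([("pmid", "27153756")])
# The printed JSON can be passed to curl with -d, or sent with an HTTP client.
print(json.dumps(payload))
```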
Favourj-bit commented 2 weeks ago

This is a Pipfile for use with pipenv:

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]
ipython = "*"

[packages]
requests = "*"
tqdm = "*"
jsonpath-ng = "*"
pyjnius = ">=1.3.0"
indra = {git = "https://github.com/sorgerlab/indra.git", editable = true, ref = "8919f134bbcdb08bd0dc288fe8b6b79a4f6acc94"}

[requires]
python_version = "3.10"
Favourj-bit commented 6 days ago

Hi @cannin @dexterpratt

I was able to install and configure INDRA. Then I tried to get the INDRA statements for the paper with PMCID PMC333362.

However, I seem to be getting far fewer statements than when I use my GPT extraction code, and this is confusing me.
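
A rough sketch of how the two result sets could be compared beyond raw counts, assuming both can be reduced to (subject, interaction type, object) triples. That reduction, the helper names, and the placeholder data are assumptions for illustration; the real formats are in the linked results file.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def normalize(triples: Iterable[Triple]) -> Set[Triple]:
    """Upper-case entity names and lower-case the type so trivial differences do not count."""
    return {(s.strip().upper(), t.strip().lower(), o.strip().upper()) for s, t, o in triples}


def compare(indra: Iterable[Triple], llm: Iterable[Triple]) -> Dict[str, List[Triple]]:
    """Split the triples into those found by both sources and those unique to each."""
    a, b = normalize(indra), normalize(llm)
    return {
        "shared": sorted(a & b),
        "indra_only": sorted(a - b),
        "llm_only": sorted(b - a),
    }


# Made-up placeholder triples, not results from any paper.
indra_triples = [("GENE_A", "Phosphorylation", "GENE_B")]
llm_triples = [("gene_a", "phosphorylation", "gene_b"), ("GENE_C", "Activation", "GENE_D")]
report = compare(indra_triples, llm_triples)
for key, items in report.items():
    print(key, len(items), items)
```

Looking at the `llm_only` items one by one would show whether the extra statements are supported by the paper text or not, which the difference in counts alone cannot tell.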

Favourj-bit commented 6 days ago

This link contains the results: https://github.com/ndexbio/gsoc_llm/blob/main/results.json

And this is the code used: https://github.com/ndexbio/gsoc_llm/blob/main/python_scripts/get_indra_statements.py