mc2-center / mc2-center-dcc

Data coordination resources for CCKP (and MC2 in general)

Integrate metadata extraction from papers using GPT/ScholarAI #35

Open Bankso opened 7 months ago

Bankso commented 7 months ago

Per this pubpub: https://sagebionetworks.pubpub.org/pub/vh1xcgd9/release/6

Discussed with Jineta on 1.24.24 - we can use the framework described in the article linked above to request metadata extraction from papers using GPT/ScholarAI.

Proposed input: the article text, a prompt requesting metadata extraction, and a metadata template (could be JSON, CSV, etc.).

Output: the metadata template, populated with information extracted by the model.
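A minimal sketch of that input/output flow, assuming a JSON template. The field names in `TEMPLATE` are placeholders rather than the actual MC2 schema, and `ask_model` is a hypothetical stand-in for whatever GPT/ScholarAI call we end up using; nothing here is taken from the article's implementation.

```python
import json

# Hypothetical template: these field names are placeholders, not the MC2 schema.
TEMPLATE = {"title": "", "assay": "", "tumor_type": ""}


def build_prompt(article_text, template):
    """Combine the extraction request, the empty template, and the article text."""
    return (
        "Extract metadata from the article below. "
        "Return only a JSON object with exactly these keys; "
        "use an empty string when a value is not stated.\n\n"
        "Template:\n" + json.dumps(template, indent=2) + "\n\n"
        "Article:\n" + article_text
    )


def populate_template(template, model_reply):
    """Copy values from the model's JSON reply into a copy of the template."""
    reply = json.loads(model_reply)
    if not isinstance(reply, dict):
        raise ValueError("model reply is not a JSON object")
    filled = dict(template)
    for key in template:
        value = reply.get(key, "")
        # Keep every field a string; keys outside the template are ignored.
        filled[key] = value if isinstance(value, str) else json.dumps(value)
    return filled


def extract_metadata(article_text, template, ask_model):
    """ask_model: any callable that takes a prompt string and returns reply text."""
    prompt = build_prompt(article_text, template)
    return populate_template(template, ask_model(prompt))
```

Passing the model call in as a function keeps the prompt and parsing logic testable without network access, and lets us swap providers later without touching the template handling.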

The article notes that scaling this approach was not feasible at the time it was published (Nov 2023), so it will be important to consider how we can implement this process consistently and reproducibly for MC2 resource curation.
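One possible starting point for reproducibility, sketched under my own assumptions rather than anything the article prescribes: record a hash of the exact inputs for each run, and check that the populated output still matches the template's fields. The function names, the `settings` dictionary, and the example field names are all hypothetical.

```python
import hashlib
import json


def run_record(prompt, model_name, settings):
    """Describe one extraction run so it can be compared or repeated later."""
    payload = json.dumps(
        {"prompt": prompt, "model": model_name, "settings": settings},
        sort_keys=True,  # stable key order so identical inputs hash identically
    )
    return {
        "model": model_name,
        "settings": settings,
        "input_sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }


def template_mismatches(template, filled):
    """Return (missing, unexpected) field names in the populated output."""
    missing = sorted(set(template) - set(filled))
    unexpected = sorted(set(filled) - set(template))
    return missing, unexpected
```

Storing the record alongside each curated entry would let us tell whether two differing outputs came from different inputs or from model variability on the same input.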