related-sciences / nxontology-ml

Machine learning to classify ontology nodes
Apache License 2.0

Experiment with LLM-derived EFO term classifications #3

Closed eric-czech closed 1 year ago

eric-czech commented 1 year ago

refs: #2

This is one way we could prompt an LLM to classify terms for us:

For convenience, here is a template for that prompt without actual records to classify:

A list of records will be provided from an ontology of disease terms. Each record will contain information describing a single term.

Assign a `precision` label to each of these terms that captures the extent to which they correspond to patient populations with distinguishing clinical, demographic, physiological or molecular characteristics. Use exactly one of the following values for this label:

- `high`: High precision terms have the greatest ontological specificity, sometimes (but not necessarily) correspond to small groups of relatively homogeneous patients, often have greater diagnostic certainty and typically represent the forefront of clinical practice.
- `medium`: Medium precision terms are the ontological ancestors of `high` precision terms (if any are known), often include indications in later stage clinical trials and generally reflect groups of patients assumed to be suffering from a condition with a shared, or at least similar, physiological or environmental origin.
- `low`: Low precision terms are the ontological ancestors of both `medium` and `high` precision terms, group collections of diseases with *some* shared characteristics and typically connote a relatively heterogenous patient population. They are often terms used within the ontology for organizational purposes.

The records provided will already have the following fields:

- `id`: A string identifier for the term
- `label`: A descriptive name for the term
- `definition`: A longer, possibly truncated description of what the term is; may be NA (i.e. absent)

Here is a list of such records (in YAML format) where the `precision` label is already assigned for 3 examples at each level of precision:

- id: EFO:1000639
  label: acquired metabolic disease
  definition: A disease of metabolism that has _material_basis_in enzyme deficiency or accumulation of enzymes or toxins which interfere with normal function due to an endocrine organ disease, organ malfunction, inadequate intake, dietary deficiency, or ...
  precision: low
- id: Orphanet:68336
  label: Rare genetic tumor
  definition: NA
  precision: low
- id: EFO:0005548
  label: developmental disorder of mental health
  definition: A disease of mental health that occur during a child’s developmental period between birth and age 18 resulting in retarding of the child’s psychological or physical development.
  precision: low
- id: EFO:0003767
  label: inflammatory bowel disease
  definition: A spectrum of small and large bowel inflammatory diseases of unknown etiology. It includes Crohn's disease, ulcerative colitis, and colitis of indeterminate type.
  precision: medium
- id: EFO:0000384
  label: Crohn's disease
  definition: A gastrointestinal disorder characterized by chronic inflammation involving all layers of the intestinal wall, noncaseating granulomas affecting the intestinal wall and regional lymph nodes, and transmural fibrosis. Crohn disease most ...
  precision: medium
- id: MONDO:0045020
  label: glycine metabolism disease
  definition: A disease that has its basis in the disruption of glycine metabolic process.
  precision: medium
- id: EFO:1000277
  label: Gastric Small Cell Neuroendocrine Carcinoma
  definition: An aggressive, high-grade and poorly differentiated carcinoma with neuroendocrine differentiation that arises from the stomach. It is characterized by the presence of malignant small cells.
  precision: high
- id: MONDO:0015634
  label: isolated osteopoikilosis
  definition: A osteopoikilosis (disease) that is not part of a larger syndrome.
  precision: high
- id: Orphanet:98755
  label: Spinocerebellar ataxia type 1
  definition: Spinocerebellar ataxia type 1 (SCA1) is a subtype of type I autosomal dominant cerebellar ataxia (ADCA type I; see this term) characterized by dysarthria, writing difficulties, limb ataxia, and commonly nystagmus and saccadic abnormalities.
  precision: high

Here are the records for which this `precision` label is not yet known:



- Assign a `precision` label for ALL records
- Respond in CSV format using a pipe (i.e. "|") delimiter with the headers `id`, `precision` where `id` is the `id` associated with each record
- Include the headers in the result 
- Respond with ONLY the CSV content, do not include explanation of any kind
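A response in the format requested above could be checked before the labels are used anywhere. Here is a minimal sketch of that check, assuming the model returns plain text; `parse_precision_response` and `VALID_LABELS` are hypothetical names for illustration and are not part of nxontology-ml:

```python
import csv
import io

# The three values the prompt allows for the `precision` label
VALID_LABELS = {"high", "medium", "low"}


def parse_precision_response(text, expected_ids):
    """Parse a pipe-delimited `id|precision` response into {id: precision}.

    Raises ValueError if the header is wrong, a row is malformed, a label is
    not one of the allowed values, or the ids do not match the request.
    """
    rows = [
        [cell.strip() for cell in row]
        for row in csv.reader(io.StringIO(text.strip()), delimiter="|")
        if row  # csv.reader yields [] for empty lines
    ]
    if not rows or rows[0] != ["id", "precision"]:
        raise ValueError("missing or unexpected header row")

    labels = {}
    for row in rows[1:]:
        if len(row) != 2:
            raise ValueError(f"malformed row: {row!r}")
        term_id, label = row
        if label not in VALID_LABELS:
            raise ValueError(f"unknown precision {label!r} for {term_id}")
        labels[term_id] = label

    # The prompt asks for a label on ALL records, so compare id sets
    expected = set(expected_ids)
    returned = set(labels)
    if expected != returned:
        raise ValueError(
            f"missing ids: {sorted(expected - returned)}, "
            f"unexpected ids: {sorted(returned - expected)}"
        )
    return labels


response = "id|precision\nEFO:0000384|medium\nMONDO:0015634|high\n"
print(parse_precision_response(response, ["EFO:0000384", "MONDO:0015634"]))
```

Failing loudly on missing or unexpected ids is a design choice here; retrying only the missing records would be another reasonable option.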


Some questions to answer related to this:

For posterity, I used this pool of past records + labels to draw on for those few-shot examples:

EFO example records

```yaml
- id: Orphanet:309331
  label: Intermediate severe Salla disease
  definition: NA
  precision: high
- id: MONDO:0001674
  label: diverticulitis of colon
  definition: Inflammation of the colonic diverticula, generally with abscess formation and subsequent perforation.
  precision: high
- id: MONDO:0014498
  label: familial cold autoinflammatory syndrome 4
  definition: Any familial cold autoinflammatory syndrome in which the cause of the disease is a mutation in the NLRC4 gene.
  precision: high
- id: MONDO:0001748
  label: maxillary sinus carcinoma
  definition: A carcinoma that arises from the maxillary sinus. Representative examples include squamous cell carcinoma, adenocarcinoma, and adenoid cystic carcinoma.
  precision: high
- id: MONDO:0008877
  label: blue diaper syndrome
  definition: Blue Diaper syndrome is a hereditary metabolic disorder characterised by hypercalcaemia with nephrocalcinosis and indicanuria.
  precision: high
- id: MONDO:0100383
  label: acute myeloid leukemia, t(11;19)(q23;p13)
  definition: Any acute myeloid leukemia that has the chromosomal anomaly t(11;19)(q23;p13). (A cytogenetic abnormality that refers to the translocation of the long arm (q23) of chromosome 11 and the short arm (p13) of chromosome 19. It is associated with KMT2A ...
  precision: high
- id: MONDO:0020307
  label: benign childhood occipital epilepsy, Panayiotopoulos type
  definition: Benign childhood occipital epilepsy, Panayiotopoulos type is a rare, genetic neurological disorder characterized by late infancy to early-adolescence onset of prolonged, nocturnal seizures which begin with autonomic features (e.g. vomiting, pallor, ...
  precision: high
- id: Orphanet:93337
  label: Polydactyly of an index finger
  definition: Polydactyly of an index finger or PPD3 is a form of preaxial polydactyly of fingers (see this term), a limb malformation syndrome, where the thumb is replaced by one or two triphalangeal digits with dermatoglyphic pattern specific of the index ...
  precision: high
- id: MONDO:0013533
  label: hyperlipidemia due to hepatic triglyceride lipase deficiency
  definition: Hyperlipidemia due to hepatic triacylglycerol lipase deficiency is a rare, genetic hyperalphalipoproteinemia characterized by elevated plasma cholesterol and triglyceride (TG) levels with a marked TG enrichment of low- and high-density lipoproteins ...
  precision: high
- id: MONDO:0015634
  label: isolated osteopoikilosis
  definition: A osteopoikilosis (disease) that is not part of a larger syndrome.
  precision: high
- id: MONDO:0009579
  label: Frank-Ter Haar syndrome
  definition: Frank-ter Haar syndrome (formerly considered as an autosomal recessive form of Melnick-Needles syndrome) is defined by megalocornea, multiple skeletal anomalies, characteristic facial dysmorphism (wide fontanels, prominent forehead, hypertelorism, ...
  precision: high
- id: EFO:0009301
  label: dystonia 28, childhood-onset
  definition: Any dystonic disorder in which the cause of the disease is a mutation in the KMT2B gene.
  precision: high
- id: Orphanet:85197
  label: Genochondromatosis type 1
  definition: NA
  precision: high
- id: Orphanet:352328
  label: MEGDEL syndrome
  definition: NA
  precision: high
- id: EFO:1000277
  label: Gastric Small Cell Neuroendocrine Carcinoma
  definition: An aggressive, high-grade and poorly differentiated carcinoma with neuroendocrine differentiation that arises from the stomach. It is characterized by the presence of malignant small cells.
  precision: high
- id: MONDO:0013017
  label: hypotrichosis 5
  definition: A hypotrichosis that has material basis in a mutation on chromosome 1p21.1-q21.3.
  precision: high
- id: Orphanet:262749
  label: Partial duplication of the short arm of chromosome 7
  definition: NA
  precision: high
- id: Orphanet:73245
  label: Spinal muscular atrophy - Dandy-Walker malformation - cataracts
  definition: NA
  precision: high
- id: MONDO:0007251
  label: campomelic dysplasia
  definition: Campomelic dysplasia is a very rare disorder characterised by a variable association of skeletal abnormalities (bowed and fragile long bones, pelvis and chest abnormalities, eleven rib pairs instead of the usual twelve), and extraskeletal ...
  precision: high
- id: Orphanet:98755
  label: Spinocerebellar ataxia type 1
  definition: Spinocerebellar ataxia type 1 (SCA1) is a subtype of type I autosomal dominant cerebellar ataxia (ADCA type I; see this term) characterized by dysarthria, writing difficulties, limb ataxia, and commonly nystagmus and saccadic abnormalities.
  precision: high
- id: Orphanet:217587
  label: Mitochondrial disease with hypertrophic cardiomyopathy
  definition: NA
  precision: medium
- id: MONDO:0004630
  label: substance-induced psychosis
  definition: NA
  precision: medium
- id: MONDO:0010412
  label: X-linked intellectual disability-craniofacioskeletal syndrome
  definition: X-linked intellectual disability-craniofacioskeletal syndrome is a rare, hereditary, syndromic intellectual disability characterized by craniofacial and skeletal abnormalities in association with mild intellectual disability in females and early ...
  precision: medium
- id: EFO:0002627
  label: vulvar intraepithelial neoplasia
  definition: Intraepithelial neoplasia of the vulvar squamous epithelium. There is no evidence of invasion. This category includes vulvar high grade squamous intraepithelial lesion and vulvar intraepithelial neoplasia, differentiated type.
  precision: medium
- id: MONDO:0045020
  label: glycine metabolism disease
  definition: A disease that has its basis in the disruption of glycine metabolic process.
  precision: medium
- id: EFO:1000433
  label: Ovarian Steroid Cell Tumor
  definition: An ovarian tumor in which the vast majority of the cells (more than 90% of the tumor cells) resemble steroid hormone-secreting cells. It usually presents with androgenic manifestations. Approximately one-third of the cases follow a malignant ...
  precision: medium
- id: Orphanet:60041
  label: Congenital heart block
  definition: NA
  precision: medium
- id: EFO:0005549
  label: diphtheria
  definition: A Gram-positive bacterial infection caused by Corynebacterium diphtheriae. It usually involves the oral cavity, pharynx, and nasal cavity. Patients develop pseudomembranes in the affected areas and manifest signs and symptoms of an upper respiratory ...
  precision: medium
- id: MONDO:0000009
  label: inherited bleeding disorder, platelet-type
  definition: NA
  precision: medium
- id: MONDO:0001029
  label: Klippel-Feil syndrome
  definition: A congenital, musculoskeletal condition characterized by the fusion of at least two vertebrae of the neck. Common symptoms include a short neck, low hairline at the back of the head, and restricted mobility of the upper spine. This syndrome can ...
  precision: medium
- id: MONDO:0011063
  label: hidrotic ectodermal dysplasia, Christianson-Fourie type
  definition: Hidrotic ectodermal dysplasia, Christianson-Fourie type is a rare ectodermal dysplasia syndrome characterized by tricho- and onychodysplasia in association with cardiac rhythm abnormalities. Patients present with sparse scalp hair and eyelashes, ...
  precision: medium
- id: MONDO:0019427
  label: X-linked neurodegenerative syndrome, Bertini type
  definition: X-linked neurodegenerative syndrome, Bertini type is characterised by generalised hypotonia, psychomotor deficit, congenital ataxia and recurrent bronchopulmonary infections. It has been described in seven males from three generations of a family. ...
  precision: medium
- id: EFO:0009011
  label: Arteritis
  definition: An inflammatory process affecting an artery.
  precision: medium
- id: MONDO:0019910
  label: maternal uniparental disomy of chromosome 2
  definition: Maternal uniparental disomy of chromosome 2 is an uniparental disomy of maternal origin that most likely does not have any phenotypic expression except from cases of homozygosity for a recessive disease mutation for which only mother is a carrier.
  precision: medium
- id: MONDO:0022749
  label: non-neoplastic nevus
  definition: A abnormal, congenital formation or mark on the skin or neighboring mucosa that does not show neoplastic growth.
  precision: medium
- id: MONDO:0044877
  label: paraneoplastic cerebellar degeneration
  definition: A rare, immune-mediated disorder characterized by cerebellar degeneration due to the presence of an often undetected malignancy (usually carcinoma or lymphoma) in an anatomic site other than the cerebellum. Signs and symptoms include progressive ...
  precision: medium
- id: EFO:1000132
  label: Bone Epithelioid Hemangioma
  definition: A bone hemangioma characterized by the presence of epithelioid endothelial cells, and eosinophilic and lymphocytic infiltrates.
  precision: medium
- id: EFO:0007510
  label: tinea
  definition: A cutaneous mycosis that results_in fungal infection located_in skin, located_in hair, and located_in nail, has_material_basis_in Epidermophyton, has_material_basis_in Microsporum, or has_material_basis_in Trichophyton, which invade the dead keratin ...
  precision: medium
- id: Orphanet:1566
  label: Dandy-Walker malformation - postaxial polydactyly
  definition: Dandy-Walker malformation with postaxial polydactyly syndrome is a syndromic disorder with, as a major feature, the association between Dandy-Walker malformation and postaxial polydactyly. The Dandy-Walker malformation has a variable expression and ...
  precision: medium
- id: MONDO:0010088
  label: mucosulfatidosis
  definition: Multiple sulfatase deficiency (MSD) is a very rare and fatal lysosomal storage disease characterized by a clinical phenotype that combines the features of different sulfatase deficiencies (whether lysosomal or not) that can have neonatal (most ...
  precision: medium
- id: MONDO:0003393
  label: thymus gland disorder
  definition: A non-neoplastic or neoplastic disorder that affects the thymus. Representative examples include thymic hyperplasia, thymoma, and thymic carcinoma.
  precision: low
- id: Orphanet:68336
  label: Rare genetic tumor
  definition: NA
  precision: low
- id: EFO:0005751
  label: eye allergy
  definition: An allergic disease involving a pathogenic inflammatory response in the camera-type eye.
  precision: low
- id: EFO:0005548
  label: developmental disorder of mental health
  definition: A disease of mental health that occur during a child’s developmental period between birth and age 18 resulting in retarding of the child’s psychological or physical development.
  precision: low
- id: MONDO:0003277
  label: malignant ear neoplasm
  definition: A malignant neoplasm that affects the ear. Representative examples include ceruminous adenocarcinoma and squamous cell carcinoma of the external ear and adenocarcinoma of the middle ear.
  precision: low
- id: MONDO:0003686
  label: apocrine sweat gland neoplasm
  definition: A benign or malignant sweat gland neoplasm with apocrine differentiation. Representative examples include apocrine adenoma, ceruminous adenocarcinoma, and apocrine breast carcinoma.
  precision: low
- id: EFO:0000272
  label: astrocytoma
  definition: A glial tumor of the brain or spinal cord showing astrocytic differentiation. It includes the following clinicopathological entities: pilocytic astrocytoma, diffuse astrocytoma, anaplastic astrocytoma, pleomorphic xanthoastrocytoma, subependymal ...
  precision: low
- id: EFO:0005803
  label: hematologic disease
  definition: A disease involving the hematopoietic system.
  precision: low
- id: MONDO:0003111
  label: gastric neuroendocrine neoplasm
  definition: A neoplasm with neuroendocrine differentiation that arises from the stomach. It includes well differentiated neuroendocrine tumors (low and intermediate grade) and poorly differentiated neuroendocrine carcinomas (high grade).
  precision: low
- id: MONDO:0024239
  label: congenital anomaly of cardiovascular system
  definition: A disease that has its basis in the disruption of cardiovascular system development.
  precision: low
- id: MONDO:0019214
  label: inborn carbohydrate metabolic disorder
  definition: An acquired metabolic disease that is has its basis in the disruption of carbohydrate metabolic process.
  precision: low
- id: MONDO:0019054
  label: congenital limb malformation
  definition: NA
  precision: low
- id: EFO:0007352
  label: lymphatic system disease
  definition: A disease involving the lymphatic part of lymphoid system.
  precision: low
- id: EFO:1000639
  label: acquired metabolic disease
  definition: A disease of metabolism that has _material_basis_in enzyme deficiency or accumulation of enzymes or toxins which interfere with normal function due to an endocrine organ disease, organ malfunction, inadequate intake, dietary deficiency, or ...
  precision: low
- id: MONDO:0002279
  label: iron metabolism disease
  definition: Disorders in the processing of iron in the body: its absorption, transport, storage, and utilization.
  precision: low
- id: MONDO:0015482
  label: otomandibular dysplasia
  definition: NA
  precision: low
- id: Orphanet:371436
  label: Genetic neurovascular malformation
  definition: NA
  precision: low
- id: MONDO:0019737
  label: thrombotic microangiopathy
  definition: The syndromes of microangiopathic hemolytic anemia, thrombocytopenia, and variable signs of organ impairment, due to platelet aggregation in the microcirculation.
  precision: low
- id: MONDO:0040677
  label: invasive carcinoma
  definition: A carcinoma that is not confined to the epithelium, and has spread to the surrounding stroma.
  precision: low
```
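Drawing the few-shot examples from a pool like this one could look roughly like the sketch below: a fixed number of records per precision level, rendered in the same list layout the prompt uses. It assumes the pool is already loaded as a list of dicts; `sample_few_shot` and `render_records` are hypothetical helper names, and the seed and ordering are arbitrary choices rather than what was actually used for the prompt above.

```python
import random
from collections import defaultdict

# Same order as the labelled examples in the prompt
LEVELS = ("low", "medium", "high")


def sample_few_shot(pool, per_level=3, seed=0):
    """Pick `per_level` labelled records for each precision level."""
    rng = random.Random(seed)  # seeded so the prompt is reproducible
    by_level = defaultdict(list)
    for record in pool:
        by_level[record["precision"]].append(record)

    examples = []
    for level in LEVELS:
        candidates = by_level[level]
        if len(candidates) < per_level:
            raise ValueError(f"only {len(candidates)} records at level {level!r}")
        examples.extend(rng.sample(candidates, per_level))
    return examples


def render_records(records):
    """Render records as the plain-text list used in the prompt.

    Lines are written by hand instead of going through a YAML library because
    some definitions contain ': ' and would otherwise need quoting.
    """
    lines = []
    for record in records:
        lines.append(f"- id: {record['id']}")
        lines.append(f"  label: {record['label']}")
        lines.append(f"  definition: {record.get('definition') or 'NA'}")
        if "precision" in record:  # omitted for records still to be labelled
            lines.append(f"  precision: {record['precision']}")
    return "\n".join(lines)


# Tiny demo pool: one record per level, copied from the list above
pool = [
    {"id": "Orphanet:68336", "label": "Rare genetic tumor",
     "definition": None, "precision": "low"},
    {"id": "MONDO:0045020", "label": "glycine metabolism disease",
     "definition": "A disease that has its basis in the disruption of glycine metabolic process.",
     "precision": "medium"},
    {"id": "MONDO:0015634", "label": "isolated osteopoikilosis",
     "definition": "A osteopoikilosis (disease) that is not part of a larger syndrome.",
     "precision": "high"},
]

print(render_records(sample_few_shot(pool, per_level=1)))
```

The same `render_records` call works for the unlabelled records, since the `precision` line is only written when the key is present.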