related-sciences / nxontology-ml

Machine learning to classify ontology nodes
Apache License 2.0

How to handle non-diseases in EFO? #50

Open dhimmel opened 10 months ago

dhimmel commented 10 months ago

Expands on https://github.com/related-sciences/nxontology-ml/issues/35#issuecomment-1766560619

We currently ignore non-diseases by training and predicting only on terms that are diseases, as determined by `get_disease_nodes`.

Our training labels apply only to diseases, so I think it makes sense to keep training only on diseases. However, we could:

  1. create an `is_disease` marker column that is part of the output
  2. compute features for non-diseases
  3. compute predictions for non-diseases

Predictions for non-diseases would likely be of lower quality because our training labels do not cover them, but the same distinction between broad grouping terms and more specific terms still applies. Users could then discard all predictions where `is_disease` is `False` to keep the current behavior.
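A minimal sketch of the marker-column idea, assuming predictions are plain records keyed by node ID and that the set of disease IDs comes from something like `get_disease_nodes` (its real return type is an assumption here). `Prediction`, `mark_diseases`, and `diseases_only` are hypothetical names, not part of nxontology-ml:

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Iterable


# Hypothetical output record; the real nxontology-ml output schema may differ.
@dataclass(frozen=True)
class Prediction:
    node_id: str
    label: str
    predicted_class: str
    is_disease: bool = False


def mark_diseases(predictions: Iterable[Prediction], disease_ids: set[str]) -> list[Prediction]:
    """Keep every prediction but attach an is_disease marker (options 1-3 above)."""
    return [replace(p, is_disease=p.node_id in disease_ids) for p in predictions]


def diseases_only(predictions: Iterable[Prediction]) -> list[Prediction]:
    """Reproduce the current behavior by dropping predictions for non-disease terms."""
    return [p for p in predictions if p.is_disease]
```

Keeping the marker in the output rather than filtering upstream means the choice of whether to trust out-of-domain predictions moves to the user, at the cost of computing features for terms the model was never trained on.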

There could be a benefit to having precision predictions for non-diseases. For example, it would make sense to classify symptoms such as pain along a precision axis:

[image: symptom terms such as pain along a precision axis]

@yonromai: I'll bring this up with the data team at our next meeting, so no need to do anything until then. CC @eric-czech

dhimmel commented 10 months ago

We were missing "injury, poisoning or other complication" as a non-disease therapeutic area (TA) (fixed in https://github.com/related-sciences/nxontology-ml/commit/1b5923314f880818485aacecbc0b544679a9f0eb). However, the omission does let us see how the model classifies some non-disease terms:
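To illustrate why one missing TA matters, here is a sketch that assumes disease nodes are defined as all nodes minus each non-disease TA root and its descendants. This is an assumption about how `get_disease_nodes` behaves, not its actual implementation, and the edges below form a toy graph rather than the real EFO hierarchy (`EXAMPLE:*` IDs are hypothetical):

```python
from __future__ import annotations

from collections import deque


def descendants(children: dict[str, list[str]], root: str) -> set[str]:
    """All nodes reachable from root (excluding root) via breadth-first search."""
    seen: set[str] = set()
    queue = deque(children.get(root, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(children.get(node, []))
    return seen


def disease_nodes(
    children: dict[str, list[str]],
    all_nodes: list[str],
    non_disease_tas: list[str],
) -> set[str]:
    """Assumed rule: a node is a disease unless it sits under a non-disease TA."""
    excluded: set[str] = set()
    for ta in non_disease_tas:
        excluded |= {ta} | descendants(children, ta)
    return set(all_nodes) - excluded
```

If `OTAR:0000009` is left out of `non_disease_tas`, every term beneath it (injuries, poisonings, and so on) falls into the disease set and receives a prediction, which is how the terms below ended up classified.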

| efo_otar_slim_id | efo_label | class_new |
|:-------------------|:------------------------------------------------------|:-------------------|
| EFO:0010686 | muscle strain | 02-disease-root |
| EFO:0010725 | aseptic loosening | 02-disease-root |
| EFO:0010581 | organophosphate poisoning | 02-disease-root |
| EFO:0011061 | toxicity | 03-disease-area |
| EFO:0020910 | thermal burn | 02-disease-root |
| EFO:0020930 | immune-mediated adverse reaction | 02-disease-root |
| EFO:0600078 | Achilles tendon injury | 01-disease-subtype |
| EFO:0000546 | injury | 03-disease-area |
| EFO:0002687 | ischemia reperfusion injury | 02-disease-root |
| MONDO:0037747 | spinal injury | 01-disease-subtype |
| MONDO:0700220 | disease related to transplantation | 03-disease-area |
| MONDO:0700222 | disease related to hematopoietic stem cell transplant | 03-disease-area |
| OTAR:0000009 | injury, poisoning or other complication | 03-disease-area |
| MONDO:0800373 | carbon monoxide poisoning | 03-disease-area |
| EFO:0007430 | persian gulf syndrome | 02-disease-root |
| EFO:0009485 | eye injury | 01-disease-subtype |
| EFO:0009518 | complication | 03-disease-area |
| EFO:0009582 | sprain | 02-disease-root |
| EFO:0009508 | leg injury | 02-disease-root |
| EFO:0009574 | intoxication | 02-disease-root |
| EFO:0009503 | caustic injury | 02-disease-root |
| EFO:0009565 | radiation-induced disorder | 03-disease-area |
| EFO:0009504 | crush injury | 02-disease-root |
| EFO:0009816 | perineal laceration during delivery | 02-disease-root |
| EFO:0009516 | burn | 03-disease-area |
| EFO:0009509 | limb injury | 03-disease-area |
| EFO:0009658 | adverse effect | 01-disease-subtype |
| EFO:0009434 | death by undetermined cause | 01-disease-subtype |
| EFO:0009521 | dislocation | 02-disease-root |
| EFO:0009887 | intrathoracic organ injury | 02-disease-root |
| EFO:0009506 | heart injury | 02-disease-root |
| EFO:0008546 | poisoning | 03-disease-area |
| EFO:0009507 | knee injury | 02-disease-root |
| EFO:0009527 | frostbite | 02-disease-root |
| EFO:0009888 | trauma complication | 01-disease-subtype |
| EFO:0009623 | nose injury | 02-disease-root |
| EFO:0009502 | abdominal injury | 02-disease-root |
| EFO:0009525 | foreign body | 03-disease-area |
| EFO:0009476 | neck injury | 02-disease-root |
| EFO:0009519 | device complication | 02-disease-root |
| EFO:0009833 | kidney injury | 02-disease-root |
| EFO:0009505 | head injury | 02-disease-root |
| MONDO:0019088 | post-transplant lymphoproliferative disease | 01-disease-subtype |
| EFO:1001291 | ciguatera poisoning | 02-disease-root |
| EFO:1001788 | Eye Burns | 02-disease-root |
| EFO:1001328 | fluoride poisoning | 02-disease-root |
| EFO:1001518 | heavy metal poisoning | 03-disease-area |
| EFO:1001756 | Acrodynia | 02-disease-root |
| EFO:1001373 | Multiple Organ Failure | 02-disease-root |
| EFO:1001768 | cadmium poisoning | 02-disease-root |

Many of these make sense, but some seem off, such as "spinal injury" and "adverse effect" being classified as high precision (01-disease-subtype). This makes me reconsider whether it makes sense to apply the model outside of its training domain at all.