
ViLMedic (Vision-and-Language medical research) is a modular framework for multimodal vision-and-language research in the medical field.

News

Toward Expanding the Scope of Radiology Report Summarization to Multiple Anatomies and Modalities (dataset)
Overview of the RadSum23 Shared Task on Multi-modal and Multi-anatomical Radiology Report Summarization (challenge)
Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards (Replicate demo)

ViLMedic: a framework for research at the intersection of vision and language in medical AI

ViLMedic has a dedicated website at: https://vilmedic.app/




Quickstart and documentation

Rendez-vous at: https://vilmedic.app/installation/
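
As a first taste of the API, the sketch below loads a pretrained checkpoint from the model zoo. This is a minimal sketch, assuming the `vilmedic` package is installed and exposes the `AutoModel` entry point described on the website; the checkpoint name is illustrative.

```python
from vilmedic import AutoModel

# Hypothetical checkpoint name; see the model zoo on the website for real ones.
model, processor = AutoModel.from_pretrained("RRG/baseline-mimic")
```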

Implemented solutions

ViLMedic replicates solutions from the multimodal medical literature.

Medical Visual Question Answering
  SYSU-HCP at VQA-Med 2021
Radiology report generation
  Generating Radiology Reports via Memory-driven Transformer
  Optimizing the Factual Correctness of a Summary: A Study of Summarizing Radiology Reports
  Improving Factual Completeness and Consistency of Image-to-text Radiology Report Generation
Radiology report summarization
  Multimodal Radiology Report Summarization
Multimodal self-supervised learning
  Contrastive Learning of Medical Visual Representations from Paired Images and Text
  DALLE: Zero-Shot Text-to-Image Generation
  CLIP: Learning Transferable Visual Models From Natural Language Supervision
  SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
  GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition

Blocks

Natural Language Processing
  HuggingFace transformer encoder and decoder
  HuggingFace transformer beam-search and model ensembling :fire: (see the sketch below)
  NLG metrics (BLEU, ROUGE, METEOR, MAUVE) and radiology report generation metrics (F1-CheXbert)
  RadGraph
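
The encoder-decoder and beam-search blocks wrap the standard HuggingFace transformers pattern. Below is a generic sketch of that pattern (not ViLMedic's own wrapper); the encoder/decoder checkpoints are illustrative and the random tensor stands in for a preprocessed image.

```python
import torch
from transformers import AutoTokenizer, VisionEncoderDecoderModel

# Pair any vision encoder with any causal-LM decoder; weights are illustrative.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k", "gpt2"
)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.eos_token_id

pixel_values = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed chest X-ray
report_ids = model.generate(pixel_values, num_beams=4, max_length=64)  # beam search
print(tokenizer.decode(report_ids[0], skip_special_tokens=True))
```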
Vision
  All PyTorch VisualEncoder architectures
  Vision Transformer
  TorchXRayVision (see the sketch below)
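
A TorchXRayVision backbone can serve as the visual encoder. The sketch below uses the torchxrayvision API to extract feature maps; how those features are wired into ViLMedic blocks is omitted.

```python
import torch
import torchxrayvision as xrv

# DenseNet-121 pretrained across several public chest X-ray datasets.
model = xrv.models.DenseNet(weights="densenet121-res224-all")
x = torch.randn(1, 1, 224, 224)  # single-channel 224x224 input expected by res224 models
features = model.features(x)     # (1, 1024, 7, 7) feature maps for downstream blocks
print(features.shape)
```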
Losses
  All PyTorch losses
  ConVirt loss
  GLoRIA loss
  InfoNCE loss (see the sketch below)
  SuperLoss
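
The contrastive blocks (ConVirt, CLIP, GLoRIA) are built around the InfoNCE objective. Below is a minimal plain-PyTorch sketch of symmetric InfoNCE over paired image/text embeddings; ViLMedic's own implementation may differ in details such as temperature handling.

```python
import torch
import torch.nn.functional as F

def info_nce(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: matched image/text pairs sit on the diagonal."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature          # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Average the image-to-text and text-to-image cross-entropies.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = info_nce(torch.randn(8, 512), torch.randn(8, 512))
```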
Reinforcement Learning
  Self-critical Sequence Training (HuggingFace compliant) :fire: (see the sketch below)
  PPO optimization (HuggingFace compliant)
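
Self-critical sequence training optimizes a sequence-level reward (e.g., an NLG or factuality metric) with REINFORCE, using the reward of the greedy decode as baseline. Below is a minimal sketch of the objective only; sampling, decoding, and reward computation are left abstract, and all names are illustrative.

```python
import torch

def scst_loss(sample_logprobs, sample_reward, greedy_reward):
    """sample_logprobs: (B, T) log-probs of the sampled tokens;
    *_reward: (B,) sequence-level scores for sampled and greedy decodes."""
    advantage = (sample_reward - greedy_reward).unsqueeze(-1)  # (B, 1) baseline-corrected
    # Increase the likelihood of samples that beat the greedy baseline.
    return -(advantage * sample_logprobs).sum(dim=-1).mean()

loss = scst_loss(torch.randn(4, 20), torch.rand(4), torch.rand(4))
```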

Citation

If you use ViLMedic in your work or use any models published in ViLMedic, please cite:

@inproceedings{delbrouck-etal-2022-vilmedic,
    title = "{V}i{LM}edic: a framework for research at the intersection of vision and language in medical {AI}",
    author = "Delbrouck, Jean-benoit  and
      Saab, Khaled  and
      Varma, Maya  and
      Eyuboglu, Sabri  and
      Chambon, Pierre  and
      Dunnmon, Jared  and
      Zambrano, Juan  and
      Chaudhari, Akshay  and
      Langlotz, Curtis",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-demo.3",
    pages = "23--34",
}

License

ViLMedic is MIT-licensed. The license applies to the pre-trained models as well.