List of surgical tool datasets organised by task. A list of data repositories is also displayed at the bottom. Please open an issue if you see a relevant open dataset which is missing or if you find inaccurate information.
Dataset | Brief description | Images | Procedures | Paper |
RMIT
|
This dataset consists of three image sequences recorded during retinal microsurgery. For each image sequence, the instrument position and size have been hand-annotated.
|
1.5K |
4 |
Sznitman et al. 2012
|
InstrumentCrowd
|
The training data was generated from a total of 6 surgical procedures, three from laparoscopic adrenalectomies and three from laparoscopic pancreatic resections. From each surgery, 20 images containing one or several medical instruments were extracted, yielding 120 images in total.
|
120 |
6 |
Maier-Hein et al. 2014
|
NeuroSurgicalTools
|
Consists of 2476 monocular images (1221 for training and 1255 for testing) coming from in vivo neurosurgeries. The resolution of the images varies from 612×460 to 1920×1080.
|
2.5K |
14 |
Bouget et al. 2015
|
EndoVis2015
|
40 2D in-vivo images from 4 laparoscopic colorectal surgeries. Each pixel is labelled as either background, shaft or manipulator (~160 2D images and annotations in total). 4x 45-second 2D image sequences of at least one Large Needle Driver instrument in an ex-vivo setup. Each pixel is labelled as either background, shaft, head or clasper.
|
9K |
8 |
N/A
|
EndoVis2017
|
8x 225-frame robotic surgical videos, captured at 2 Hz, with manual labels for the different tool parts and types. The testing set contains 8x 75-frame videos and 2x 300-frame videos.
|
1.8K |
8 |
Allan et al. 2019
|
EndoVis2018
|
Training dataset is made up of 16 robotic nephrectomy procedures recorded using da Vinci Xi systems in porcine labs (subsampled to 2fps). Sequences with little or no motion are manually removed to leave 149 frames per procedure. Video frames are 1280x1024 and we provide the left and right eye camera image as well as the stereo camera calibration parameters. Labels are only provided for the left image.
|
2.4K |
16 |
Allan et al. 2020
|
ROBUST-MIS2019
|
Procedures in rectal resection and proctocolectomy. A training case encompasses a 10-second video snippet in the form of 250 endoscopic image frames and a reference annotation for the last frame. In the annotated frame, a “0” indicates the absence of a medical instrument and the numbers “1”, “2”, ... represent different instances of medical instruments.
|
10K |
30 |
Ross et al. 2020
|
Kvasir-Instrument
|
The Kvasir-Instrument dataset consists of 590 annotated frames comprising GI procedure tools such as snares, balloons, biopsy forceps, etc. The resolution of the images in the dataset varies from 720x576 to 1280x1024.
|
590 |
N/A |
Jha et al. 2020
|
CholecSeg8k
|
This dataset contains 8080 laparoscopic cholecystectomy image frames extracted and annotated from 17 video clips in Cholec80.
|
8K |
17 |
Hong et al. 2020
|
RoboTool
|
514 images extracted from the videos of 20 freely available robotic surgical procedures and annotated for binary tool-background segmentation.
|
514 |
20 |
Garcia-Peraza-Herrera et al. 2021
|
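The ROBUST-MIS2019 entry above describes an instance encoding in which "0" marks the absence of an instrument and "1", "2", ... mark separate instrument instances. The following is a minimal sketch of splitting such an annotation into one binary mask per instance. It assumes the annotation has already been loaded as a 2D grid of integers; the function name `split_instances` and the toy grid are illustrative and not part of any dataset tooling.

```python
def split_instances(label):
    """Map each non-zero instance id to a boolean mask with the same shape as `label`."""
    # Collect the instance ids that actually occur, skipping 0 (no instrument).
    ids = sorted({value for row in label for value in row if value != 0})
    return {i: [[value == i for value in row] for row in label] for i in ids}


# Synthetic 3x4 annotation containing two instrument instances.
toy = [
    [0, 0, 1, 1],
    [0, 2, 1, 0],
    [2, 2, 0, 0],
]

masks = split_instances(toy)
for instance_id, mask in masks.items():
    pixels = sum(cell for row in mask for cell in row)
    print(instance_id, pixels)
```

With an array library the per-instance comparison is usually a single vectorised expression; plain lists are used here only to keep the sketch dependency-free.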
Dataset | Brief description | Images | Procedures | Paper |
JIGSAWS
|
The JIGSAWS dataset consists of three components: kinematic data (Cartesian positions, orientations, velocities, angular velocities and gripper angle describing the motion of the manipulators), video data (stereo video captured from the endoscopic camera), and manual annotations of gestures (atomic surgical activity segment labels) and skill (global rating score using modified objective structured assessments of technical skills).
|
N/A |
N/A |
Gao et al. 2014
|
Cataract-101
|
This dataset contains 101 videos of cataract surgeries annotated with two kinds of information: Anonymous ID and experience level of operating surgeon, and starting points of quasi-standardized operation phases in videos.
|
1.3M |
101 |
Schoeffmann et al. 2018
|
HeiCo
|
The dataset contains data from the ROBUST-MIS 2019 challenge and the Surgical Workflow Challenges from EndoVis 2017 and 2018.
|
10K |
30 |
Maier-Hein et al. 2020
|
MISAW
|
The dataset contains 27 micro-anastomosis training sequences and is composed of the following information: stereoscopic video, kinematic data, and workflow annotation at 3 levels of granularity (phases, steps, and activities).
|
N/A |
27 |
Huaulmé et al. 2021
|
PETRAW
|
Dataset for online automatic recognition of surgical workflow by using both kinematic and stereoscopic video information on a micro-anastomosis training task.
|
N/A |
100 |
N/A
|
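Several entries above (for example Cataract-101) annotate workflow as the starting points of phases rather than as one label per frame. The following is a minimal sketch of turning phase start points into a per-frame lookup. The start frames, the phase names, and the function name `phase_at` are hypothetical placeholders and do not reflect the actual file format of any dataset listed here.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def phase_at(frame: int, starts: List[Tuple[int, str]]) -> Optional[str]:
    """Return the phase active at `frame`, given (start_frame, phase) pairs sorted by start."""
    start_frames = [start for start, _ in starts]
    # Index of the last phase whose start frame is <= frame.
    idx = bisect_right(start_frames, frame) - 1
    return starts[idx][1] if idx >= 0 else None


# Hypothetical start frames and phase names, for illustration only.
starts = [(0, "phase A"), (120, "phase B"), (300, "phase C")]

print(phase_at(119, starts))
print(phase_at(120, starts))
```

A frame that precedes the first annotated start point returns `None`, which keeps unannotated lead-in footage distinguishable from the first phase.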
Dataset | Brief description | Images/Videos | Procedures | Paper |
ART-Net
|
This dataset consists of non-robotic tools with annotated tool presence, tool segmentation, and instrument geometric primitives (mid-line, edge-line, tooltip). The images come from laparoscopic hysterectomy videos. This dataset also contains tool presence annotations for another set of 3000 images, namely 1500 positive and 1500 negative images, for which some positive images contain multiple tools. 4270 images are labelled for tool detection. If the tool shaft is not visible at all, the image is marked as negative. When a small part of the tool shaft is visible, the image is marked as positive. For segmentation and geometric primitive extraction, 635 images are annotated.
|
Different for each task |
29 |
Hasan et al. 2021
|
HeiSurF
|
Surgical Workflow Analysis and Full Scene Segmentation. All surgeries were annotated framewise for surgical phases by surgical experts. Surgical actions, instrument usage and surgical skill levels were annotated. The surgeries recorded are laparoscopic gallbladder removals (cholecystectomy). The data for segmentation consists of two parts. In the first part of the training dataset, frames at 2 minute intervals from 24 operations (the same operations as for the workflow challenge) are provided. The second part of the training dataset will consist of brief sequences taken from each video, where frames will be segmented at 1fps. To ensure anonymity, frames corresponding to extra-abdominal views are censored by entirely white (RGB 255 255 255) frames. The testing dataset of 9 videos will not be released.
|
24 videos |
30 |
HeiSurf Presentation
|
AutoLaparo
|
AutoLaparo contains videos of laparoscopic hysterectomy.
Three sub-datasets are designed for the following three tasks:
surgical workflow recognition, laparoscope motion prediction, instrument and key anatomy segmentation.
The videos are recorded at 25 fps with a standard resolution of 1920×1080 pixels.
The duration of videos ranges from 27 to 112 minutes due to the varying difficulties of the surgeries. After pre-processing, the average duration is 66 minutes and the total duration is 1388 minutes.
Annotations:
- Surgical workflow recognition: the hysterectomy procedure is divided into 7 phases and each frame is annotated with a phase label.
- Laparoscope motion prediction: 300 clips are carefully selected from Phases 2-4 of the 21 videos and each clip lasts for 10 seconds.
Seven types of motion modes are defined, including one Static mode and six non-static modes: Up, Down, Left, Right, Zoom-in, and Zoom-out.
- Instrument and key anatomy segmentation: for each clip in the motion prediction task, six frames are sampled at 1fps and annotated with pixel-wise segmentation. Four types of instruments and one key anatomy are annotated in the dataset: grasping forceps, LigaSure, dissecting and grasping forceps, electric hook, and uterus.
|
Different for each task |
21 |
Wang et al. 2022
|
SurgToolLoc
|
This dataset contains clips of surgical training exercises using the da Vinci robotic system.
In them, trainees perform standard activities such as dissecting tissue and suturing.
There are 24,695 video clips, each 30 seconds long and captured at 60 fps with a resolution of 1280x720 pixels.
- Training data: for each 30-second clip within the training set, only tool presence labels indicating which robotic tools are installed are provided.
For the extent of each clip, the same three tools (out of 14 possible) are installed.
However, some may be obscured or temporarily invisible, i.e. there is noise in the tool presence labels of the training set.
- Testing data: the test set has tool presence labels and also bounding boxes around the robotic tools. The videos are sampled at 1Hz.
|
741K (videos sampled at 1Hz) |
N/A |
N/A
|
SAR-RARP50
|
SAR-RARP50 is a multitask dataset that provides action recognition and surgical instrumentation segmentation labels for video segments recorded during 50 Robot-Assisted Radical Prostatectomies (RARP).
The operations were performed by 8 surgeons with different surgical seniority (experienced consultant, senior registrar, and junior registrar).
The selected segments focus on the suturing of the dorsal vascular complex (DVC), an array of veins and arteries that is sutured to keep bleeding under control after the connection of the prostate to bladder and urethra is cut.
Surgical operations were performed using a da Vinci Si robot, recording at 60 frames per second in 1080i resolution stereo video format.
After data acquisition, the stereo video channels were time-synchronized and de-interlaced.
The 50 videos are grouped into 2 sets with balanced class proportions, one set for training (40 interventions) and one for testing (10 interventions).
Class actions: (0, other), (1, picking up the needle), (2, positioning the needle tip), (3, pushing the needle through the tissue), (4, pulling the needle out of the tissue), (5, tying a knot), (6, cutting the suture), (7, returning/dropping the needle).
Annotators: the action gesture classes were decided in collaboration with an expert surgeon and annotations were manually generated by an engineer with experience in surgical action recognition. During the data labelling process, the annotator was instructed to assign only one class per frame, choosing from a list of predefined actions.
The action recognition labels may include imprecision in the gesture boundary or action ambiguities linked to non-standard surgical gestures and the particularities of each surgeon's technique.
The tool segmentations provided are for the left camera view of a stereo endoscope for all 50 RARP procedures at a rate of 1Hz.
Semantic information is provided in png format, with each pixel value corresponding to a different class. The association between pixel values and semantic classes is the following: (1, tool clasper), (2, tool wrist), (3, tool shaft), (4, suturing needle), (5, thread), (6, suction tool), (7, needle holder), (8, clamps), (9, catheter).
The tool segmentation annotations were generated by non-medical, professional annotators and were validated independently by the organizers of the challenge.
Segmentation annotations may include inaccuracies when: videos are not in focus, camera lenses are not clean, objects are moving fast (resulting in ghosting), there are video compression artifacts, surgical instrumentation is not fully visible, areas are not brightly lit.
|
10K |
50 |
Psychogyios et al. 2023
|
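The SAR-RARP50 entry above lists the association between pixel values and semantic classes. The following is a minimal sketch of using that table to summarise a segmentation mask. It assumes the png has already been decoded into a 2D grid of integers; the function name `class_histogram` and the toy mask are illustrative. Pixel value 0 is not defined in the entry, so it is reported as unlisted here rather than assumed to be background.

```python
from collections import Counter

# Pixel value -> semantic class, copied from the SAR-RARP50 description above.
SEGMENTATION_CLASSES = {
    1: "tool clasper",
    2: "tool wrist",
    3: "tool shaft",
    4: "suturing needle",
    5: "thread",
    6: "suction tool",
    7: "needle holder",
    8: "clamps",
    9: "catheter",
}


def class_histogram(mask):
    """Count pixels per class name in a 2D grid of pixel values."""
    counts = Counter(value for row in mask for value in row)
    # Values missing from the table (including 0) are labelled as unlisted.
    return {
        SEGMENTATION_CLASSES.get(value, "unlisted ({})".format(value)): n
        for value, n in counts.items()
    }


# Synthetic 3x4 mask, for illustration only.
toy_mask = [
    [0, 0, 3, 3],
    [0, 1, 2, 3],
    [4, 5, 0, 0],
]

histogram = class_histogram(toy_mask)
for name, n in sorted(histogram.items()):
    print(name, n)
```

The same pattern applies to the action recognition labels, with a second dictionary built from the (id, action) pairs listed in the entry.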
Dataset | Brief description | Images | Procedures | Paper |
Dresden Surgical Anatomy Dataset
|
The Dresden Surgical Anatomy Dataset provides semantic segmentations of eight abdominal organs (colon, liver, pancreas, small intestine, spleen, stomach, ureter, vesicular glands), the abdominal wall and two vessel structures (inferior mesenteric artery, intestinal veins) in laparoscopic view.
The majority of patients (26/32) were male, the overall average age was 63 years and the mean body mass index (BMI) was 26.75 kg/m² (Table 1).
All included patients had a clinical indication for the surgical procedure. Surgeries were performed using a standard Da Vinci® Xi/X Endoscope
with Camera (8 mm diameter, 30° angle, Intuitive Surgical, Item code 470057) and recorded using the CAST-System (Orpheus Medical GmbH,
Frankfurt a.M., Germany). Each record was saved at a resolution of 1920x1080 pixels in MPEG-4 format and lasts between about two and ten hours.
|
13K |
32 |
Carstens et al. 2023
|
SurgAI3.8K
|
The dataset contains the following annotations: uterus segmentation, uterus contours and the regions of the left and right fallopian tube junctions.
|
3.8K |
79 |
Zadeh et al. 2023
|