vcheplygina / crowdskin

Skin lesion annotations via crowdsourcing
0 stars 0 forks source link

This code processes annotations for ISIC 2017 lesion images from previous years, and assigns unannotated images to the new groups. annotation_processing_script.py calls all the functions that are neccesary to assign the images to the new groups. The following variables need to be set by the user:

Requirements

pandas>=1.0.5

numpy>=1.18.5

Paths

AnnotationPath: path to folder containing folders for each year. these folders containin the annotations as csv files.

SavePath: Path to new folder where 2 csv's for each group will be saved.

DataTypeFilename: Filename of csv where data/annotation types are assigned.

TrainPath: Path to ISIC-2017 part 3 training set ground truth

ValidationPath: Path to ISIC-2017 part 3 validation set ground truth

TestPath: Path to ISIC-2017 part 3 test set ground truth

Variables

NumGroups: Number of groups that take the course.

NumImages: Number of images that need to be assigned to a group.

AnnotationBalance: Distribution Malignant/Benign for images.

RequiredAnnotations: List of which annotation types an image has to be annotated before the program considers the image annotated.

RequiredAmounts: List of same length as RequiredAnnotations containing the amount of annotations for each catergory in RequiredAnnotations before the program considers the image annotated.

Annotation types

The following data/annotation types can be chosen in RequiredAnnotations. Only Asymmetry, Border and Color contain a significant amount of annotations (around 10000 each). The output of the function annotation_stats is shown first:

data_type ann_count mean_ann_img min_ann_img
Asymmetry 10471 6.4 3
Black 1099 5.5 2
Blood 250 2.5 2
Blue 1197 4.0 3
Border 10762 6.6 3
Brown 800 8.0 8
Color 10175 6.2 3
Color_Categorised 1599 3.4 3
Color_Fade 1000 5.0 3
Compactness 1764 5.3 3
Dermo 1485 3.8 3
Diameter 295 3.0 1
Erythema 664 5.0 4
Flaking 300 3.0 3
Red 1100 5.5 3
Rough_Surface 300 3.0 3
Skin_Color 299 3.0 2
White 300 3.0 3
Yellow 800 8.0 8

Asymmetry: How asymmetric a lesion is scored on a scale or boolean

Border: How irregular the border of a lesion is scored on a scale or boolean

Color: How uneven the color of a lesion is scored on a scale or boolean

Color_Categorised: Colors that are found in the lesion are noted with comma's between them. (different groups use different methods)

Dermo: Dermascopic/Differential structures. for instance pigment dots or globules.

Blood: Blood vessels can be found in the lesion

Blue: The lesions have a blue glow.

Color_Fade: Lesions have a faded border.

Compactness: uses formula c = circumference^2/(4π*Area)

Erythema: redness surrounding the lesion.

Red: Red is present in the lesion.

Black: Black is present in the lesion.

Skin_Color: Skin_Color is present in the lesion.

Flaking: Lesions shows flaking skin

Rough_Surface: surface of lesion is rough

White: White is present in the lesion.

Diameter: score based on the diameter of the lesion

Yellow: Yellow is present in the lesion.

Brown: Brown is present in the lesion.

Functions

create_annotation_df: loops through all folders from each year and applies select_data to generate one large dataframe containing features. Has columns ['ID', 'group_number', 'year', 'annotator', 'orig_column', 'data_type', 'data']. This dataframe can be used to select data for more specific applications.

annotation_stats: Uses annotations_df to print the ammount of annotations (ann_count), average amount of annotations per image(mean_ann_img) and minimum amount of annotations per image(min_ann_img) for each annotation type

get_annotations: returns pandas series of all annotations as lists using annotations_df and an ISIC_ID as a string.

drop_annotation_count_categories: Returns series of all images that are considered annotated using annotations_df, RequiredAnnotations, and RequiredAmounts

categorise_annotations: returns 4 dataframes contaning image ID and groundtruth: combinations of: unannotated, annotated, benign and malignant. This is done using a series of images id's that are considered annotated and paths to the groundtruths.

create_group_sets: Returns a dict containing dataframes for each group with assigned ISIC ID's and groundtruths. This is based on the 4 dataframes explained in the previous function and NumGroups NumImages AnnotationBalance.

save_group_sets: saves image id's (with and without ground truth) as csv to the folder SavePath, using the dict containing dataframes with image id's and groundtruth.