Human-centered tool for coping with imperfect algorithms during medical decision making
Summary: assist pathological diagnosis by enhancing conventional content-based image retrieval (CBIR) systems with interactive refinement techniques
Domain:
data is highly regulated and expert time is scarce, so it is rarely practical to obtain more expert-labeled data to improve the model
ML models should augment, not replace, expert intelligence in critical decision making
Refinement searches:
refine-by-region: select a region of the image and issue a new search; the dataset includes 1/4 and 1/8 crops
refine-by-example: select several of the returned images, average their embeddings, and issue a new search
refine-by-concept: train CAVs for a set of concepts, then push the query embedding along the CAV direction (sketch of the last two below)
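A minimal sketch of the two embedding-space refinements, assuming precomputed image embeddings stored in a matrix and brute-force cosine-similarity search (all names are hypothetical, not the paper's code):

```python
import numpy as np

def refine_by_example(selected: np.ndarray) -> np.ndarray:
    """Average the embeddings of user-selected result images into a new query."""
    return selected.mean(axis=0)

def refine_by_concept(query: np.ndarray, cav: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Shift the query embedding along the (unit-norm) CAV direction."""
    return query + strength * cav

def nearest_neighbours(query: np.ndarray, index: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k most cosine-similar embeddings in the index."""
    sims = (index @ query) / (np.linalg.norm(index, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)[:k]
```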
Evaluation:
Tool evaluation: whether the refinement updated results in the ways intended
methodology: apply each refinement to modify the presence of concepts in images, then ask pathologists to rate the presence of that concept on a 7-point scale
for refine-by-region: pick images where the concept occupies only a small region, then use the refinement to retrieve similar images
for refine-by-example: one pathologist performs the refinement, another rates the results
for refine-by-concept: shift each query embedding in the direction of the CAV and retrieve the nearest neighbours
User study: this section includes some common HCI measures
utility for decision making
workload
attitude towards the system
HCI observations:
address the "semantic gap": the content extracted from an image does not correspond to the user's semantic interpretation of the image
increase trust in black-box ML models
e.g. users want transparency about which part of the image the model will see (e.g. after a crop)
danger of over-influencing the model (confirmation bias: users search only to confirm their existing beliefs)
mitigation: highlight refinement paths not taken
Train CAV:
pathologists label concepts in 100 images (good cosine similarity is achieved after ~20 images)
for each concept, use a linear classifier to learn a hyperplane separating embeddings of images with and without the concept
the CAV is the vector orthogonal to the learned hyperplane
also trained relative CAVs using the opposite concept as negative examples (otherwise the negative direction does not necessarily mean the opposite concept); see the sketch below
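A minimal sketch of CAV training as described above, using scikit-learn's logistic regression (the concept names in the comments are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_cav(pos: np.ndarray, neg: np.ndarray) -> np.ndarray:
    """Fit a linear classifier separating concept-positive from concept-negative
    embeddings; the CAV is the unit normal of the learned hyperplane."""
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    w = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
    return w / np.linalg.norm(w)

# Plain CAV: negatives are random images without the concept.
# cav = train_cav(concept_embeddings, random_embeddings)
# Relative CAV: negatives are embeddings of the opposite concept, so the
# negative direction actually means "more of the opposite concept".
# cav = train_cav(low_grade_embeddings, high_grade_embeddings)
```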
Useful references:
[43] semantic gap
[33] embedding
[26] CAV: concept activation vectors: directions in embedding space can encode human-interpretable concepts
Seq2Seq
Input => deep LSTM encoder => fixed-size representation (hidden state of the last LSTM layer) => deep LSTM decoder (initialised with that representation) => beam search over the softmax on the vocabulary (sketch after these notes)
This means we can translate sentences where the input and output have different lengths.
LSTMs are used because they are good at long-term dependencies
Reversing the input improves the model without hurting translation quality on long sentences; the hypothesis is that this reduces the "minimal time lag", e.g. the beginning of the input and the beginning of the output become very close
Use an ensemble of models that differ in initialisation and batch order
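A minimal PyTorch sketch of this encoder-decoder, with the input reversal applied inside the forward pass (layer sizes are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Deep LSTM encoder-decoder: the encoder's final hidden state initialises
    the decoder, so input and output lengths are decoupled."""
    def __init__(self, src_vocab, tgt_vocab, emb=256, hidden=512, layers=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, layers, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, layers, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        # Reverse the source to shorten the "minimal time lag" between
        # the beginning of the input and the beginning of the output.
        src = torch.flip(src, dims=[1])
        _, state = self.encoder(self.src_emb(src))
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)  # teacher forcing
        return self.out(dec_out)  # per-step logits over the target vocabulary
```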
References
[10] Generating sequences with RNN
[25] On the difficulty of training RNNs
Next
details of beam search (generic sketch below)
PCA projection of hidden states for interpretability
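For reference, a minimal generic beam search over per-step log-probabilities; the `step` function standing in for the decoder is hypothetical:

```python
import heapq

def beam_search(step, start_state, bos, eos, beam_size=4, max_len=50):
    """step(state, token) -> (log_probs: dict[token, float], new_state)."""
    beams = [(0.0, [bos], start_state)]            # (score, tokens, state)
    for _ in range(max_len):
        candidates = []
        for score, tokens, state in beams:
            if tokens[-1] == eos:                  # finished hypothesis: keep
                candidates.append((score, tokens, state))
                continue
            log_probs, new_state = step(state, tokens[-1])
            for token, lp in log_probs.items():
                candidates.append((score + lp, tokens + [token], new_state))
        beams = heapq.nlargest(beam_size, candidates, key=lambda b: b[0])
        if all(tokens[-1] == eos for _, tokens, _ in beams):
            break
    return max(beams, key=lambda b: b[0])[1]       # best token sequence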
ResNet
Previously, deeper nets resulted in higher training error (so not overfitting), which does not make sense because a deeper net can achieve strictly equal or better results by construction (stack identity layers on top of a shallower net).
Hence it is probably an optimization problem.
Let the layers learn the "residual function":
Desired mapping: H(x)
Layers learn: F(x) = H(x) - x
Hence: H(x) = F(x) + x
Create a shortcut connection (sketch below)
Input and output have the same dimensions, or zero-pad
This does not increase parameters or computation.
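A minimal PyTorch sketch of a basic residual block with an identity shortcut (same-dimension case; layer choices are illustrative):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output = F(x) + x, with an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + x)  # shortcut: add the input back
```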
Bottleneck layers
Input => reduce dimension (#filters) with a 1x1 convolution => 3x3 convolution at the reduced width => restore dimension with another 1x1 convolution; saves computation
For example: a 3x3 convolution with 256 input and 256 output channels costs 3*3*256*256*h*w multiply-adds; instead we can do 1x1, 256->64 => 3x3, 64->64 => 1x1, 64->256, which costs (256*64 + 3*3*64*64 + 64*256)*h*w, roughly 8.5x less (sketch below)
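A minimal PyTorch sketch of the bottleneck block (identity shortcut, channel sizes from the example above):

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """Bottleneck residual block: 1x1 reduce -> 3x3 -> 1x1 restore, plus shortcut."""
    def __init__(self, channels=256, reduced=64):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, reduced, 1, bias=False),             # 1x1: 256 -> 64
            nn.BatchNorm2d(reduced),
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, reduced, 3, padding=1, bias=False),   # 3x3 at width 64
            nn.BatchNorm2d(reduced),
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, channels, 1, bias=False),             # 1x1: 64 -> 256
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + x)
```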