Human-centered tool for coping with imperfect algorithms during medical decision making
Summary: assist pathological diagnosis by enhancing conventional content-based image retrieval (CBIR) systems with interactive refinement techniques
Domain:
data is highly regulated and expert time is scarce, so it is rarely practical to obtain more expert-labeled data to improve the model
ML models should augment, not replace, expert intelligence in critical decision making
Refinement searches:
refine-by-region: select a region of the image and issue a new search; the dataset includes 1/4 and 1/8 crops
refine-by-example: select several of the returned images, average their embeddings, and issue a new search
refine-by-concept: train CAVs for a set of concepts, then push the query embedding along the CAV direction (sketch of the last two below)
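A minimal sketch of the two embedding-space refinements, assuming precomputed image embeddings stored in a matrix and brute-force cosine-similarity search (all names are hypothetical, not the paper's code):

```python
import numpy as np

def refine_by_example(selected: np.ndarray) -> np.ndarray:
    """Average the embeddings of user-selected result images into a new query."""
    return selected.mean(axis=0)

def refine_by_concept(query: np.ndarray, cav: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Shift the query embedding along the (unit-norm) CAV direction."""
    return query + strength * cav

def nearest_neighbours(query: np.ndarray, index: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k most cosine-similar embeddings in the index."""
    sims = (index @ query) / (np.linalg.norm(index, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)[:k]
```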
Evaluation:
Tool evaluation: whether the refinement updated results in the ways intended
methodology: apply each refinement to modify the presence of concepts in images, then ask pathologists to rate the presence of that concept on a 7-point scale
for refine-by-region: pick images where the concept occupies only a small region, then use the refinement to retrieve similar images
for refine-by-example: one pathologist performs the refinement, another rates the results
for refine-by-concept: shift each query embedding in the direction of the CAV and retrieve the nearest neighbours
User study: this section includes some common HCI measures
utility for decision making
workload
attitude towards the system
HCI observations:
address the "semantic gap": the content extracted from an image does not correspond to the user's semantic interpretation of the image
increase trust in black-box ML models
e.g. users want transparency about which part of the image the model will see (e.g. after a crop)
danger of over-influencing the model (confirmation bias: users search only to confirm their existing beliefs)
mitigation: highlight refinement paths not taken
Train CAV:
pathologists label concepts in 100 images (good cosine similarity is achieved after ~20 images)
for each concept, use a linear classifier to learn a hyperplane separating embeddings of images with and without the concept
the CAV is the vector orthogonal to the learned hyperplane
also trained relative CAVs using the opposite concept as negative examples (otherwise the negative direction does not necessarily mean the opposite concept); see the sketch below
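A minimal sketch of CAV training as described above, using scikit-learn's logistic regression (the concept names in the comments are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_cav(pos: np.ndarray, neg: np.ndarray) -> np.ndarray:
    """Fit a linear classifier separating concept-positive from concept-negative
    embeddings; the CAV is the unit normal of the learned hyperplane."""
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    w = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
    return w / np.linalg.norm(w)

# Plain CAV: negatives are random images without the concept.
# cav = train_cav(concept_embeddings, random_embeddings)
# Relative CAV: negatives are embeddings of the opposite concept, so the
# negative direction actually means "more of the opposite concept".
# cav = train_cav(low_grade_embeddings, high_grade_embeddings)
```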
Useful references:
[43] semantic gap
[33] embedding
[26] CAV: concept activation vectors: directions in embedding space can encode human-interpretable concepts
Seq2Seq
Input => deep LSTM encoder => fixed-size representation (hidden state of the last LSTM layer) => deep LSTM decoder (initialised with that representation) => beam search over the softmax on the vocabulary (sketch after these notes)
This means we can translate sentences where the input and output have different lengths.
LSTMs are used because they are good at long-term dependencies
Reversing the input improves the model without hurting translation quality on long sentences; the hypothesis is that this reduces the "minimal time lag", e.g. the beginning of the input and the beginning of the output become very close
Use an ensemble of models that differ in initialisation and batch order
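A minimal PyTorch sketch of this encoder-decoder, with the input reversal applied inside the forward pass (layer sizes are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Deep LSTM encoder-decoder: the encoder's final hidden state initialises
    the decoder, so input and output lengths are decoupled."""
    def __init__(self, src_vocab, tgt_vocab, emb=256, hidden=512, layers=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, layers, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, layers, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        # Reverse the source to shorten the "minimal time lag" between
        # the beginning of the input and the beginning of the output.
        src = torch.flip(src, dims=[1])
        _, state = self.encoder(self.src_emb(src))
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)  # teacher forcing
        return self.out(dec_out)  # per-step logits over the target vocabulary
```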
References
[10] Generating sequences with RNN
[25] On the difficulty of training RNNs
Next
details of beam search (generic sketch below)
PCA projection of hidden states for interpretability
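For reference, a minimal generic beam search over per-step log-probabilities; the `step` function standing in for the decoder is hypothetical:

```python
import heapq

def beam_search(step, start_state, bos, eos, beam_size=4, max_len=50):
    """step(state, token) -> (log_probs: dict[token, float], new_state)."""
    beams = [(0.0, [bos], start_state)]            # (score, tokens, state)
    for _ in range(max_len):
        candidates = []
        for score, tokens, state in beams:
            if tokens[-1] == eos:                  # finished hypothesis: keep
                candidates.append((score, tokens, state))
                continue
            log_probs, new_state = step(state, tokens[-1])
            for token, lp in log_probs.items():
                candidates.append((score + lp, tokens + [token], new_state))
        beams = heapq.nlargest(beam_size, candidates, key=lambda b: b[0])
        if all(tokens[-1] == eos for _, tokens, _ in beams):
            break
    return max(beams, key=lambda b: b[0])[1]       # best token sequence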
ResNet
Previously, deeper nets resulted in higher training error (so not overfitting), which does not make sense because a deeper net can achieve strictly equal or better results by construction (stack identity layers on top of a shallower net).
Hence it is probably an optimization problem.
Let the layers learn the "residual function":
Desired mapping: H(x)
Layers learn: F(x) = H(x) - x
Hence: H(x) = F(x) + x
Create a shortcut connection (sketch below)
Input and output have the same dimensions, or zero-pad
This does not increase parameters or computation.
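A minimal PyTorch sketch of a basic residual block with an identity shortcut (same-dimension case; layer choices are illustrative):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output = F(x) + x, with an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + x)  # shortcut: add the input back
```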
Bottleneck layers
Input => reduce dimension (#filters) with a 1x1 convolution => 3x3 convolution at the reduced width => restore dimension with another 1x1 convolution; saves computation
For example: a 3x3 convolution with 256 input and 256 output channels costs 3*3*256*256*h*w multiply-adds; instead we can do 1x1, 256->64 => 3x3, 64->64 => 1x1, 64->256, which costs (256*64 + 3*3*64*64 + 64*256)*h*w, roughly 8.5x less (sketch below)
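A minimal PyTorch sketch of the bottleneck block (identity shortcut, channel sizes from the example above):

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """Bottleneck residual block: 1x1 reduce -> 3x3 -> 1x1 restore, plus shortcut."""
    def __init__(self, channels=256, reduced=64):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, reduced, 1, bias=False),             # 1x1: 256 -> 64
            nn.BatchNorm2d(reduced),
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, reduced, 3, padding=1, bias=False),   # 3x3 at width 64
            nn.BatchNorm2d(reduced),
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, channels, 1, bias=False),             # 1x1: 64 -> 256
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + x)
```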