
Explore: Efficient Transformers #20


msrepo commented 1 year ago

Description

Currently, we struggle to train transformer models on input volumes larger than 128^3. With a volumetric grid representation of shape, the resolution this allows may not be enough: for large bones such as the hip and ribs, we have had to use a voxel resolution of ~2.5 mm, which is very coarse. We might be able to output higher-resolution volumes if the transformers were lighter in terms of memory usage.
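As a rough back-of-envelope sketch (assuming 8^3 patches and a single fp32 attention map, ignoring batch size, heads, activations, and gradients), the quadratic cost of full self-attention over volumetric patch tokens shows why scaling past 128^3 is hard:

```python
# Rough cost of one full self-attention layer over volumetric patch tokens.
def attention_cost(volume_side, patch_side=8):
    """Return (num_tokens, size in GiB of one fp32 N x N attention map)."""
    num_tokens = (volume_side // patch_side) ** 3
    attn_bytes = num_tokens ** 2 * 4  # fp32, single head, single batch element
    return num_tokens, attn_bytes / 2 ** 30

for side in (64, 128, 256):
    tokens, gib = attention_cost(side)
    print(f"{side}^3 volume -> {tokens} tokens, ~{gib:.2f} GiB per attention map")

# 64^3  ->   512 tokens, ~0.00 GiB
# 128^3 ->  4096 tokens, ~0.06 GiB
# 256^3 -> 32768 tokens, ~4.00 GiB (per layer/head, before activations and gradients)
```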

Proposal

1) Use a 2D encoder and a 3D decoder. How do we realize this with transformers? In particular, how do we port the ideas from TransVert and 3DReconNet into pure transformers, and how do we concatenate features from the two parallel x-ray branches and decode a 3D output? (See the sketch below.)

2) Implement/port Conv+transformer hybrids. These might be more efficient in terms of memory usage.

Things to try: implement/port various lightweight transformers and other interesting ideas.
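A minimal sketch of proposal 1), assuming PyTorch; all module names and dimensions are illustrative, not the benchmark's API. Two parallel 2D X-ray branches share a patch embedding, their tokens are concatenated and fused by a transformer encoder, and a small 3D convolutional decoder produces a coarse volume:

```python
import torch
import torch.nn as nn


class TwoViewXray2Volume(nn.Module):
    """Two parallel 2D X-ray encoders -> token fusion -> small 3D conv decoder."""

    def __init__(self, img_size=128, patch=16, dim=256, depth=4, heads=8):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Shared patch embedding for both views (AP and LAT); could also be two separate embeddings.
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, 2 * n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # Fuse tokens by mean-pooling, reshape into a 4^3 seed grid, upsample to 32^3.
        self.to_seed = nn.Linear(dim, 64 * 4 * 4 * 4)
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 4^3 -> 8^3
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 8^3 -> 16^3
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),              # 16^3 -> 32^3
        )

    def forward(self, xray_ap, xray_lat):
        tok_ap = self.patch_embed(xray_ap).flatten(2).transpose(1, 2)    # (B, N, dim)
        tok_lat = self.patch_embed(xray_lat).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = torch.cat([tok_ap, tok_lat], dim=1) + self.pos          # concatenate the two branches
        fused = self.encoder(tokens).mean(dim=1)                         # (B, dim)
        seed = self.to_seed(fused).view(-1, 64, 4, 4, 4)
        return self.decoder(seed)                                        # (B, 1, 32, 32, 32) logits


# Usage: vol = TwoViewXray2Volume()(torch.randn(2, 1, 128, 128), torch.randn(2, 1, 128, 128))
# vol.shape -> torch.Size([2, 1, 32, 32, 32])
```

Mean-pooling the fused tokens is the simplest possible fusion; cross-attention between the two branches or learned 3D query tokens in the decoder would be natural next steps.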