greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

Consider change to reference 346 #1032

Open OmnesRes opened 1 year ago

OmnesRes commented 1 year ago

While the authors claim to be doing a convolution, even going so far as to name their tool CellCnn, they use a kernel size of 1, which is essentially a dense layer. Further, you say they used the "most discriminative filters", but I believe the tensor sort they do will sort by the first filter, then the second, etc., so the sorting is almost entirely driven by the first filter.

agitter commented 1 year ago

The reference numbers can change with each build of the manuscript, so I'm noting here this comment refers to:

Sensitive detection of rare disease-associated cell subsets via representation learning https://doi.org/10.1101/046508 https://greenelab.github.io/deep-review/v/44ff95a2756bd03a32c046eb8983ac9c4f223b0a/#ref-r3Gbjksq

We'd welcome a small pull request to clarify this. The authors do claim they have a convolutional neural network, so I'd have to reread their methods to see what they're doing.

OmnesRes commented 1 year ago

Yes, I realized that the reference number might change after I posted my comment. To be specific, I looked at the code here, since the link in the paper is broken: https://github.com/eiriniar/CellCnn/blob/master/cellCnn/model.py

A 1x1 convolution with channels last is equivalent to a dense layer; I'll leave it up to you to decide whether or not to call that a convolution.
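
For example, a quick sketch of that equivalence (shapes and layer sizes here are arbitrary, not theirs): a Conv1D with kernel_size=1 and a Dense layer given the same weights produce identical outputs on a channels-last [batch, cells, features] tensor.

import numpy as np
import tensorflow as tf

# toy channels-last input: [batch, cells, features]
x = np.random.rand(4, 100, 10).astype('float32')

conv = tf.keras.layers.Conv1D(filters=16, kernel_size=1)
dense = tf.keras.layers.Dense(units=16)

y_conv = conv(x)                                      # [4, 100, 16]

# copy the conv weights into the dense layer
dense.build(input_shape=(None, None, 10))
kernel, bias = conv.get_weights()                     # kernel: [1, 10, 16]
dense.set_weights([kernel[0], bias])                  # dense kernel: [10, 16]
y_dense = dense(x)                                    # [4, 100, 16]

print(np.allclose(y_conv.numpy(), y_dense.numpy()))   # True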

It seems I was mistaken about the sorting; each filter is sorted independently.
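
A toy example (values arbitrary) that cleared this up for me: sorting along the cell axis orders each filter column on its own.

import tensorflow as tf

# one sample, 3 "cells", 2 "filters"; the two columns start in different orders
x = tf.constant([[[3., 10.],
                  [1., 30.],
                  [2., 20.]]])

# sorting along axis=1 (cells) sorts each filter column independently
print(tf.sort(x, axis=1, direction='DESCENDING').numpy())
# [[[ 3. 30.]
#   [ 2. 20.]
#   [ 1. 10.]]]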

agitter commented 1 year ago

It sounds like they have a special case of convolution similar to the position-wise feed-forward part of the Transformer architecture in the Attention Is All You Need paper, which that paper notes can also be described as two convolutions with kernel size 1.
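
For concreteness, I have in mind something like this sketch (dimensions are just that paper's defaults, not CellCnn's): the position-wise feed-forward block can be written either as two kernel-size-1 convolutions or as two Dense layers applied to the last axis.

import tensorflow as tf

d_model, d_ff = 512, 2048

# position-wise feed-forward block as two kernel-size-1 convolutions
ffn_conv = tf.keras.Sequential([
    tf.keras.layers.Conv1D(d_ff, kernel_size=1, activation='relu'),
    tf.keras.layers.Conv1D(d_model, kernel_size=1),
])

# the same block as two Dense layers applied position-wise
ffn_dense = tf.keras.Sequential([
    tf.keras.layers.Dense(d_ff, activation='relu'),
    tf.keras.layers.Dense(d_model),
])

x = tf.random.normal((2, 50, d_model))        # [batch, positions, d_model]
print(ffn_conv(x).shape, ffn_dense(x).shape)  # both (2, 50, 512)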

Is that right?

I want to differentiate between this being a special case of convolution that should be clarified in the text versus this not being a convolution in any sense before making an edit.

OmnesRes commented 1 year ago

A 1x1 convolution is a common step in convolutional networks, but it is simply a dimensionality reduction technique and is mathematically equivalent to a dense layer. While it's used in convolutional models, it alone doesn't make a model convolutional. It is used instead of a dense layer because a dense layer requires the input dimension to be known, while a 1x1 conv layer does not. This post goes into more detail: https://stackoverflow.com/questions/39366271/for-what-reason-convolution-1x1-is-used-in-deep-neural-networks.

The authors of this paper likely used the 1x1 conv so that they could have a variable number of cells as input; a dense layer on a flattened input would have required knowing the exact number of cells. So they could, for example, train a model where each sample had 100 cells and then apply it to problems with 1000 cells, and the model would still run. However, regardless of how many cells the model is run on, it is always only featurizing a single cell at a time, which isn't a convolution. While that is a clever way to perform multiple instance learning, the most natural way is with ragged tensors, as we did (https://github.com/OmnesRes/ATGC/tree/method_paper).

As an aside, you may want to look at our MIL paper, as I view it as a large step in bringing deep learning to genomics data: https://www.biorxiv.org/content/10.1101/2020.08.05.237206v4
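
To make the variable-cell-count point concrete, here is a rough sketch (layer sizes are arbitrary, and I'm using a global max pool rather than their top-k mean pool just to keep it short): with kernel_size=1 the cell axis can be left unspecified, so the same model runs on 100 or 1000 cells per sample, but each cell is still featurized on its own.

import tensorflow as tf

n_features_per_cell = 10

# leave the cell axis as None so the number of cells per sample can vary
inputs = tf.keras.layers.Input(shape=(None, n_features_per_cell))
h = tf.keras.layers.Conv1D(filters=16, kernel_size=1, activation='relu')(inputs)
# pool over the cell axis so the output no longer depends on the cell count
outputs = tf.keras.layers.GlobalMaxPooling1D()(h)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# the same model handles 100 cells per sample...
print(model(tf.random.normal((4, 100, n_features_per_cell))).shape)   # (4, 16)
# ...or 1000 cells per sample
print(model(tf.random.normal((4, 1000, n_features_per_cell))).shape)  # (4, 16)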

alexbaras commented 1 year ago

I suspect this is the version of the code closest to / just prior to the publication: https://www.nature.com/articles/ncomms14825 https://github.com/eiriniar/CellCnn/tree/e0464f64ce60fc616c2bb7724134b306910bb43d

The build_model function within that code base clearly defines the model: https://github.com/eiriniar/CellCnn/blob/e0464f64ce60fc616c2bb7724134b306910bb43d/cellCnn/model.py

For convenience, I have refactored that build down to its core steps below (removing regularization, filtering, etc.).

The key point is that the Conv1D layer (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv1D) by default treats the input tensor [Batch, Cells, Features] as channels-last, and as such is technically operating over the cell axis as the "1D" dimension here.

The authors correctly understood that this axis is permutation invariant (i.e., the order of the cells means nothing), and as such used kernel_size=1. This is obvious when you consider the source data along with the random sub-sampling strategy used to fix the number of cells along that axis in their implementation.

However, as my colleague above is suggesting, the use of a 1x1 conv in this manner is really not "convolution", in my humble opinion. A simple dense layer accomplishes the identical task and is much more straightforward to understand. In fact, this notion is much more in line with how it is presented in the manuscript (Fig 1b), particularly where they show "filter 1" as a vector of weights applied to cell feature vectors, resulting in a single value for that cell-filter combination.

I would just like to point out that we believe the results are all likely valid, as this whole discussion is just around semantics; the two operations produce the same result. The bottom line is that, to us, the notion of convolution does not really apply, since the cells are permutation invariant and the cell features are all "independent" variables.

import tensorflow as tf

# input shape - using fixed input # of cells sampled randomly for full data for each "sample"
n_cells_per_sample = 1000
# feature dim of input data
n_features_per_cell = 10

# the input layer
input_tensor = tf.keras.layers.Input(shape=(n_cells_per_sample, n_features_per_cell)) # [None, 1000, 10]

# the "convolution"
nfilters = 16
conv1 = tf.keras.layers.Convolution1D(filters=nfilters, kernel_size=1, activation='relu')(input_tensor) # [None, 1000, 16]
### the above is better understood as simply a dense layer
### dense1 = tf.keras.layers.Dense(units=nfilters, activation='relu')(input_tensor) # [None, 1000, 16]

# sort on cells (column independent) and the take mean of top k
# this is the function they used
# def select_top(x, k):
#     return K.mean(T.sort(x, axis=1)[:, -k:, :], axis=1)
# pool1 = Lambda(select_top, output_shape=(nfilter,), arguments={'k':k})(conv1)

# doing in tf directly here no need for theano
# key point here is that at this point you are pooling over axis=1 (cells)
top_k = 25
pool1 = tf.reduce_mean(tf.gather(tf.sort(conv1, axis=1, direction='DESCENDING'), tf.range(top_k), axis=1), axis=1) # [None, 16]

# they had some dropouts, but I am cutting to output as the question here is more about the 1x 1dconvolution above
# output shape of classification task, binary below for example
output_shape = 2
output = tf.keras.layers.Dense(output_shape, activation=None)(pool1) # [None, 2]

# keras model
model = tf.keras.Model(inputs=input_tensor, outputs=output)
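
# as a quick sanity check of the refactor (random data, untrained weights), the shapes come out as expected
dummy = tf.random.normal((8, n_cells_per_sample, n_features_per_cell))
print(model(dummy).shape)  # (8, 2)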
agitter commented 1 year ago

"this whole discussion is just around semantics"

Yes, I agree. I seem to recall others using the term "convolution" to describe this same operation on gene expression vectors around this time (maybe a talk I saw on https://github.com/KnowEnG-Research/knocknet), but the main point is that currently saying "CNN" alone could be confusing or misleading.

Here's the current content of https://github.com/greenelab/deep-review/blob/master/content/04.study.md

To tackle this challenge, Arvaniti et al. [@tag:Arvaniti2016_rare_subsets] classified healthy and cancer cells expressing 25 markers by using the most discriminative filters from a CNN trained on the data as a linear classifier. They achieved impressive performance, even for cell types where the subset percentage ranged from 0.1 to 1%, significantly outperforming logistic regression and distance-based outlier detection methods. However, they did not benchmark against random forests, which tend to work better for imbalanced data, and their data was relatively low dimensional.

If either of you would like to make a small pull request - for example, replacing "CNN" with "model" or "NN" and adding a new sentence at the end explaining that the model uses convolutional layers but in an atypical way - I'll review it. You could add yourself to https://github.com/greenelab/deep-review/blob/master/content/08.methods.md#acknowledgements as well.

OmnesRes commented 1 year ago

To me, "CNN" implies that there is some sort of order to the data, for example pixels in an image or nucleotides in DNA. Looking at the name of their model, CellCnn, my first inclination would be that they somehow imparted order to the cells and convolved over multiple cells at a time. However, they essentially have tabular data and applied MIL to it. MIL is currently severely underutilized in biology, so I think the authors deserve credit for being among the first to apply it to this field (in fact, your manuscript appears to have no mention of MIL at the moment). I'll think about how to get that point across, but mentioning MIL may require a new term in your glossary?

agitter commented 1 year ago

I suggest separating the two possible revisions and pull requests. Clarifying the description of CellCnn would require only a new sentence or two, which we can review and merge quickly.

Writing about multiple instance learning is also a good idea, and you are correct that it is a surprising omission in the current manuscript. There have been applications in biomedical imaging too that could be discussed. However, that would be a bigger change and require more writing and editing.

OmnesRes commented 7 months ago

If you add a section on MIL you may want to include our recent publication: https://www.nature.com/articles/s41551-023-01120-3