traversc opened this issue 7 years ago
Thanks, I edited the first post to add the DOI
@traversc Does this paper compare to any other baseline approaches? How do they assess the performance of the LSTM model?
The paper compared their LSTM variants against 5 baseline models:
To evaluate model performance, they use a 20% held-out test set on a dataset of 6000 protein sequences labeled with 11 possible protein compartments (E.R., Golgi, etc.) and report accuracy. Their results (Table 1 of the paper) are summarized below, with a quick sketch of the hold-out protocol after the table:
Method | Accuracy |
---|---|
**Sequence only** | |
R-LSTM | 0.879 |
A-LSTM | 0.854 |
R-LSTM ensemble | 0.902 |
MultiLoc | 0.767 |
**Sequence + metadata** | |
MultiLoc + PhyloLoc | 0.842 |
MultiLoc + PhyloLoc + GOLoc | 0.871 |
MultiLoc2 | 0.887 |
SherLoc2 | 0.930 |
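For reference, here is a minimal, self-contained sketch of this kind of 20% hold-out evaluation, assuming scikit-learn. The random features, labels, and baseline classifier below are illustrative stand-ins only, not the paper's data or models:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.dummy import DummyClassifier

# Stand-in data: 6000 "proteins" as random feature vectors,
# with labels drawn from 11 compartments (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(6000, 64))
y = rng.integers(0, 11, size=6000)

# 20% stratified hold-out split, then accuracy on the held-out set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)

clf = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("hold-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

Any real classifier (such as the LSTM models in the table) would slot in where the dummy baseline is.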
*Edit: DOI link https://doi.org/10.1007/978-3-319-21233-3_6*
Hi Dr. Greene et al.,
Here is another paper that I found to have quite an interesting premise: https://arxiv.org/pdf/1503.01919.pdf (apologies if this paper was already mentioned)
The idea is that LSTMs would be able to connect distant parts of a protein (or DNA) sequence because, unlike a plain RNN, LSTMs have long-term as well as short-term memory "channels". Now that I have learned more about the LSTM architecture, I think this makes a lot of sense, since protein folding or 3D structure may bring distant parts of a gene/protein sequence together. LSTMs may be able to connect these distant parts in a way that a CNN alone would not.
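To make that concrete, here is a minimal sketch, assuming PyTorch, of the general idea: a bidirectional LSTM reads a protein residue by residue, and its final hidden states (which can in principle carry information across the whole sequence) feed a classifier over the 11 compartments. All names and dimensions here are illustrative; this is not the paper's actual architecture (they use convolutional and attention-augmented LSTM variants):

```python
import torch
import torch.nn as nn

NUM_AMINO_ACIDS = 20  # standard amino-acid alphabet
NUM_CLASSES = 11      # compartments in the dataset above (E.R., Golgi, etc.)

class ProteinBiLSTM(nn.Module):
    """Embed each residue, run a bidirectional LSTM over the full
    sequence, and classify the protein from the final hidden states."""
    def __init__(self, embed_dim=32, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(NUM_AMINO_ACIDS, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, NUM_CLASSES)

    def forward(self, x):              # x: (batch, seq_len) residue indices
        h = self.embed(x)              # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(h)     # h_n: (2, batch, hidden_dim)
        # Concatenate the last hidden state of each direction; the memory
        # cell is what lets these states relate distant residues.
        h_cat = torch.cat([h_n[0], h_n[1]], dim=1)
        return self.classifier(h_cat)  # (batch, NUM_CLASSES) logits

# Toy usage: a batch of 4 random "sequences" of length 200
model = ProteinBiLSTM()
x = torch.randint(0, NUM_AMINO_ACIDS, (4, 200))
print(model(x).shape)  # torch.Size([4, 11])
```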
Although CNNs and DAs were mentioned in the "Overall manuscript structure", I think it's important to write about RNNs/LSTMs since they also seem to be a good "fit" for sequence data.
On an unrelated note, I saw an interesting blog post, the "Neural Network Zoo", that succinctly summarizes many neural network architectures (along with some more classical ML algorithms): http://www.asimovinstitute.org/neural-network-zoo/
It reminds me of the phrase "endless forms most beautiful" from Sean B. Carroll's book on evo/devo.