Are we making any figures/flowcharts?

bdo311 commented 7 years ago

Gonna tag @cgreene @agitter for input -- if not, the review will be a wall of text.

agitter commented 7 years ago

It would be a good idea. I personally don't have the time or design skills to do it myself. I agree the wall of text is not reader-friendly.

One option would be to reuse or adapt appropriately licensed figures we like from other papers. This came up a while ago in #135 for the intro specifically, but we could use figures in all sections.

cgreene commented 7 years ago

I am in the same boat. I'd like to see them but I also don't have time + design skills. If someone wants to volunteer for key figures that'd be great to have. If they come from the small number of papers from our group that made it into the review, we could generate some that highlight different results.

davharris commented 7 years ago

What should the figures depict?

agitter commented 7 years ago

We have many options:

- High-level graphics for the intro depicting representative problems in Categorize, Study, Treat
- Neural network architectures and how they are well-suited for particular data types
- Illustrations of specific tasks or problems we find especially interesting

Just about anything would be beneficial in my opinion.

burkesquires commented 7 years ago

Hello all, I just came across this effort and it looks fantastic. I am working on adapting deep learning techniques to elucidate influenza virus evolution and I have found a number of graphics that I think do a great job of communicating the complexities of neural networks and deep learning.

Neural Networks

- For a great graphical overview of neural networks, I suggest the Neural Network Zoo from the Asimov Institute
- Christopher Olah's blog posts in general, as well as his post on Neural Networks, Types, and Functional Programming
- Another of Christopher Olah's blog posts, on LSTMs

Finally, I think figures go a long way toward communicating the concepts, and I would be willing to help in collecting and/or creating figures if that is thought to be helpful.

agitter commented 7 years ago

Thanks @burkesquires! We're about a day away from our initial "release" (submitting to the journal and posting to bioRxiv) so we are only making emergency corrections at this point. However, we plan to make more revisions while the paper is under review. I will create a new issue with post-submission tasks in the coming days. We'd love to have help with figures.

burkesquires commented 7 years ago

Thanks, sounds great, Anthony!

burkesquires commented 7 years ago

I would also offer this table from deeplearning4j, How to Choose a Neural Network, which does a nice job of summarizing the options. This table or a similar one would give readers some real-world advice on which technique to try.

slochower commented 7 years ago

Hi! This is going to come out of the blue, but I came across this project while looking for a way to use CI to build and compile Markdown files for my own project. This repository was immensely useful. I see that you don't have any figures in your current manuscript -- but that's up for debate. It took me a few hours to figure out how to include images, and I thought I'd drop the instructions here in case it saves you any time. I know I could create a PR, but that seems a bit presumptuous right now.

Update build/environment.yaml with
```
- pandoc-fignos==0.20
```

Update build/build.sh with --filter pandoc-fignos before --bibliography. (Here, I also added --katex for math.)

```
pandoc --verbose \
  --from=markdown --to=html5 \
  --filter pandoc-fignos \
  --bibliography=$BIBLIOGRAPHY_PATH \
  --metadata link-citations=true \
  --css=github-pandoc.css \
  --katex \
  --output=output/index.html \
  $INPUT_PATH
```

Update build/citations.py with a new regex pattern that doesn't catch figures as misformatted citations.

```
import re

# pattern = re.compile(r'(\[[^@][^\]]*?\])[^(]', flags=re.DOTALL)
# This should still flag brackets without @, but only when they don't have
# an ! before them. So it will still match [cite] but it won't match
# ![caption](image.png){#fig:1}.
pattern = re.compile(r'(?<!\!)(\[(?!\@).+\])[^(]', flags=re.DOTALL)
```
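A quick way to sanity-check a pattern like this is to run it over a few sample strings. The sketch below is not the code in build/citations.py: the sample strings and the `find_bad_citations` helper are made up for illustration, and the pattern uses `[^\]]+` in place of `.+` (an assumption on my part) so that a single match cannot run across several brackets.

```python
import re

# Variant of the pattern above; [^\]]+ (an assumption, not the original .+)
# keeps each match inside a single pair of brackets.
PATTERN = re.compile(r'(?<!!)(\[(?!@)[^\]]+\])[^(]', flags=re.DOTALL)


def find_bad_citations(text):
    """Return bracketed spans that look like citations missing an @."""
    return PATTERN.findall(text)


# Hypothetical sample lines, one per case the pattern should distinguish.
samples = {
    "citation": "Deep learning works [@doi:10.1000/example]. ",
    "missing_at": "Deep learning works [doi:10.1000/example]. ",
    "figure": "![A caption](./images/fig.png){ #fig:one } ",
    "link": "See [the docs](https://example.org) for more. ",
}

for name, text in samples.items():
    print(name, find_bad_citations(text))
```

Only the `missing_at` sample should be reported; the well-formed citation, the figure, and the ordinary link should all come back empty.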

Then, in the Markdown text, figures can be included as...
```
![The two state model from Brown and Sivak.](./images/brown-2017-figure2.png){ #fig:bs }
```
...and referenced as (see Figure @fig:bs). Note that the space before the pound sign in { #fig:bs } is important; otherwise Jinja will crash. There is probably another way around this, but adding a space is the easiest solution.
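A likely reason for the crash is that `{#` opens a comment in Jinja's default syntax, so `{#fig:bs}` reads as a comment that never closes. If remembering the space gets tedious, a small pre-processing step could add it. This is only a sketch, assuming the Markdown passes through Jinja before pandoc; the helper name `pad_attribute_braces` is hypothetical and not part of the build scripts.

```python
import re

# Jinja's default comment delimiter is "{#", so "{#fig:bs}" can be read as
# the start of a comment. Inserting a space after the brace avoids that.
# The lookahead limits the rewrite to "{#" followed directly by a letter.
ATTR_OPEN = re.compile(r'\{#(?=[A-Za-z])')


def pad_attribute_braces(markdown_text):
    """Rewrite '{#id' as '{ #id' so the brace no longer looks like '{#'."""
    return ATTR_OPEN.sub('{ #', markdown_text)


line = "![The two state model.](./images/fig2.png){#fig:bs}"
print(pad_attribute_braces(line))
```

Attributes that already have the space are left untouched, so running the step twice is harmless.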

For the images, I created the directory sections/images and I copy them to output/ in build.sh.

```
# Make output directory
mkdir -p output
# Copy images to output directory
echo "Copying images"
cp -R sections/images output/
```

The HTML compilation with pandoc goes without a hitch because the HTML just looks for the images/ subdirectory. However, wkhtmltopdf needs to actually incorporate the images into the PDF, so you must change paths before building the PDF, and modify the paths for pandoc.

```
# pandoc is calling wkhtmltopdf inside build/ and it will hang if it
# can't find the image specified in the raw HTML. Therefore, changing to
# output/ first fixes this.

cd output

pandoc --verbose \
  --from=markdown --to=html5 \
  --filter pandoc-fignos \
  --bibliography=../$BIBLIOGRAPHY_PATH \
  --csl=../$CSL_PATH \
  --metadata link-citations=true \
  --css=../output/github-pandoc.css \
  --output=../output/nonequilibrium-literature-review.pdf \
  ../$INPUT_PATH
```
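Since a missing image makes the PDF step hang rather than fail, it may be worth checking the image paths before calling pandoc. Here is a rough sketch, assuming images are referenced with standard Markdown image syntax and relative paths; `missing_images` is a hypothetical helper, not something in the build scripts.

```python
import re
from pathlib import Path

# Matches ![caption](path) and captures the path, stopping at whitespace
# or the closing parenthesis (so an optional "title" is ignored).
IMAGE = re.compile(r'!\[[^\]]*\]\(([^)\s]+)')


def missing_images(markdown_text, base_dir):
    """Return local image paths in markdown_text that do not exist under base_dir."""
    missing = []
    for path in IMAGE.findall(markdown_text):
        if re.match(r'[a-z]+://', path):
            continue  # remote images are not checked
        if not (Path(base_dir) / path).is_file():
            missing.append(path)
    return missing


text = "![Two state model](./images/brown-2017-figure2.png){ #fig:bs }"
print(missing_images(text, "output"))
```

The build script could call something like this with the same directory it changes into and stop early if the returned list is not empty.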

Anyway, if you do end up adding figures for a future manuscript version, maybe this will be useful 😄 .

agitter commented 7 years ago

@slochower Thank you very much for sharing this. I'll tag @dhimmel for more specific technical comments on the build changes.

If we do add figures in the next draft, and I certainly hope that we do, we may ask you to add these modifications in a pull request so that you get the credit.

dhimmel commented 7 years ago

> I came across this project while looking for a way to use CI to build and compile Markdown files for my own project. This repository was immensely useful.

@slochower glad to hear. I'm also thinking of creating a standalone package with this functionality. I'll ping you if I do.

I agree with @agitter that a PR is the way to go. We'll let you know when we're ready for it.

slochower commented 7 years ago

@agitter @dhimmel Both sound good to me.

laserson commented 7 years ago

I would definitely say tables are the most important thing in this kind of review. Help people choose the best tools for their project by laying out all the relevant features you can think of.

burkesquires commented 7 years ago

@laserson I completely agree. I think the number of options of DL methods can be overwhelming and some guidance would be very helpful.

Some suggestions or inspirations:

- "How to Choose a Neural Network", a table from deeplearning4j
- The paper could also reference this KD table that helps users know which framework to pick for a particular DL method. The article also references a Wikipedia page with more info.

Not DL related but helpful for inspiration:

- Andrew Abela’s “Chart Suggestions” diagram
- Zap Business Intelligences' analytics cheatsheet

burkesquires commented 7 years ago

Additional diagrams of neural networks can also be found on Jihad Alammar's GitHub page, in A Visual and Interactive Guide to the Basics of Neural Networks.

agitter commented 7 years ago

@laserson Are you suggesting a table mapping input data and domain to neural network architectures, like the deeplearning4j table @burkesquires shared above, or focused on deep learning frameworks, like the KD table above? I've been resistant to say anything specific about frameworks because the landscape evolves so quickly, making the wiki a good reference. For example, the KD table is only a few months old, but the addition of MXNet and CNTK as beta Keras backends is a notable change.

The former style table could be interesting. We cover so many data types and problems in the review that we would need to focus on a subset.

laserson commented 7 years ago

I am advocating putting in a table like that, along with all the features we think are important for scientists. One table can cover the lower-level general frameworks. Another table can cover higher-level tools that may be domain-specific, maybe broken out by domain. Another table can be a list of applications and which tools are available for each. It may make sense to have it broken out for the different sections.

laserson commented 7 years ago

We could imagine maintaining a registry of domain-specific tools in the style of "awesome-*" that continues to get updated as well (not like people want more work though)

yfpeng commented 7 years ago

Have we agreed to put images in "images" or "sections/images" and cite them?

SiminaB commented 7 years ago

I am also just coming to this super-late, but I think even having a figure of a basic neural network diagram with two panels representing supervised and unsupervised applications would be helpful - maybe something like this https://www.researchgate.net/figure/278047488_fig1_Fig-3-Training-of-deep-learning-DL-network-includes-an-unsupervised-and-a-supervised but emphasizing one of the examples discussed in the paper for both the supervised and unsupervised scenarios. For a biostat person like myself, I think this would really help in understanding what they really are.

In terms of tables, I love both comparison and glossary tables, like the ones that show up in Nature Reviews Genetics papers such as http://www.umiacs.umd.edu/~hcorrada/CMSC858B/readings/wang_rnaseq.pdf. For example, CNNs and RNNs are both mentioned extensively in the paper, so a glossary table could include their definitions, and a comparison table could list some of their specific strengths and weaknesses and areas of application (these are already in the paper, e.g., "Two dimensional CNNs are ideal for segmentation, feature extraction, and classification in fluorescence microscopy images [16].").

I would be happy to help with either of those!

evancofer commented 7 years ago

Figure Concept

While this paper certainly is not a "beginner's guide" to neural networks, I am wondering if it would be useful for us to include a figure with a flowchart-like overview of the various "phases" (e.g., training) of the model construction process. This is certainly an aspect of neural networks that differs substantially from the more traditional supervised learning models. The many interchangeable approaches to each step in this process, along with the iterative reapplication of certain steps, lead me to believe that a graph (i.e., network) view would do an excellent job at succinctly showing possible workflows. The model construction workflows from the cited papers might be good starting points for brainstorming, and this figure could even be re-conceptualized as a "visual summary" of the methods from those papers. I was initially thinking of this as a supplementary figure, but it could also work as an in-text figure. I realize that this figure may be overly ambitious for this stage of writing.

Brainstorming

While brainstorming different states and transitions for the figure, many terms came to mind. I have put them into a list below. I realize that this is an incomplete list with a bi-modal distribution of specificity; some terms should be added to this list, and many of this list's current members should be left out. However, I hope that the list clarifies what I mean by "interchangeable phases", as well as the figure's subject. Clearly, a proper solution must strike a balance between detail and readability. If we decide that such a figure is worth including, feedback will be essential for determining which steps of the process are worth incorporating. If we did include a significant amount of detail in the figure, it would probably be necessary to shorten the figure caption to a tractable size by identifying terms with a glossary table (as mentioned by @SiminaB).

- model selection steps (e.g., random search; intelligent, as in "Blockout" from arxiv:1512.05246; grid search)
- per-epoch/sample/batch steps (e.g., annealing learning rates and other parameters; early stopping; diagnosing loss curves)
- backpropagation and significant variations (e.g., backpropagation through time; synthetic gradients, as in arxiv:1608.05343)
- data partitioning. It is standard to use one holdout set for validating epoch performance and a second holdout set for testing finalized models, but this is not always feasible or optimal with smaller datasets.
- standard epoch-wise training, and interesting variations. Examples of the latter might include:
  - bagging, or a variant of bagging adapted for deep learning (as in doi:10.1038/nmeth.3547)
  - curriculum learning (discussed here)
- post-training optimizations (e.g., model ensembling; model compression, as in arxiv:1503.02531)
- data modification steps, such as:
  - data pre-processing (e.g., zero-mean, unit-variance)
  - data transformations (e.g., crops)
  - simulated data
- uses for the final product (e.g., classification, dimensionality reduction, data simulation)
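To make a couple of these steps concrete, here is a minimal sketch of two of them: partitioning data into training, validation, and test sets, and early stopping on the validation loss. Everything in it (the function names, the 80/10/10 split, the patience value, and the loss values) is illustrative and not taken from any of the cited papers.

```python
import random


def partition(samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle samples and split them into train, validation, and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


train, val, test = partition(range(100))
print(len(train), len(val), len(test))  # 80 10 10

# Made-up validation losses: improvement stalls after the third epoch.
losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65]
print(early_stopping_epoch(losses))  # 4
```

In a figure, each function here would correspond to one node, with the validation set feeding the early-stopping decision and the test set held back for the finalized model.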

Collaborative Implementation

Lastly, I think it would be useful to produce this figure (and others) in a collaborative manner, but I cannot think of a way to do so that integrates nicely with GitHub. Perhaps a Google Drawing? We could export it to an SVG file, and then use pull requests to track versions of it.

SiminaB commented 7 years ago

Alright, so here is what I was thinking about in terms of a glossary table: https://github.com/SiminaB/deep-review/blob/master/tables.md. Super basic, I know, but I think it could be helpful (it would probably need to be expanded a fair bit). It may also work as a flowchart or the like, but as a reader of reviews, I often like having these references within a reference.

agitter commented 7 years ago

@yfpeng we haven't finalized how we'll add figures and tables yet. @dhimmel has a separate repository to generalize the Markdown-based manuscript system. Decisions about figure handling should be resolved there soon (see https://github.com/greenelab/manubot-rootstock/pull/8) and then ported here.

@SiminaB I find those Nature Reviews-style definition tables to be helpful as well. I support adding the table, and it will make our manuscript more accessible to a biomedical audience. We can continue discussing the specific entries in the table in #566.

@SiminaB we previously discussed adding a high-level figure similar to the one you mentioned or a figure showing different network architectures, which emphasizes that it would be broadly useful. I don't have specific suggestions for how to add this. We could either search for one that is suitably licensed for reuse here or create one collaboratively as @evancofer suggested.

@evancofer I have conflicting thoughts on the figure idea. Showing the interchangeable phases could help general readers understand how neural networks are used and demystify the process, which is great. Many of the specific bullet points do seem better suited for a fairly technical audience that wants to learn how to use and train neural networks themselves. A clear illustration of that is very valuable, but I'm not sure whether it is fully in scope here.

What do others think about @evancofer's figure suggestion?

evancofer commented 7 years ago

@agitter I actually agree with this, but I am struggling to think of any figures that are as broad in scope as the text itself. As such, perhaps the review would be better served by a number of smaller figures (maybe 1 for each section at most). I agree that the network architecture diagrams are probably an excellent start.

agitter commented 6 years ago

Closed by #775
