greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

Identify missing or incomplete sections and themes #835

Closed evancofer closed 6 years ago

evancofer commented 6 years ago

There are certainly topics that were not covered in the initial review. Although many were worth covering, there were cases where we didn't have the time or the personnel to review these topics before the initial deadline. Before the 2019 release, we hope to fill in these knowledge gaps.

Note that this is part of the preparation for a re-release of the paper in 2019. See #810 for the genesis of this.

stephenra commented 6 years ago

Thanks @evancofer. This is not necessarily a missing section or theme, but @agitter had highlighted model diagrams as a prospective enhancement in #703. Perhaps we could also include select diagrams for each section or theme?

Are there any immediate or glaring gaps that you had in mind? I liked the discussion raised by @michaelmhoffman in #614 as an additional topic.

evancofer commented 6 years ago

I think figures could be very useful in some parts of the review, particularly for clarifying some of the more complex or key methods. Some tables might also be handy (particularly a table of contents). When this came up before I believe the consensus was that the efforts of our limited maintenance team had to be spent curating the existing content for submission before spending a significant amount of time on figures (@cgreene or @agitter correct me if I am wrong here). I am also a bit hesitant to create figures for every section; this could get out of hand quickly. With that in mind, are there specific instances where figures would seem particularly useful?

In general, I think it would be best to create only a limited number of very specific and detailed open requests for figures (a "wish list" of sorts). My thinking here is that if collaborations are more limited than we hope, we can still pivot without having lost too much time, and hopefully bring the project to completion with a skeleton crew.

Lastly, before we make too many figure requests, we may want to create a style guide (e.g. a unified colorblind-friendly palette, dpi recommendations, etc). I think that a formal guide for styling figures will make it easier for a collaborator to commit their time without fear that we won't like or won't use the figure they spent so much time on.

stephenra commented 6 years ago

What I largely had in mind was also about clarifying methods, although I completely understand and share the concerns about scope creep given our limited bandwidth. One clarifying example: I recall some figures by Sebastian Ruder illustrating hard and soft parameter sharing in MTL, based on Rich Caruana's paper. Even though parameter sharing was not explicitly discussed in Deep Review, a cartoon of MTL might be useful in distinguishing it from transfer or multi-modal learning.

I can open a new issue re: style guide. I think it's a good idea.

evancofer commented 6 years ago

I think a table of contents would be extremely useful, and would probably require minimal effort to execute. However, I am unsure if manubot supports this sort of thing yet. That being said, I think we should still focus on specific instances of textual content that need modification or are missing entirely.

I think the scaling section could also be expanded as new libraries for deep learning in biology come online (e.g. as in #837).

agitter commented 6 years ago

You are correct that we did not devote much energy to figures. The v0.9 release did not have any. One option that we did not pursue is reusing or adapting existing figures that are licensed appropriately. That could be a way to clarify complex models without spending a lot of time designing graphics.

@cgreene did adapt the model overview in the introduction from an existing figure, but he reproduced it from scratch. I'm thinking that we could directly place and attribute existing CC-BY figures.

stephenra commented 6 years ago

@evancofer @agitter Thanks for the feedback. I like the CC-BY figures idea. If a bespoke figure is really needed or requested, we can always address it on a case-by-case basis.

stephenra commented 6 years ago

@evancofer W.r.t. missing textual content, there are some interesting papers in behavioral neuroscience utilizing CNNs or deep generative models to classify behavior in animal studies. Would applications to animal models be something worth including?

evancofer commented 6 years ago

Over the past day I've been going through the current manuscript and making notes. I will type them up tomorrow and post them. Unfortunately, this has all coincided with my getting a new laptop so it may be a little slower than usual.

In general, I have several ideas about what is missing and possibly some improvements to the logical flow of the paper. Like I said, I will type these up tomorrow. Most of these improvements are in fields that I am more familiar with (e.g. metagenomics, sequencing data).

My second takeaway is that some subsections are more descriptive than analytical. I think this is due to the newness of many of the subfields and methods discussed. I suspect that over the course of a year, the maturation of these subfields will make it easier to identify the grand challenges and opportunities in each subfield. This should make it easier to speculate about the future.

tuncadogan commented 6 years ago

I would like to propose a new subsection on "deep learning based automated protein function (i.e., Gene Ontology) prediction approaches", which could go under the section "Deep learning to study the fundamental biological processes underlying human disease". Some of the papers related to this topic have already been mentioned in different issues.

I'm a new/candidate contributor to the Deep Review, especially for this topic. I've read the guide documents but was not sure where to start. This issue seemed like a good place; apologies if it is not.

evancofer commented 6 years ago

@tuncadogan Does this overlap at all with the protein-protein interaction section? If so, then I think it would be best to build this into the PPI section. If it ends up outgrowing the PPI section, then perhaps we can split it off into its own subsection later. The best way to start contributing would be to fork the repository, draft your changes, and submit a pull request for review. The specific process for this is discussed in the CONTRIBUTING.md file. Additionally, README.md should have information on formatting citations and so on.

evancofer commented 6 years ago

@stephenra It depends on the animal study really. It would make sense to add onto the neuroscience section, since it is pretty short right now. I believe that this was actually brought up by the second reviewer (See #678).

tuncadogan commented 6 years ago

@evancofer Thank you very much for the information. Regarding the overlap, protein function prediction and PPI prediction are related but distinct topics. The term "protein function prediction" usually refers to associating protein records with function-defining ontological terms, mostly Gene Ontology (GO) terms. The protein function prediction field has a large research community with a rich literature, including dedicated tracks at major conferences such as ISMB and a community-based challenge called the Critical Assessment of protein Function Annotation (CAFA). Deep learning based prediction methods are now starting to accumulate in the literature as well.

I'll do as you described (forking, drafting changes, and submitting a pull request). I believe there are no upcoming deadlines at the moment, so I'll plan the timing of the work accordingly.

evancofer commented 6 years ago

@tuncadogan With that in mind, we may eventually want to shift towards a broader section like "proteomics". For now, I would draft it as an individual subsection where you suggested, and we can figure out the higher-level organizational issues when they come up. There may be some issues in the repository with related papers, but I have not gone through them all yet.

tuncadogan commented 6 years ago

@evancofer great! I'll do it like that. Thank you.

evancofer commented 6 years ago

Having read through everything a few times in the past day, I think that we aren't missing any major sections. Instead, we may want to just continue to update the review as new references become available. Then, if it becomes necessary, we can reorganize or restructure different subsections. @stephenra what do you think?

stephenra commented 6 years ago

@evancofer Thanks for the thread. To your pt. on the animal study, what I specifically had in mind was this paper. Thinking about this again though, this is primarily a methods paper (PGMs + DNNs) with the automatic classification of animal behavior as the 'real-world' application (they do an initial experiment using synthetic data). I may have missed this from earlier but do you think this would still qualify for inclusion or does it broaden the scope too much?

Apart from that, I'll take a last look tonight to see if there's anything glaring. Otherwise, I think we can proceed in updating the review with the incoming set of references.

[edit] Had a chance to look through again and agree that there isn't anything major missing.

evancofer commented 6 years ago

@stephenra Wrt the paper, it could be a good addition, so long as we don't just mention it without integrating it into a larger discussion. That is, the focus should probably still be on curation rather than addition. I would add it in and submit a PR, but would consider making it part of a larger effort to revise and improve the existing neuroscience subsection and the paper in general.

As a quick aside, I could even see this being part of a larger discussion of deep learning and lab automation, since it helps aggregate phenotypic data in a computational and automatable manner. These sorts of automation applications could be particularly transformative, but perhaps they should be discussed alongside their respective biological research domains (i.e. neuroscience in this case).

I'm going to go ahead and close this issue, since it seems like we are all in agreement to focus on curation rather than addition.