greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

The future of deep review #810

Open agitter opened 6 years ago

agitter commented 6 years ago

We resubmitted version 0.10 to the Journal of the Royal Society Interface and bioRxiv. Thanks everyone for all the great work on the revisions!

I'd like input on where we want to go from here. Should we continue to accept edits to the manuscript even after a version is accepted at a journal? Should we accept only errata corrections but lock the main content?

I don't want to dissolve this amazing group of authors. However, there isn't much precedent for living manuscripts that continue to change after publication, and realistically we are all very busy with other projects. The activity dropped off considerably between the first submission and the reviewer feedback.

cgreene commented 6 years ago

I think that if we have a committed group of maintainers there is the opportunity to do something new here in the way of a living scientific manuscript that stays up to date with the field. However, we probably need more than me and @agitter to make it sustainable. Does anybody else have an interest in helping to contribute at the maintainer level?

cgreene commented 6 years ago

One quick thought. We should probably be talking more about what we do after v1.0, which I imagine would be the accepted version at the journal. At this point I feel like we should push to that finish line. 😄

agitter commented 6 years ago

We should probably be talking more about what we do after v1.0

Agreed. I think we should only take pull requests on obvious typos until v1.0.

agapow commented 6 years ago

Agreed - hit the (immediate) finish line and then worry about the future.

And when we get to that future, a few discussion points or ideas:

evancofer commented 6 years ago

I could be interested in this. It could help to define expectations for maintainer roles, but those obviously depend on a lot of other variables.

Similarly to @agapow , I felt that keeping up with the notifications was sometimes like drinking water from a fire hose. I think this was due to the intermixing of notifications related to content (i.e. tracking new references and writing) and infrastructure (e.g. administrative, repository code, formatting).

I also wonder if GitHub is the best place for this sort of thing, or if a better-suited platform exists.

Lastly, given the size of the paper, does shifting to a different format - one that is designed to organize information on a grander scale (e.g. a book) - make more sense long-term?

stephenra commented 6 years ago

@cgreene @agitter I realize I'm a bit late to the conversation but fwiw, would definitely be interested in helping in a maintainer function or role.

@evancofer To your pt. about inundation, was the Projects feature (or something similar with GH integration like Trello) used to track progress? I wonder if that might be one way of making the different workstreams a little more manageable and organizing issues based on the topic or sub-topic.

evancofer commented 6 years ago

@stephenra AFAIK there weren't any project management tools (e.g. Trello, Asana, GH Projects) used. I would guess that @cgreene's lab had some internal tracking of general project status, but that probably isn't too useful for our purposes. Enabling contributors to easily subscribe to notifications for one or a few sections/topics could be useful.

agitter commented 6 years ago

@stephenra we used milestones within GitHub and labels (usually, not always) for some form of organization. We also ended up having a few master issues to track progress and link to related issues at various project stages (e.g. #188 and #678 ). @cgreene and I didn't really have any formal internal tracking beyond that, and I'd be open to better organization if other maintainers join in to keep this going.

stephenra commented 6 years ago

@evancofer @agitter Thanks for clarifying! I'm tool-agnostic but under the working assumption that this continues to grow in scope, it may be helpful to adopt one (my past experience with Trello and GH Projects have been overall positive but admittedly Kanban board-style project management tools aren't for everyone). Is there an estimation of roughly how many maintainers would be needed to keep the project going?

agitter commented 6 years ago

@stephenra I'd say the number of maintainers depends on what exactly we want to sustain. Is it an up-to-date manuscript or book? A curated list of papers, tools, and data? Something else?

stephenra commented 6 years ago

@agitter Makes sense. And apologies, I realized I'm getting ahead of the conversation given the immediate focus on v1.0.

agitter commented 6 years ago

@stephenra This is actually a good time to have the conversation while we still have contributors' attention after the recent round of revisions. Let us know if you have more thoughts about what form the future product or project should have.

cgreene commented 6 years ago

I agree that now is a good time to figure this out. In terms of tooling, our lab has used waffle.io for other projects and found it useful. I think the same things that it has helped us organize could aid the maintainers in planning what to include.

I also think we'd be breaking new ground on authorship, but I like the idea of a "release" occurring either every 6 or 12 months (from our own experience, I think 12 months is more reasonable). If there were project participants who would like to lead each of those releases, I think the authorship categories could accommodate a reshuffling of the author list on each release (we could stick "maintainers of previous versions" in a category that doesn't shuffle to the last positions - those could be "maintainers of the current version"). Maybe JRSI would like to publish an annual update for a few years, or maybe we could talk with other journals about future releases (imagine a Jan 2019 release date for the next one...).

If any journals are interested, feel free to chime in :)

Anyway, these are just some thoughts.

benstaf commented 6 years ago

If you want to move on to another collaborative paper in deep learning for medicine, try:

“DiversityNet: a collaborative benchmark for generative AI models in chemistry”

cgreene commented 6 years ago

@mostafachatillon : thanks for raising that. It might be more appropriate to raise this as a new issue since your point doesn't relate directly to the future of this paper.

Also, note that your blog post has an inaccuracy. You say:

But for writing the DiversityNet paper, GitHub is an inappropriate tool, because GitHub does not natively support math formulas.

and you link https://github.com/github/markup/issues/897#issuecomment-288580903.

That is related to GitHub's native system for displaying markdown. Deep-review doesn't use that. It may also be the case that Manubot, the build system for deep-review, doesn't yet support formulas. However, if that's the case, you should correct the inaccurate link in your blog post.

stephenra commented 6 years ago

@agitter @cgreene Apologies for the lapse in response.

I agree that now is a good time to figure this out. In terms of tooling, our lab has used waffle.io for other projects and found it useful. I think the same things that it has helped us organize could aid the maintainers in planning what to include.

Agreed on the tooling. I've heard good things about waffle.io and had some success with Asana and Trello, which both integrate with GitHub as well. I'm not particularly opinionated on this so I would imagine whichever platform most contributors feel comfortable with or offers the lowest bar to access is the best way to go. I'd be happy to set up a survey, if that helps.

Apart from GitHub issues, I've found that batching issues by category (rather than just by tag) makes todos and PRs easier to track and manage. I'm not sure if the lab(s) adopted this approach but, for example, the different application domains/sub-domains in the paper could be a natural way to structure these categories (e.g. gene expression vs. protein secondary and tertiary structure, etc.).

I also think we'd be breaking new ground on authorship, but I like the idea of a "release" occurring either every 6 or 12 months (from our own experience, I think 12 months is more reasonable).

I favor the idea of a 12-month release as well. It gives contributors time to account for difficulties in scheduling and coordination and, given the speed of the field, it also provides time to assess a broader range of contributions and distinguish meaningful work from flag-planting.

evancofer commented 6 years ago

@cgreene @agitter @stephenra A yearly release sounds feasible and reasonable.

I have used Asana and Trello in the past, and I am comfortable with using both. Tentatively, I would lean towards Asana because it seemed to be (at least to me) more flexible and feature-rich than Trello. However, I am not particularly familiar with integrating either of them with GitHub. Is there a way to use any of these project management tools in an "open" manner that allows people to view the current project status without necessarily signing up for an Asana/Trello/Whatever account and so on? At least with respect to content reviews and discussion, it is probably important to maintain this project's transparency.

Obviously, the immediate goal is to finish the initial publication. The next step is to identify and enumerate the specific maintenance tasks, especially those that the current team needs the most help with. With regards to planning for long-term progress, it would also be useful to list any goals/problems that have come up but were too ambitious or not pressing enough for the initial publication.

Thoughts?

stephenra commented 6 years ago

@evancofer I believe you can make Asana projects 'public', but this only makes the project viewable to people in your organization who aren't necessarily team members, not to anyone in general.

Trello, on the other hand, can be made publicly viewable to anyone, and the project page will be indexed by Google. I do agree on the pt. about transparency -- to this end, I've worked on or seen some projects that use some combination of GitHub, Trello, and Gitter: the code/repo is on GitHub, the (public) project management is handled by Trello, and the community chat is on Gitter. If that's too much added complexity, perhaps GitHub and Trello might be best.

evancofer commented 6 years ago

@stephenra Trello and GitHub seem like a good solution without too much added complexity. I'm thinking we could use Trello to track maintenance etc. and keep discussion on GitHub (and continue to use issue labels and other features to track and organize).

stephenra commented 6 years ago

@evancofer That sounds reasonable to me. :+1:

cgreene commented 6 years ago

If you have not played around with http://waffle.io I would encourage you to give it a shot. I made a deep-review waffle. It is an overlay on github issues, so it's convenient to work with in this framework: https://waffle.io/greenelab/deep-review

At this stage, I think we really need 2-3 committed maintainers to develop a new plan, update the README with the plan, and then start to take over the project with the goal of publishing a new release at some point in 2019.

cgreene commented 6 years ago

I went through all issues up to 100 and I closed them if we had referenced the paper or if it was a discussion that had concluded.

evancofer commented 6 years ago

@stephenra @cgreene The waffle.io view on the project should work fine.

Like Casey said, we should probably find some more committed maintainers interested in long-term work if this is going to be successful. Contributors were obviously a good place to start, but I am unsure where to search next.

I'll get working on an update to the README and submit a PR sometime this evening. This will probably include a status update and a new section about the future of the project.

cgreene commented 6 years ago

It might be nice to think about an authorship model where people "decay" towards the middle after a release. The current set of authors would be the "middlest set" of the next release (unless they contribute) and new authors would bookend them. I'd imagine maintainers at the end with the other author classes on the front.
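As a rough illustration, this reshuffling could be sketched like the following hypothetical Python snippet. The grouping rules and author classes here are assumptions for illustration only, not a settled policy:

```python
import random

def next_release_order(new_front, previous_authors, current_maintainers):
    """Sketch of the "decay toward the middle" model: new contributors
    bookend the previous release's authors. New author classes go at the
    front, the previous release's authors shuffle into the middle, and
    the current maintainers close the list."""
    middle = previous_authors[:]
    random.shuffle(middle)  # previous authors "decay" toward the middle
    return new_front + middle + current_maintainers

# Example with placeholder names.
order = next_release_order(
    ["new1", "new2"],            # contributors to the upcoming release
    ["prev1", "prev2", "prev3"], # authors of the previous release
    ["maint1"],                  # maintainers of the current release
)
```

Under this sketch, `order` always starts with the new contributors and ends with the current maintainers, with the previous authors randomized in between.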

If people understand how these items will be handled, it might help to draw in new contributors. I'm also happy to promote the work towards a 2019 release, and I'll even commit to a bit of writing (though at this time I'd prefer not to be a maintainer 😄 ). It sounds like @evancofer and @stephenra might be interested. Maybe you could snag a third so that votes are resolved via tiebreak, although @agitter and I did survive the pairing.

evancofer commented 6 years ago

It does seem prudent to get a third person. Most of the people that come to mind are in my current lab or department, so - out of fairness - I am somewhat hesitant to recommend any of them.

It may be best (in terms of ethics and effort) to, as you say, append them in a semi-randomized order. Perhaps we could do this at the end of every month (or some other period of time)? I imagine this could incentivize repeat contributions. Perhaps it would be useful to use a semi-random hierarchical grouping again? Was manually determining author hierarchies time-consuming or maintainable?

agitter commented 6 years ago

I agree that it is important to think about authorship, how new contributors will be recognized and incentivized, and what will happen to the existing contributors in a v2.0 release. We can break precedent with the v1.0 author ordering algorithm if that makes it easier to continue deep review in the long term. I wouldn't expect to be kept in my current position if new maintainers take over, and I do see myself more as a standard contributor than a core maintainer for the next release.

However, if you don't find a third maintainer, I'd be willing to help with tie-breaking in special circumstances.

Was manually determining author hierarchies time-consuming or maintainable?

We only did this twice, so it wasn't too onerous. We also kept the categories broad to help. It did require considerable manual effort because we reviewed commits as well as contributors' discussion in issues and pull requests. I was initially working toward fully automating the author randomization but stopped once Manubot became a separate project. The deep review author ordering was too specific to this collaborative process.

A fully automated ordering for Manubot should probably take an unordered author YAML file with whatever extra metadata is needed for ordering, sort the authors, and pass the sorted list to Manubot as metadata.yaml.
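As a minimal sketch of that pipeline, assuming hypothetical metadata fields ("commits", "maintainer") rather than Manubot's actual schema (in practice the unordered list would be parsed from the authors YAML and the result serialized back out as metadata.yaml, e.g. with PyYAML):

```python
def order_authors(unordered):
    """Sort contributors by commit count (descending), breaking ties by
    name, and keep maintainers together at the end of the author list.
    The field names are illustrative assumptions, not Manubot's schema."""
    contributors = [a for a in unordered if not a.get("maintainer")]
    maintainers = [a for a in unordered if a.get("maintainer")]
    contributors.sort(key=lambda a: (-a.get("commits", 0), a["name"]))
    return contributors + maintainers

# Example input, as it might look after loading an unordered authors YAML.
authors = [
    {"name": "Bea", "commits": 12},
    {"name": "Abe", "commits": 12},
    {"name": "Cal", "commits": 40},
    {"name": "Mia", "commits": 5, "maintainer": True},
]
ordered = order_authors(authors)
# Cal (40 commits) first, then Abe/Bea tied at 12 sorted by name,
# and the maintainer Mia last.
```

The real ordering would presumably encode whatever contribution metrics and author classes the maintainers agree on; the point is only that the sort becomes a pure function from metadata to an ordered list that Manubot consumes.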

evancofer commented 6 years ago

I have made a new issue (#833) dedicated to the discussion of author ordering.

stephenra commented 6 years ago

@cgreene Thanks for setting up the waffle.io for the review. LGTM, as well. And yes, count me in!

@evancofer @agitter I could reach out to some folks who have read the review but have yet to have any direct involvement to see if they're interested (looping in @austinvhuang in case he might have some suggestions for folks, too). Are there any hard 'requirements' as to who maintainers ought to be, in terms of background or existing contributions to the project?

cgreene commented 6 years ago

Ok - I did a bit more triage. If it's in the "Inbox" column, it hasn't been checked. If it's in "Backlog" then it exists and has not been addressed (if it's a paper, it hasn't been cited). @evancofer and @stephenra: I think it makes sense for you two to feel free to close these issues at will if you don't immediately see a reason to cite them.

It may also be good to identify sections/themes that are not well covered (but that you wish were covered).

Finally, I've taken some issues and assigned them to myself to see if we can get those cleaned up.

evancofer commented 6 years ago

@cgreene noted. It seems I cannot assign myself to issues (e.g. #598, #605). Anyhow, I will go through and find sections that were skimped on. I already know a few that I can add to significantly. I've opened #835 to cover this.

cgreene commented 6 years ago

Gave @evancofer and @stephenra write access on the repo to denote their new status as maintainers. 👍

You now have permissions to assign issues / etc. And to merge PRs!

stephenra commented 6 years ago

Thanks @cgreene!

evancofer commented 6 years ago

@cgreene Thanks Casey.

I went ahead and assigned those issues to myself. I am aiming to finish some of them by the end of the week. Should probably go through and discuss some of the other issues and try to tackle them as well.

evancofer commented 6 years ago

After our discussion and my own internal debate, I think that we should focus on revising and updating the existing sections rather than adding new ones. There are some occasions where we cite a method's name and performance but say little else. This could add unnecessarily to the manuscript's length without really contributing to its insights. For many of the subsections, I also think it would be good to reassess them and relate them back to our guiding themes (i.e. has deep learning had an impact, and what would need to be true for it to make an impact). Thoughts?

agitter commented 6 years ago

For many of the subsections, I also think it would be good to reassess them and relate them back to our guiding themes (i.e. has deep learning had an impact, and what would need to be true for it to make an impact).

I'm very supportive of this. It isn't always easy to do, but it would be a valuable contribution beyond cataloging papers.

In some cases, I do think it is useful to briefly note an area where deep learning is starting to be applied, even if there isn't much to say about the overall impact yet.

stephenra commented 6 years ago

@evancofer I would second your thoughts that our time and effort are best spent on the revisions/updates. As @agitter pointed out, it isn't necessarily a straightforward endeavor, but it is one I do think is worthwhile, even if it is a note about initial applications of a particular method or approach.

rgieseke commented 6 years ago

Interesting discussion! The guidelines for the "Living Data" paper on the Global Carbon Budget might be useful: http://www.globalcarbonproject.org/carbonbudget/governance.htm#gov2.5

The dataset and paper on carbon emissions are updated yearly, but the paper (and data) stay partly the same. The practices with regard to authorship might be very different there due to the nature of the project and the different fields, but the comments on citation recommendations and "self-plagiarism" seem worth considering ...

agitter commented 6 years ago

Interesting parallel @rgieseke

Going forward, we could also bring more organization to the issues. Would adopting help wanted or good first issue labels help would-be contributors find a place to start?

evancofer commented 6 years ago

@agitter that seems like a good idea. I feel like summarizing or discussing any of the articles with an issue might make good first issues.

Also, revisions seem like the central focus of the second release, so creating issues for those revisions (e.g. #847) is probably a good way to elicit meaningful contributions. Perhaps we could do this for each section we felt needed work? Some of the existing issues (e.g. #598) could also be broken down by subsection into more manageable tasks. I'm not sure about the best way to do this, however, and it could just result in too many issues.

stephenra commented 6 years ago

@agitter @evancofer Yes, I think those would be useful labels to have as well. I've gone ahead and created labels for help wanted and good first issue.

I like the idea of creating issues for each section. I think in terms of management, that structure lends itself to being more easily identifiable/accessible for would-be contributors. From a gut feeling, I think breaking down into different subsections might slowly lead to issue creep as you pointed out.

nafizh commented 6 years ago

@evancofer @stephenra If you still need a third maintainer, I would be happy to help. This has been great work, and I would be happy to contribute to future versions.

evancofer commented 6 years ago

@nafizh Yes, your help would be greatly appreciated!

nafizh commented 6 years ago

@evancofer Is there an explanation for the labels? I understand most of them are self-explanatory, but I am confused about some of them, for example, paper, treat, study or next.

agitter commented 6 years ago

@nafizh some of the labels come from https://waffle.io/greenelab/deep-review We may want to feature that more prominently in the readme so that it isn't buried in this thread.

manuscript was used for issues about the paper structure, like standardizing whitespace or defining acronyms.

paper and review denote that the issue is for a single manuscript, either a research paper or a review. They haven't been applied consistently.

categorize, discussion, study, and treat correspond to the major sections of our manuscript and indicate which of them are relevant to the topic or discussion in the issue.

supervised, semi-supervised, and unsupervised weren't used much. They were more relevant for a different vision for organizing the paper that we moved away from. I suggest that we delete these.

The new maintainers should feel welcome to change the label organization.

cgreene commented 6 years ago

I agree with @agitter that a reexamination of the labels would be in order. I'll note only that review used to be used to refer to a review paper, but waffle.io uses it to denote something that is under review. My inclination at this stage would be to delete the paper label and allow waffle to use review as it likes. Then the default issues would generally be papers, and labels could be used more consistently to help new people find issues that are primarily discussions.

evancofer commented 6 years ago

I agree with @agitter that we should delete the supervised, semi-supervised, and unsupervised labels.

I also think we should assume that, unless otherwise marked, an issue corresponds to a paper. Some divisions that come to mind are: community discussion/feedback and project updates, build/orchestration issues, and content revisions.

To revisit our earlier discussion of issue prioritization, I think some good labels might be: help wanted, high priority, low priority, good first issue, and in progress. I also think it could be useful to denote the scale of an issue with something like wide scope or narrow scope, but it may be more productive to make all the issues smaller and more manageable.

stephenra commented 6 years ago

Thanks @evancofer. Agree with @agitter and @cgreene as well. I'm OK with deleting those labels (supervised, etc.). I created the help wanted and good first issue labels a few days ago, and priority labels are always a good idea. Would an in progress label necessarily overlap with the waffle.io In Progress board?

cgreene commented 6 years ago

I'd recommend using the waffle labels for state where provided - it'll mean that things look nice on the waffle regardless of how those labels get assigned - so it sounds like that one doesn't need to be added.

stephenra commented 6 years ago

Thanks @cgreene.

For priority labels, I'd prefer having at least three (e.g. Priority: High, Priority: Medium, Priority: Low). Thoughts @evancofer @nafizh?

evancofer commented 6 years ago

@stephenra Yes, that is a better syntax.