swcarpentry / DEPRECATED-bc

DEPRECATED: This repository is now frozen - please see individual lesson repositories.

What should we teach about writing/publishing papers in a webby world? #199

Closed gvwilson closed 9 years ago

gvwilson commented 10 years ago

What can/should we teach people about writing/publishing/reviewing (i.e., the last lap of every scientific project)? Clearly interacts with reproducible research, open access, etc.; what mechanics/tools should we demonstrate/advocate?

See also #172.

ananelson commented 10 years ago

@asinclair Can you contact me at ana@ananelson.com? I would like to hear more about what sort of pieces you want to be able to connect together in an automated way.

IanMulvany commented 10 years ago

What should we teach about publishing on the web? I ended up writing a bit more than I expected, so here are the main pieces of advice:

tl;dr:

I would start by advising people to keep in mind the goals of publishing. You want to get your work out into a venue that will be respected by your peers, and noticed by them. In most cases - but not all cases - this will be a journal published by one of the large STM publishers. Elsevier, Springer, Wiley, Taylor & Francis, PLOS and Sage represent a very large part of that market.

You want this process to happen as quickly as possible. Aside from the act of writing, and constructing your story, the act of publishing - getting it onto the web - is pure schlep. Every minute longer that you spend in this process is a minute wasted, as it's not adding value to your research or your ability to put yourself in the position of being able to get the resources you need to do the research you are interested in.

Your first priority is to understand the most appropriate venue and then understand the system that this venue uses to get the work online. Tailor your process to lower the friction between the artefact you create and the process that will be used to get it online.

The great failure of my industry in the face of the web has been to allow this process to remain orders of magnitude harder than publishing a post on Blogger or WordPress.

I'll step through some advice covering these topics now.

The most appropriate venue

Ask your colleagues and confer with your coauthors; it's usually not hard to determine. A tool like the Journal/Author Name Estimator has been around for years, and it can suggest a journal based on the text of your abstract. The following resources can also help: Journal Finder, http://www.edanzediting.com/journal_selector, http://www.journalguide.com/ and http://etest.vbi.vt.edu/etblast3. Most of these are for the life sciences.

If your publication is an OA publication, the Eigenfactor Journal Rank tool will tell you if you are getting good value for money. It ranks the cost of the article processing fee against a rank of the journal determined by Eigenfactor's own algorithm.

Speed of publication

It might be worth checking if there is an alternative venue that might be a lot faster than your first choice.

A common approach is to submit to a high-profile journal, and on rejection submit to PLOS ONE. This is done in order to reduce the thrashing around within the peer review system. Perhaps consider submitting to PLOS ONE first? You could also look for a journal that is smaller, and might be more responsive. In the life sciences the journal I work for - eLife - is both prestigious and fast.

For the life sciences, Anna Sharman has a great resource covering a selection of journals, giving information about decision times, OA charges and journal metrics.

It might be interesting to encourage people attending your courses to contribute to these, or to create similar resources for their own disciplines.

Preprint servers / archives

Your discipline may have a discipline-specific archive. Make sure a copy of your work is deposited there. If the full text is deposited in one of these venues, Google Scholar will be able to provide readers with a link to a full text version of your article - even if you have had to publish in a paywalled journal.

Often you can get your work in draft up there before the peer review process is complete (if that's considered kosher in your field). This can give you priority on an idea, even before the idea has been formally reviewed.

Also, check with your university library to find out what archives they run, and deposit there for the same reasons as above.

The OA advantage

Keeping control of your own content is a significant advantage that authors can derive from publishing in an OA journal. I'll touch on that a bit later.

There is another advantage, and that's the advantage of discoverability.

Currently, as of writing this post, Google's main search bot does not index content that is behind an academic paywall for users who do not have access. That means that if you publish at a non-paywalled venue, more people have a chance to find your content.

Now most of your immediate peers will probably be able to access your content by virtue of having it in either the appropriate venue or in an appropriate repository, but it can't hurt to make it even easier to find.

If your coauthors will not agree to publishing in an OA venue, you can always try to modify the copyright transfer agreement that the publishing company will ask you to sign.

You can follow these examples to allow you to retain the right to distribute the paper in any way that you see fit. This is the one piece of advice that I'm giving that might slow down the process of publication, but go on, you know you want to do it, don't you?

What happens to my paper in a big publishing company, and why should I care?

During the reviewing stage a very badly formatted version of your article will be created to be sent to the reviewers of your article. If you have a preprint of your article available, that might even be an easier artefact for the reviewers to use, and it might speed up the review process, though I don't have any evidence to suggest that it will.

If your manuscript is accepted for publication then it will be sent to a large typesetting company, where it will be digitally torn apart and converted to XML. All of the formatting that you do on figures, text and on the reference lists will be thrown away. I'll just say that again. All of the work and hours you spend carefully formatting your reference lists will be ignored as the content goes through an automated typesetting system. (That's why at eLife we don't have a prescriptive requirement on the format of the references that we get sent; we will take them in any format.)

All of your specially chosen fonts, and special text alignment will be mostly ignored.

Depending on the state of the manuscript and the quality of the language in the manuscript it may be checked by a copy editor, either for internal journal style, or for the quality of the language. Much of this work is undertaken by highly educated graduates in developing countries, particularly India, the Philippines and increasingly China - globalisation in action.

Why is this? For the most part the systems that run our global publication infrastructure are old; many of them have code bases that are older than 20 years. Back in the day XML was the only reliable transfer format, and it remains the industry standard today. A slow evolution has been happening with the XML that publishers are using, and under the gentle pressure to deposit into PubMed and PubMed Central most publishers and typesetters are starting to target one of the many dialects of the NLM DTD. This has become a de facto standard in the industry; however, no writing tools export natively to this format, and the DTD supports, and is designed for, archiving print material. One of the very many consequences of this is that code that is typeset in this DTD is usually typeset as dumb text. On the other hand it does allow a resource like PMC to archive millions of articles, from thousands of publishers, and provide a very fine-grained search interface on top of all of this content. I'll mention writing tools a little later.

In order to potentially reduce the time to review your manuscript, and to reduce the time your manuscript takes in the copy editing / typesetting process, the following things could help:

Remember, this is probably a lifestyle choice; my main advice is to pick a tool that does not have too much lock-in. I used to work at Mendeley and believe it to be as good as any tool out there.

But wait! I want to do IPython, interactive, open data, virtual machines, 3D printed DNA dinosaur replication, and what you have just told me sounds like I can't do that - that sucks :(

Yes, yes, it does suck, and I hear what you are saying, but remember, at the moment of publishing, your priority is to get the damn work published, and unfortunately that still means interacting with a system that has changed little since the late 17th century. There are moves in the right direction, oases of sanity, but there is a long long way to go.

If you feel really passionate about this then the best thing you can do is to keep the rights to your own work, get the paper out as a CC-BY paper in a boring old venue, and then do the kind of publication that you really want to on your own academic home page, and build your own audience around your work that way. In that case you want the boring route to take up as little time as possible.

You should also deposit artefacts of your paper in the best possible place for them. Code to a location like GitHub. Videos to YouTube or Vimeo. Images to Flickr. Data to Figshare, DataDryad, Zenodo, or one of the very many other subject-specific data repositories that may be appropriate for your field.

Try and keep your artefacts well organised, and backed up off of your machine. You can back a lot up to GitHub as part of a git repo, but that's not its main use case. You can use a service like Evernote, or get a licence for a research-specific asset management tool like Projects or LabArchives.

The aim here is to reduce the friction in getting instances of these resources into the hands of others - if you believe that this is a critical part of doing research.

It can also make it possible to recover this information if you lose your main machine. (I decommissioned my main machine last summer via a cup of coffee.)

For the purposes of archiving your work you should also check with your institution and library to see if they can provide support or systems. Librarians in many institutions are mustard keen to help, as it provides a way for them to prove value to the academy in a world in which library subscriptions are under extreme pressure. You may find yourself with the problem of having too many options - which is not a bad problem at all.

Authoring tools, and why does this all suck so much?

I noticed that there was some discussion in the thread about collaborative tools for authoring. Again, I'll just stress, get the work published as soon as possible. This might mean sending a PDF of the article to a publishing house, or having to just send in a Word file.

On the other hand, there is a new generation of online tools emerging for writing, and also tools emerging for writing on the iPhone and iPad. I think we have more viable options now at our fingertips than at any time in the past. I don't believe that there are any serious contenders yet ready to oust the Word/LaTeX duopoly, but it would not hurt to take some of the following for a test drive to help with the authoring experience. It's too broad a topic to go into a detailed review of each one, so I'll leave an investigation of these tools as an exercise for the interested reader. The list below is just a sample; there are a bunch of others out there.

The tool that I see emerging at some time on the horizon, and that I have a lot of excitement for, is the work on the Substance reader and composer and eLife Lens. What's really nice about this is that to get started you can import NLM XML directly, or markdown via pandoc. It does a great job of separating the view, logic and control of the writing experience, and so it should also be possible to write directly in the browser, and export to a publication-ready format directly - but some work remains.

In my own ideal world you can submit an idea to a journal as part of a pull request to the publication, and peer review takes place in some system similar to how we do code review today. On acceptance the full digital artefact is published instantly. The writing and collaboration happens in almost any tool that the user likes, and modifications are synced via something like Dropbox. In this world writing tools support offline as well as online modes, and content, logic and views can be assembled independently. In my ideal world the source is open. We are a little bit away from that at the moment, but there is no doubt in my mind that we are moving in that direction. This great post by PLOS has some great insights discussing what the native format for publishing on the web should be.

About this post.

As we are discussing publishing on the web, I thought it might be useful to describe the tools I used to write this post. The body of the text is stored on my machine as a plain text file, and I store all of these in one directory using nvALT to manage them. This directory is also held under a Dropbox account, and I can access the content from my iPhone through a variety of editors, but in this case I didn't use any of these.

For writing this post I used WriteRoom for Mac in distraction-free mode. I often use SublimeText in distraction-free mode too. For some shortcuts in formatting I used TextExpander. To format the links I wrote the post in markdown, and did the formatting in SublimeText. I previewed the post using Marked. I also used Marked to verify that all of the links were working at the time of writing. I used the GrabLinks bookmarklet to gather all of the links from this post to add in as a resources list at the end of this post. In order to publish the post on my blog I posted it directly into a GitHub repo, using GitHub Pages to render the content. You can see the result at my blog, where I have cross-posted this comment.

Final thoughts

I realise that I have mostly been answering the question about what people should know about the world as it is now, and not so much about what tools or approaches we should advocate to make the world a better place, but I hope that we can have a clear view on what is bad, so that this can help people make pragmatic decisions about how to change things for the better.

Resources

khinsen commented 10 years ago

If anyone's interested, here's the repository for my upcoming course: http://github.com/khinsen/FdV-ScientificComputing-2014 For now there is only the material for the first session, which contains an introduction to the subject and a practical session on Git.

The Git course is the "novice" course from Software Carpentry, reformatted as slides based on remark.js. The reason for this is that I want to use only those techniques for preparing my course that I also explain to my students. I have no intention of teaching Jekyll or any Markdown-to-HTML converter, so remark.js turns out to be a good approach. The slides are not quite as nice as with some other HTML-based frameworks, but simplicity is more important for me at this stage.

In the following sessions, I plan to cover automation and data management, including data publishing (figshare, Zenodo, etc.).

gvwilson commented 10 years ago

How did you do the reformatting to remark.js? Manually, scripted, pandoc, other?

khinsen commented 10 years ago

First, I reformatted the links to the images using an Emacs macro (a few minutes' work). The other links (glossary, image directory) just took a global search/replace.

After that, I had to do some pagination to cut up the text into slides. I did that by hand, while reading through the whole text (which is a good idea anyway before teaching something ;-). I don't imagine doing this automatically because I did try to keep related material on the same screen as much as possible.
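For anyone who would rather script the link reformatting than record an editor macro, here is a minimal sketch in Python. It assumes the lesson uses standard Markdown image syntax (`![alt](path)`); the function name and the `img/` and `figures/` prefixes are hypothetical, not taken from the actual course material. It deliberately leaves pagination alone, since that was done by hand.

```python
import re

# Matches Markdown image syntax: ![alt text](path)
IMAGE_LINK = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def rewrite_image_links(markdown: str, old_prefix: str, new_prefix: str) -> str:
    """Replace old_prefix with new_prefix at the start of each image path.

    Ordinary (non-image) links are left untouched.
    """

    def swap(match: re.Match) -> str:
        alt, path = match.group(1), match.group(2)
        if path.startswith(old_prefix):
            path = new_prefix + path[len(old_prefix):]
        return f"![{alt}]({path})"

    return IMAGE_LINK.sub(swap, markdown)


if __name__ == "__main__":
    # Hypothetical sample line; only the image path should change.
    lesson = "See ![a commit graph](img/git-graph.svg) and [the glossary](gloss.html)."
    print(rewrite_image_links(lesson, "img/", "figures/"))
```

The glossary and image-directory links would still be a plain global search/replace, as described above.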

tpoisot commented 10 years ago

There is also something to be said about using the GitHub Issues mechanism to handle replies to referees. That's something I like more and more.

ahmadia commented 10 years ago

@IanMulvany - Thanks for the detailed comments, I really appreciate your insight into some of the problems and the collection of tools you linked to. I wanted to quickly point out that many of your comments on publishers seem to be specific to your discipline. There are many publishers within mathematics and the computational sciences that are LaTeX-driven, as opposed to XML-driven, so the formatting and submission rules do not necessarily apply.

ahmadia commented 10 years ago

Also, I think there's enough in this GitHub issue to put together a solid article on options and recommendations for publishing as a scientist. @gvwilson and @kaythaney, what do you think?

tpoisot commented 10 years ago

@ahmadia I think such a paper would be really great. It could also be an opportunity to talk more about "scholarly markdown", which seems to be gaining traction. I'm sure @karthik has tons of good refs on that. Basically, a table comparing the pros/cons of using Word, LaTeX, md, Rmd, ..., to write a scientific paper would be a really great resource.

gvwilson commented 10 years ago

Would someone like to volunteer to be lead author? Seems like an obvious match to PLOS Comp Bio's "Ten Simple Rules" collection: http://www.ploscollections.org/article/browse/issue/info%3Adoi%2F10.1371%2Fissue.pcol.v03.i01

asinclair commented 10 years ago

Two further thoughts:

  1. It's essential that tools support a field's reporting standards, which in clinical research represent a years-long push to improve the standards, transparency and usability of published reports - See: the Equator Network (http://www.equator-network.org/). I get the impression that's spilling over into other fields.
  2. To quote Paul Murrell in his talk at the Joint Statistical Meetings last August: "Don't be a dead end." He was talking about interacting and manipulating figures produced by grid graphics in R, but the principle applies to scientific reporting as well. I spend days manually extracting data from PDFs, because even in its digital form, the traditional scientific article is a dead end. Reporting needs to consider the downstream.

gvwilson commented 10 years ago

Please see (and comment on) #303 (a blog post summarizing discussion to date).

IanMulvany commented 10 years ago

nice post, no additional comments.

iglpdc commented 10 years ago

I think it is not fair to focus the issue on writing a paper for publication in a journal. Most scientific writing is taking notes on talks, classes and other people's papers; keeping lab journals; and writing documents like a thesis, etc. For people outside the more geeky communities, like physics or math, all this happens in Word and Excel. They also do little coding, so their motivation to use version control sinks when they learn that it's not going to fit well in their workflow.

For most people the first thing that has to go under version control is their lab journal. I think Markdown can help a lot here, and that we should teach it as the tool to do most of the writing. If at the end of a 6-month research project they have to write the final paper in Word, but have all the science supporting it in a repo, I think they would have improved the quality of their work a lot.
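To make the lab-journal idea concrete, here is a minimal sketch in Python of appending a dated Markdown entry and committing it. The file name, heading format and commit message are all assumptions for illustration, and `commit_journal` assumes the journal already lives inside an initialised Git repository.

```python
import datetime
import subprocess
from pathlib import Path


def format_entry(text: str, when: datetime.date) -> str:
    """Return a Markdown section headed by the ISO date."""
    return f"## {when.isoformat()}\n\n{text.strip()}\n\n"


def append_entry(journal: Path, text: str, when: datetime.date) -> None:
    """Append a dated entry to the journal file, creating it if needed."""
    with journal.open("a", encoding="utf-8") as handle:
        handle.write(format_entry(text, when))


def commit_journal(journal: Path, message: str) -> None:
    """Stage and commit the journal.

    Assumes the journal's directory is already a Git repository;
    raises CalledProcessError if either git command fails.
    """
    repo = journal.resolve().parent
    subprocess.run(["git", "add", journal.name], cwd=repo, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo, check=True)
```

A typical day might call `append_entry(Path("journal.md"), "Re-ran assay 3; signal still weak.", datetime.date.today())` followed by `commit_journal(Path("journal.md"), "Journal: notes on assay 3")`. Because the journal is plain text, `git diff` and `git log` stay readable, which is the main thing a Word file would lose.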

khinsen commented 10 years ago

Good summary of the discussion. If that's the point of the blog post, great.

If the point is to pass on a specific message to the reader (who may not care so much about the discussion), I don't see any clear message right now. The post basically says "We have some good ideas of how we would like to write papers, but right now they are not practical." Which, again, I consider a good summary of the discussion.

It would be nice to conclude with something pragmatic: 1) What can we (SWC) teach right now to the students? 2) What could/should happen elsewhere (e.g. at the publishers' side) to make the world a better place in the not-so-far future?

As for 1), I agree with others who said that we should focus on the "lab notebook" aspect and say honestly that there is no good solution right now to interface with the current publishing universe. The important point is that the students would gain by adopting "our" approach for the lab notebook rather than using the publishing tools early on in the research.

As for 2), the question makes sense only if people in or near the publishing business read the post. I am optimistic that the new innovative Web-based journals (PeerJ, F1000, ...) would be listening, given that they are trying to create their market niche right now.

tpoisot commented 10 years ago

Point 2 of @khinsen is a good reason to go for a paper -- the current situation won't change until people in the publishing business decide to make it change, and it's more likely they will pay attention to papers rather than blog posts.

onnodb commented 10 years ago

This is an interesting discussion!

Although the Bootcamp curriculum already seems overfull to me, adding a small module on publishing seems like it’s actually a very good idea. It may make participants feel that the bootcamp is more relevant to them, and a bit closer to home.

Nevertheless, if I had to answer the question “what to teach about writing papers?”, I’d be so bold as to answer: nothing. Paper writing is a messy business, with no one-size-fits-all solution, and a strong factor of personal preference and local tradition. Your supervisor loves Word 97 with EndNote, so you use Word 97 with EndNote, period. If we can help people by pointing out the potential use of Version Control in managing the whole thing, then great (as suggested by @pipitone).

What is important, is to drive home the message that your underlying research should be reproducible. Show people how you could publish a set of scripts on GitHub that reproduce the paper’s figures (like the venerable WaveLab, or some of the papers linked to in this thread). Show them how you could upload your raw data (including metadata!) to figshare. Give them concrete, field-specific tips on how to publish green or gold OA (see @IanMulvany). Show them how to link to these resources in the paper itself (SI). Make it very concrete: actually download a paper’s scripts and run them.
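As one concrete illustration of "raw data including metadata", here is a minimal sketch in Python that builds a manifest for a data directory before upload. The manifest layout (a `description` plus a list of `path`, `bytes` and `sha256` entries) is an assumption made for this example, not a format required by figshare or any other repository.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path, description: str) -> dict:
    """Describe every file under data_dir: relative path, size, checksum."""
    files = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        files.append({
            "path": path.relative_to(data_dir).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return {"description": description, "files": files}


def write_manifest(data_dir: Path, description: str, out: Path) -> None:
    """Write the manifest as JSON; keep `out` outside data_dir so that
    a later run does not list the manifest itself."""
    manifest = build_manifest(data_dir, description)
    out.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

Someone who downloads the data can recompute the checksums and confirm they are running the paper's scripts against the same files the authors used.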

And maybe, finally, spend a few minutes sharing your vision of Future Publishing, in which manuscript submission systems have caught up with the rest of the Web, citations are semantic, and everything is OA. After all, you’re talking to the future generation of scientists.

But how to write the paper itself? For all I care, you write it in MS Paint.

(ahem, time to step off that soapbox :-) )

gvwilson commented 10 years ago

Add link to http://yihui.name/en/2013/10/markdown-or-latex/