swcarpentry / DEPRECATED-bc

DEPRECATED: This repository is now frozen - please see individual lesson repositories.

What should we teach about writing/publishing papers in a webby world? #199

Closed gvwilson closed 9 years ago

gvwilson commented 10 years ago

What can/should we teach people about writing/publishing/reviewing (i.e., the last lap of every scientific project)? Clearly interacts with reproducible research, open access, etc.; what mechanics/tools should we demonstrate/advocate?

See also #172.

uiuc-cse commented 10 years ago

A demonstration of how arXiv works—or any other preprint server—would be invaluable, as I am certain that many people aren't following these things yet.

SWC already has some discussion of licensing and open access which can be readily extended to cover papers. We could find a good list of open-access journals (COS or EFF probably has one).

Reviewing may be hard to cover effectively since it works a little differently in each field. Plus, that really is something you should be working out with your advisor (if you are in academia).

Neal Davis

SonOfLilit commented 10 years ago

Give an exercise where they read one paper that is reproducible research and one that isn't, and need to interact with both in a similar manner (Answer a deep question? Extract some data for a meta-analysis?).

Let them feel how much more usable reproducible research is.

stefanv commented 10 years ago

To make papers more suitable for code-review on GitHub, we use ReStructuredText to write the SciPy conference papers. The conference tools are aimed at a whole proceedings, but I just reworked these tools for a single paper on scikit-image we're writing:

https://github.com/scikit-image/scikit-image-paper

I also have some tools for formatting papers for uploading to arXiv, which I think is a particularly handy thing to be able to do:

https://github.com/stefanv/arxiv_tools

(In this case, my paper was in LyX format, but it is trivial to modify for pure LaTeX or other formats).

ahmadia commented 10 years ago

@stefanv - Thanks for bringing up the way SciPy papers are done, very forward-looking!

jkitzes commented 10 years ago

On the literal act of writing, I think the biggest hurdle, and most important teaching point, is simply the idea of writing papers in plain text (regardless of the markup language). This is probably a big enough jump as it is for most introductory bootcamps. There are lots of advantages to plain text, as we know, but in the context of the bootcamp, one of the biggest is that it provides a really good use case for version control, and so it ties in nicely to the other bootcamp materials.
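A minimal sketch of why plain text gives version control something to work with, using Python's standard `difflib` as a stand-in for what `git diff` would report. The two draft paragraphs and the file names `paper_v1.md` and `paper_v2.md` are made up for illustration.

```python
import difflib

# Two hypothetical drafts of the same manuscript paragraph, one sentence per line.
draft_v1 = [
    "We sampled 40 plots across three sites.\n",
    "Species richness increased with plot area.\n",
]
draft_v2 = [
    "We sampled 42 plots across three sites.\n",
    "Species richness increased with plot area.\n",
]


def manuscript_diff(old_lines, new_lines):
    """Return a unified diff (as a list of lines) between two plain-text drafts."""
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="paper_v1.md", tofile="paper_v2.md"
        )
    )


diff = manuscript_diff(draft_v1, draft_v2)
print("".join(diff))
```

Keeping one sentence per line is a convention some authors adopt so that a small wording change shows up as a one-line diff; it is a choice, not a requirement of any tool.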

The elephant in the room for students, of course, is (a) why they should change to a practice (leaving Word) that will be viewed as strange and potentially difficult by other collaborators, and (b) more specifically, how they will interact with collaborators who only use Word for track changes and commenting. I don't know that I have good answers to either of these questions.

On the practical act of publishing, I'd prefer to take the time to explain self-archiving (i.e., author accepted manuscripts) rather than pre-print servers, as I think the former is more supportive of open science, especially in fields where pre-prints are neither widely expected nor read.

mkcor commented 10 years ago

what mechanics/tools should we demonstrate/advocate?

As usual, I would reply: The IPython Notebook!

I didn't realize you could format text so nicely around code, until I read some of Jake Vanderplas's blog posts which were "written entirely in the IPython Notebook" (@jakevdp).

dpshelio commented 10 years ago

From my point of view, an important part of writing papers is the way we handle references. I'm always surprised by the different ways people keep track of the references they use (mainly relying on their memory - the one in the brain, not the computer one). Almost no one in my environment takes advantage of the web or new technologies to find the reference they want, or the thing they read (I mean PDF search, metadata classification of papers, etc). Tools like Zotero, Mendeley and many others that also simplify collaboration, or even simple ones without the social stuff like JabRef together with BibTeX for LaTeX, make writing papers a lot easier.

I wonder if there's an easy way to integrate bibtex directly with rst or markdown (or even ipython notebooks).

rgaiacs commented 10 years ago

@dpshelio About "I wonder if there's an easy way to integrate bibtex directly with rst or markdown (or even ipython notebooks)": I don't think so.

ethanwhite commented 10 years ago

I wonder if there's an easy way to integrate bibtex directly with rst or markdown (or even ipython notebooks).

Pandoc can handle bibtex citations [1]. See the makefile in [2] for an example of how we use this for writing papers.

[1] http://johnmacfarlane.net/pandoc/README.html#citation-rendering
[2] https://github.com/weecology/data-sharing-paper
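As a rough sketch of the kind of pandoc call such a makefile might wrap (this is not the command from the repository above), the snippet below builds a command line that resolves `[@key]` citations in a Markdown file against a BibTeX file. The file names `paper.md`, `refs.bib` and `paper.html` are hypothetical, and `--citeproc` assumes pandoc 2.11 or later; older releases used `--filter pandoc-citeproc` instead.

```python
import os
import shutil
import subprocess


def pandoc_citation_command(source, bibliography, output):
    """Build a pandoc command line that renders citations from a BibTeX file."""
    return [
        "pandoc",
        source,
        "--citeproc",  # assumes pandoc >= 2.11; older: --filter pandoc-citeproc
        "--bibliography",
        bibliography,
        "-o",
        output,
    ]


# Hypothetical file names, chosen only for illustration.
cmd = pandoc_citation_command("paper.md", "refs.bib", "paper.html")
print(" ".join(cmd))

# Run the conversion only when pandoc and both input files are actually present.
if shutil.which("pandoc") and os.path.exists("paper.md") and os.path.exists("refs.bib"):
    subprocess.run(cmd, check=True)
```

Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.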

twitwi commented 10 years ago

There are a few things we could mention to boost people's efficiency in writing (even if these are not necessarily linked to the "webby world"):

dpshelio commented 10 years ago

@ethanwhite nice one!! thanks!!

dfalster commented 10 years ago

Instead of jumping straight to the final paper, it might be better to get people thinking first about writing reproducible reports, e.g. using knitr (in R) or ipython notebook. Such reports are useful for gathering together key ideas and disseminating these to coauthors for discussion, before producing a full blown paper.

jdblischak commented 10 years ago

I like this idea and also think there would be interest among bootcamp attendees. And even if it can't be covered in a standard 2-day bootcamp, I think it would be great to point them to a resource that they can use months after the bootcamp, i.e. once they are comfortable with git.

Actually, I am the perfect target for this lesson. I have adopted SWC principles and am striving to work in the "open." The biggest impediment I currently have is the licensing. For example, I'd ideally like to have my code, some summary data files, and the paper all in the same repository. While I know that I am fine with putting the code under the MIT license, I am pretty confused how to license the data and paper. Is the Creative Commons Attribution License sufficient to prevent others from publishing a paper that uses my data*?

*Of course I will make it free to use upon publication.

wking commented 10 years ago

On Thu, Dec 19, 2013 at 09:10:34AM -0800, John Blischak wrote:

While I know that I am fine with putting the code under the MIT license, I am pretty confused how to license the data and paper. Is the Creative Commons Attribution License sufficient to prevent others from publishing a paper that uses my data*?

*Of course I will make it free to use upon publication.

I think "Copyright $x, all rights reserved." is the safe bet for stuff you don't want others reusing. You can always re-license once you get the paper published.

ahmadia commented 10 years ago

Is the Creative Commons Attribution License sufficient to prevent others from publishing a paper that uses my data*?

In most situations, yes, it is sufficient to keep somebody else from publishing your work, since any reputable journal will refuse to publish work that has already been published or written by somebody else.

The problem with this approach is that many of today's journals will refuse to publish your work if you've already released it somewhere else, especially under a license granting reuse. As @wking comments, reserving copyright is the simplest approach here.

gvwilson commented 10 years ago

http://yihui.name/en/2013/10/markdown-or-latex/ may be relevant...

kaythaney commented 10 years ago

Jumping in here ...

There seem to be a few main points emerging here ... questions about whether we're trying to teach more about the end result, or change practice leading up to that final write up. There's been a lot of work in the open science / scholarly communication circles around various aspects touched on here - tools, workflow hacks, discussion of new forms of publishing more reproducible research. I've written up a blog post to see if we can involve some of those experts in this discussion ...

http://mozillascience.org/what-should-we-teach-about-publishing-on-the-web/

znmeb commented 10 years ago

Well, for openers, the equivalent to Christopher Gandrud's book https://github.com/christophergandrud/Rep-Res-Book in Python, perhaps facilitated by Dexy.it https://github.com/dexy/dexy and Pandoc.

cameronneylon commented 10 years ago

This is really interesting. I think it could be worth taking a step back and re-phrasing the question a little. Is the object to teach those building tools about publishing in general (ie what tools and hacks might be useful to create) or is the focus here specifically on how to get better incorporation of code into published work? I think the latter is the focus but it might be good to be explicit.

On that basis it would be good in my view to touch on some background in literate programming to give people a bit of context and then look at various authoring tools (KnitR, IPyNB, Sweave, Dexy...others presumably) alongside various code repositories and data repositories in that light. This would then provide a way of thinking about the available tools as a way of telling the story, which is different to how they are generally used in practice to manage code and records and actually do the work.

It's a personal bias but I'd also be inclined to spend some time on the sausage making of the publishing process and why it doesn't fit with the tools above. What gaps are there? How could they be filled? What would the optimal system look like? What formats would be used?

That's a bit chunky but it's the way I'd approach it.

Daniel-Mietchen commented 10 years ago

I welcome ideas to make the writing of "papers" easier and to facilitate joining or reproducing the writing process, but in a webby world that is aware of version control, why not take advantage of that for updating scholarly knowledge directly, in one place, forking only when really necessary?

At present, whenever you come across an interesting article, there is basically no way to predict where the next article on the subject is going to be published, but if there were already a reasonably good article on that topic and it were publicly available, openly licensed and version controlled, there is no reason why new materials relevant to it should not be added as they become available. We could "watch" it the way we watch GitHub repos or Wikipedia articles, and we could engage with updates much more directly than via static stand-alone documents.

In order to "publish" our research, do we really need to write (and review) a ten-page narrative summary thereof if it is available (assuming reasonable long-term preservation) from open notebooks, data and code repositories in maximal detail and could be contextualized and made more widely known by simply inserting a few words, paragraphs, illustrations, equations or lines of code into an existing article with a slightly broader focus?

samuelmoore commented 10 years ago

Hi all. Specifically regarding the publication of software, the Journal of Open Research Software (of which I am managing editor) has devised a checklist as part of our peer-review process that might be useful: http://openresearchsoftware.metajnl.com/about/editorialPolicies#peerReviewProcess

pgroth commented 10 years ago

I would recommend looking at literate programming as Cameron suggests. (IPython notebooks are great, but it would be good to say that there are others out there.) Finally, using a workflow tool (Taverna, KNIME, Wings, Galaxy) to chunk code together into understandable pipelines is useful when sharing reproducible research [1].

Another thing would be to suggest good practice around attribution in both code and documentation.

It's good to discuss the ability to use plain text, either through markdown or latex.

https://www.authorea.com is a nice tool (although commercial) for demonstrating this.

[1] http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0080278

pgroth commented 10 years ago

Oh one more thing, you might like to talk about the importance of stable URLs and having urls that don't change or won't magically disappear when you move departments or whatever. See data services or owning your own domain name.

cboettig commented 10 years ago

Lots of interesting directions here. A few thoughts:

Start with data publishing, code publishing

I think the first thing to teach would be best-practice platforms for publishing code and data independent of the rest of the publication process (dynamic documents etc can wait). Most journals being what they are, changes to workflow there are much harder and I think ultimately much less valuable than teaching people how to publish data in a permanent archive with complete and accurate metadata, and how to publish code with version control and metadata.

Plain text for publications

If the goal really is to address preparing publications for journals, I would focus on plain-text publication tools (markdown/pandoc, possibly LaTeX) along with version management/collaboration tools (Git/GitHub). Getting more folks off of Word and comfortable with any alternative would be the single biggest value, spur more innovation, and ease collaboration for the rest of us ;-) The remaining effort should be spent addressing pain points in the process.

Pain points:

In my experience and reflected in the comments above, these are some of the major pain points in text-based publishing. Teaching anything that addresses these challenges would be really helpful.

Journal submission software

The paleolithic nature of things like Manuscript Central today is a real barrier to a more modern and web-native workflow, making almost all advancement in this area more of a hack than a solution. That said, as far as I've seen most will take PDFs at least for the initial review, or do a reasonable job with tex files (when restricted to 1990s-era tex).

Probably worth mentioning journals/prepublication platforms that don't suffer these limitations (e.g. figshare, arXiv -- who knew you could zip the code and data with your arxiv paper?)

Collaboration

Collaborating with others using Word or some other platform is as annoying as it is inevitable. The problem isn't limited to Word -- in two of my own manuscripts I'm writing in R-markdown some collaborators edit the tex file. Others would rather just write notes with pen on a print-out anyway (or with a stylus on the pdf), making the software choice rather irrelevant anyhow. Beyond that I don't have any great suggestions here but am happy to learn!

Citations

Even with the host of reference managers, citations are unnecessarily annoying and collaboration can be difficult when everyone has a different preferred reference manager. What bothers me most though is it just feels archaic.

In a web-native world, citations are links (Fenner 2010, http://blogs.plos.org/mfenner/2010/12/11/citations-are-links-so-where-is-the-problem/), preferably using permanent identifiers, and ideally with semantics. The remaining bibliographic information can be automatically generated from the link by any reasonable tool (e.g. using Crossref APIs for DOIs). It shouldn't be the author's concern: the author should be free to worry about the semantic reason for the citation (e.g. cito:critiques) rather than the article page numbers. Unfortunately the platforms generating citations from links aren't as developed as they might be.
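To make the "generate the bibliography from the link" idea concrete, here is a hedged sketch that uses DOI content negotiation (asking the `https://doi.org/` resolver for BibTeX through an `Accept` header) rather than the Crossref REST API itself. The function names are made up for illustration, the DOI is the one from the PLOS ONE link earlier in this thread, and whether a given DOI returns BibTeX depends on its registration agency.

```python
import urllib.request


def bibtex_request(doi):
    """Build (but do not send) a request asking the DOI resolver for BibTeX."""
    return urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": "application/x-bibtex"},
    )


def fetch_bibtex(doi, timeout=10):
    """Send the request; needs network access and a DOI that supports BibTeX."""
    with urllib.request.urlopen(bibtex_request(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Inspect the request without touching the network.
request = bibtex_request("10.1371/journal.pone.0080278")
print(request.full_url, request.get_header("Accept"))
```

Calling `fetch_bibtex("10.1371/journal.pone.0080278")` would perform the actual lookup; it is left out of the example so the sketch runs offline.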

Dynamic Documents

knitr, Sweave, ipython, Dexy, even Make etc are wonderful tools that are worth exposing researchers to, but perhaps it is more important to teach the concept than the particular implementation in this case. Dynamic documents can introduce additional challenges to collaboration and additional gotchas (via caching, etc)

pgroth commented 10 years ago

+1 to Carl's analysis

rgaiacs commented 10 years ago

@cboettig Nice analysis.

Although we all want to move to a webby world, there are some journals and conferences, most of them from fields outside math, computing, and engineering where complex equations are rare, that will only accept Word files (you are lucky if they accept ODT). For these cases we should start by saying that some tools (e.g. pandoc) can convert from Markdown to Word.

I had this type of problem last year.

khinsen commented 10 years ago

An interesting discussion in particular as I am right now preparing a course about modern technologies for research aimed at PhD students.

TL;DR: I mostly agree with @cboettig.

I decided to focus on immediately useful stuff, and end with an outlook on promising upcoming technologies which today's young scientists will have to know about if they plan to pursue a career in science. Publishing SciPy-style is definitely in the second category because a PhD student will have to work with mainstream journals in the relevant time frame (three to four years). I will, however, present and recommend a plain-text-with-version-control approach for doing research, including keeping notes. It's only for the formal writeup that I think we have to stick to traditional techniques for a while - which unfortunately means Word in the disciplines where it dominates.

If anyone has a good idea for collaborating with Word users while sticking to decent tools, I'd be eager to learn about it. It's the most frustrating aspect in my collaborations with biologists.

khinsen commented 10 years ago

One more comment about publishers and Word: in my experience, they are happy if you send them a Word file, whatever its contents. Just paste your Markdown text into Word and submit it. It's only when you need formulas (maths or chemistry) that this approach breaks down. The technical editing staff at major publishers actually does a very good job and can deal with anything that's reasonably clear.

khinsen commented 10 years ago

And one more question. For writing papers, the publishing system imposes lots of constraints, but there is complete freedom for producing slides for presentations. Is anyone aware of a usable plaintext-based system for generating slides, other than the various LaTeX packages? The condition that excludes most of the simple tools is the possibility to integrate images, plus ideally mathematical equations.

rgaiacs commented 10 years ago

@khinsen AFAIK you can use pandoc to convert from Word to Markdown too. If you can test it and give us feedback on how well Markdown -> Word -> Edit -> Word -> Markdown works, that would be great.
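A small sketch of the round trip described above, expressed as the two pandoc command lines involved. The file names are hypothetical, pandoc infers the formats from the extensions, and `--track-changes=all` is an option of pandoc's docx reader in newer releases that keeps tracked insertions and deletions as markup in the output; how usable the result is for a real manuscript is exactly what would need testing.

```python
def pandoc_convert_command(source, output, extra_options=()):
    """Build a pandoc command; formats are inferred from the file extensions."""
    return ["pandoc", source, *extra_options, "-o", output]


round_trip = [
    # Markdown -> Word, to hand to collaborators.
    pandoc_convert_command("draft.md", "draft.docx"),
    # Word -> Markdown, after collaborators have edited the file.
    pandoc_convert_command(
        "draft_edited.docx", "draft_edited.md", ["--track-changes=all"]
    ),
]

for cmd in round_trip:
    print(" ".join(cmd))
```

Comparing `draft.md` with `draft_edited.md` afterwards (for example with `git diff`) would show how much of the difference is real editing and how much is conversion noise.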

About slides for presentations you can try some Javascript/CSS library and write HTML (yes, I know that HTML is not the best plaintext format). For mathematical equations you can use MathML or LaTeX with the help of MathJax.

ahmadia commented 10 years ago

+1 to the various Markdown+MathJax slideshows out there. I've had good luck writing in Markdown+MathJax, and using pandoc to convert to one of the slideshow formats. As @r-gaia-cs mentions, pandoc can try to convert a number of different formats to Word, but it has very limited abilities to handle complex formulae. I think you already know this, but pandoc is also the underlying engine beneath IPython's nbconvert tool.

khinsen commented 10 years ago

@r-gaia-cs My biologist collaborators use the revision tracking system in Word. From what I could find about pandoc conversion, this information doesn't survive, so I don't think pandoc is the solution for me. However, I could at least use it to write my own contributions which I could then convert and paste into the master file - I will try this next time.

@ahmadia Do you have an example of slides in Markdown+MathJax?

ahmadia commented 10 years ago

@khinsen: this is a very limited demo, but: http://aron.ahmadia.net/pyhpc/petsc4py-tutorial-slides.html

Here's the corresponding source: https://github.com/pyHPC/pyhpc-tutorial/blob/master/markdown/scale/petsc4py-tutorial.md

gvwilson commented 10 years ago

@jkitzes wrote:

The elephant in the room for students, of course, is (a) why they should change to a practice (leaving Word) that will be viewed as strange and potentially difficult by other collaborators, and (b) more specifically, how they will interact with collaborators who only use Word for track changes and commenting. I don't know that I have good answers to either of these questions.

And aye, there's the rub. Word is easier to use for normal tasks (like writing a paper with bullet points and italics) than Markdown, much less LaTeX --- it's only Stockholm Syndrome that makes us believe otherwise :-). And as long as both senior faculty and journals require people to submit Word (or PDFs derived from specific Word templates), it's hard for us to say, "No, really, version control is better in the long run," because the long run ends in you wrestling with Pandoc to try to get it to format things the way some particular conference requires.

(True story: I submitted the outline for our upcoming SIGCSE workshop to the ACM using their LaTeX template. During the holiday break, I got mail telling me I had to re-do it using their MICROSOFT WORD template (their capitalization), which of course LibreOffice couldn't load properly.)

So: given that the end product must be acceptable to senior profs and journals, and that markup-based tools impose more cognitive load on newcomers than WYSIWYG alternatives (i.e., the payoff for switching is tomorrow, the pain is today), what's our path forward? What can we teach in an hour that the average biologist will find compelling?

gvwilson commented 10 years ago

HTML slideshow packages are a great example of the disconnect Philip Guo talked about in his Two Cultures essay:

Yes, programmers can use that format to put a callout beside a table with an arrow pointing to a circled cell and a picture of a kitten beside it, but it's a lot of work compared to just WYSIWYG'ing it in PowerPoint, Keynote, or what have you. As with markup-vs-WYSIWYG for preparing papers, I think the distinction is between people who look at text littered with strange symbols and "see" the final (compiled) product, and people who want to directly manipulate that final product without having to mentally compile it (or reverse-compile it).

Now that the element is widespread, there's no reason why we couldn't create an authoring tool that would let people generate HTML5 slideshows without mental compilation and typing lots of strange symbols. My suspicion, though, is that those slideshows wouldn't be any more diff'able or merge'able than IPython Notebooks, i.e., they'd be almost as hard in practice for version control to work with as what we have today. They would therefore fail to satisfy end users ("Why should I switch? It only does half of what Keynote does!") and programmers ("Why should I switch? I still can't merge, and your composition tool doesn't have Vim bindings!").

ahmadia commented 10 years ago

@gvwilson - The file extension tells me a lot about what somebody wants to do with their work:

I think this is less about the authoring process, and more about the sharing and collaborating process. I have yet to encounter a scientist who defended Word for working with lots of collaborators and versions. Its track-changes features simply don't scale.

PDF goes everywhere, but is not easy to edit/version.

HTML goes everywhere, is easy to version, and is slightly painful to write. Markdown is a compromise, but it's a good one, and we'll see better WYSIWYG editors for slideshow presentations in the future.

gvwilson commented 10 years ago

@ahmadia wrote:

I have yet to encounter a scientist who defended Word for working with lots of collaborators and versions.

But that's a non-issue. We have to convince people to switch when working in the small, because that's the normal case for most scientists. At least, I think it is: does anyone have a histogram of how many papers are written by how many authors?

HTML goes everywhere, is easy to version, and is slightly painful to write. Markdown is a compromise, but it's a good one, and we'll see better WYSIWYG editors for slideshow presentations in the future.

It's easy to sell futures on the stock exchange; it's much harder to sell them in a classroom... :-(

pipitone commented 10 years ago

Even if folks are writing their papers in Word, I still think version control is a useful tool when writing papers, because there is so much more to a paper than just the final document, e.g. results files, figures, images, correspondence, submission documents, as well as any scripts you use to do analysis and generate other assets. You may not be able to use 'git diff' on a Word doc, but you can use it on many of these other things. And even then, under version control you can still check out an older copy of your paper and use Word's compare feature to do the diff. Plus you get the benefits of having a log of your changes, easy backups (e.g. git push), rollbacks, etc.

The point I'm making is that I think the benefits of version control when writing papers are worthwhile despite the fact that Word files don't diff easily.

I'd also like to suggest that teaching folks to use knitr or ipython notebooks (or even just to create scripts to generate figures[1]) can be a really useful thing. I've been showing people how to use RStudio to create a draft of their paper in markdown to leverage the power of knitr. Even those who don't draft their paper in markdown but just use it like an ipython notebook get value out of being able to build up a document of figures and tables which they can paste into their Word documents[2].

It's not perfect, but I'd argue it is better.

Jon.

[1] I work in a research hospital where many people use R but rarely write scripts... The good students keep a Word document with code that they cut and paste into the R REPL. Ugh.

stefanv commented 10 years ago

@khinsen Since you asked, we've had some success using Remark for doing HTML slides in Markdown. E.g.:

http://cournape.github.io/davidc-scipy-2013

You can use MathJax with it, as well as print to PDF.

TheoBloom commented 10 years ago

From the less technical and more editorial perspective, I'd say the key issue is that authoring needs to be done with reproducibility and re-use in mind. So, even if you are working in Word, the starting point needs to be one of preparing information for the person who wants to re-use your 'research objects', not just read a narrative about them. And if you're talking about educating people who are already of a technical mindset, this should be a relatively easy point to make.

cboettig commented 10 years ago

@gvwilson @jkitzes Great points that cut to the heart of the matter; hence my initial arguments that SWC should first focus on publication of code and data with appropriate metadata, which is a natural context to introduce plain-text-based scientific writing (and probably the experience from which many of us first realized it might make sense to do the same for manuscripts).

The reason to adopt a plain-text (version-controlled, online collaborative) workflow is the same reason Software Carpentry gives for everything else it teaches: it will save you time. Yes, it makes collaborating with Word users potentially more time-consuming, while making your own writing and other collaborations less time-consuming. If the arithmetic comes out in your favor and you save time, great. If not, stick to Word (or develop in Markdown and then paste or Pandoc into Word for editing and revisions). This is what I and no doubt many on the list do: use Markdown or LaTeX for the time-saving, headache-reducing benefits they provide, and switch to Word (or TeX or Google Docs or whatever) if or when the transaction costs of collaborating become too high.

Otherwise we risk painting a false dichotomy and echoing every flame war over choices of software or programming language. I believe SWC students should simply be given the basic skills to author scientific documents on the web in plain text, and they can then choose the appropriate medium based on context.
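
To make the "develop in Markdown, then Pandoc into Word" route concrete, here is a minimal sketch. It assumes Pandoc is installed and on the PATH; the function names and the file name `paper.md` are made up for illustration. Pandoc infers the input and output formats from the file extensions, so no extra flags are needed for this simple case.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(source, target):
    """Build the argument list for converting `source` to `target`.

    Pandoc guesses formats from the extensions (.md in, .docx out).
    """
    return ["pandoc", str(source), "-o", str(target)]


def markdown_to_docx(source):
    """Convert a Markdown manuscript to a .docx file next to the original."""
    source = Path(source)
    target = source.with_suffix(".docx")
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(pandoc_command(source, target), check=True)
    return target
```

Calling `markdown_to_docx("paper.md")` would write `paper.docx`, which can then go to Word-only collaborators for track changes. Getting their edits back into the Markdown source is still a manual step in this sketch.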

cameronneylon commented 10 years ago

Lots of really great points here but I'd like to go back to my original point as well:

We need a framework for discussing this that steps a little away from the framework of the rest of SWC.

The reason I say this is actually well demonstrated by the subtle ways in which all the suggestions are butting up against each other in not so comfortable ways. All of us have an implicit framework into which our thinking about authoring and sharing papers fits. Many of us also have a similar, but perhaps not identical, framework we use to think about code (and data, and...and...)

The students don't.

They're just at the point of trying to wrap their head around version control and the shell. That means to my mind (and I defer to the real education specialists as I am definitely not) that a combination of the practical and the abstract will help them understand both the software side better as well as allow them to come to their own conclusions about how that framework does or does not apply to authoring papers.

Or, to put it more simply: you've just learned about some ways of thinking about writing and using code and data that might help you work better. Authoring a paper is a different process. Here's why. Maybe it doesn't need to be... or maybe it does need to be. What do you think? What will work for you? And do you now understand why version control is such a fantastic thing? (And, for a bonus, why it works well for code but not so well for papers.)

Cheers

Cameron

ps Q: How many Word users does it take to change a lightbulb? A: That's entirely the wrong question. How many lightbulbs does it take to change a Word user?


ellisonbg commented 10 years ago

Lots of great thoughts here, and many people have hit some of the points that I would make myself.

I figured I would fill in with a little detail on how we (the IPython team) see the IPython Notebook in relationship to writing academic papers. There are a couple of different scenarios that we are thinking about:

[quick point first: the first step should always be to create a public git/hg repo somewhere and put everything related to a paper in it. If it is not in a repo, it doesn't exist!]

  1. IPython Notebooks as computational companions

In this use case, the IPython Notebooks are not the papers themselves, but are used to generate and document the computational aspects of a work. We imagine that a user would create a GitHub repo with the data and some Notebooks and then link to those resources through nbviewer from the actual paper. This is a great starting point because you can still use any process/tools (LaTeX, Word, etc.) to write the actual paper. The only difficulty with this is that you will likely want to generate your figures using a Notebook and then incorporate them into your paper. There are a couple of routes for this: 1) use Dexy, which already has some integration with IPython Notebooks that can extract figures from a Notebook and put them into another document; 2) use IPython.nbconvert, since it is not hard to use IPython's nbconvert utility to export the figures in a Notebook to external files.

  2. IPython Notebook as the paper itself - "lightweight version"

If you are writing a paper for a journal that accepts LaTeX, you might be able to use IPython.nbconvert to produce your paper directly from an IPython Notebook. The Notebook/nbconvert now supports BibTeX-managed references, and the nbconvert template system is general enough to accommodate a journal's LaTeX standards, styles, classes, commands, etc. Many of the papers that I plan on writing in the near future fall into this category. There are two downsides to this: 1) if you can't submit LaTeX to the journal you are out of luck (go back to scenario 1), and 2) you are writing in Markdown, which lacks many of the features of LaTeX.

  3. IPython Notebook as the paper itself - "serious LaTeX"

In some cases you actually need many of the more advanced features of LaTeX. For me this was the case when I was writing lots of papers in Physical Review Journals. Lots of numbered equations, equations/section references, alternating single/multi-column layout. You can fake this a little bit by using IPython "Raw Cells" which are just dumped verbatim into the LaTeX. But that only goes so far. If you hit this limit, I think the current best option is to use Dexy to write the paper and pull resources from Notebooks.

I should note that there is still a lot of work to be done on IPython and Dexy to better support these workflows. We plan on making some additional changes to nbconvert to improve these use cases, and I am sure that @ananelson is more than willing to improve Dexy where needed.
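
As a rough illustration of the "export the figures in a Notebook to external files" step from the first scenario, here is a minimal sketch that reads the notebook file directly with the standard library. It is not nbconvert's own API. It assumes the nbformat 4 JSON layout (a top-level `cells` list whose code cells carry `outputs`, each with a `data` dict keyed by MIME type); older notebook versions store outputs differently. The function name and the output file naming scheme are made up for illustration.

```python
import base64
import json
from pathlib import Path


def extract_png_figures(notebook_path, out_dir):
    """Write every PNG output found in a notebook to `out_dir`.

    Assumes nbformat 4: images are base64 strings under data["image/png"].
    """
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for cell_index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # only code cells have outputs
        for output_index, output in enumerate(cell.get("outputs", [])):
            png = output.get("data", {}).get("image/png")
            if png is None:
                continue  # text, stream, or error output
            if isinstance(png, list):  # multi-line strings may be stored as lists
                png = "".join(png)
            target = out_dir / f"cell{cell_index}_output{output_index}.png"
            target.write_bytes(base64.b64decode(png))
            written.append(target)
    return written
```

The files written this way can be included from LaTeX, Word, or anything else, which keeps the Notebook as the single source for the figures while the paper itself lives in whatever tool the journal demands.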

khinsen commented 10 years ago

@ahmadia Thanks for the example. How was it generated?

@stefanv Thanks as well... even if your slides don't work for me (Firefox 26). I see a piece of Markdown text flashing on the screen, to be rapidly replaced by an all-white page.

@wking Even more thanks - my comments are on the new issue you created.

My impression from these examples, and my own research, is that doing scientific presentations the HTML way is possible if you don't mind learning some arcane syntax and managing a build process. In other words, it's pretty much the same as doing slides using LaTeX (Beamer etc.). So, given that scientists in some domains (physics for example) end up learning LaTeX anyway for their publications, what are the advantages to be expected from using the HTML/Markdown approach, other than fancier visual effects?

khinsen commented 10 years ago

@gvwilson The "two cultures" argument holds much less for presentations than for other subjects. In the "two cultures" divide, I am clearly on the programmers' side. I need very good reasons to choose anything other than Emacs plus associated command-line tools. But I am not happy with anything I have tried for doing presentations this way.

When I switched from Linux to the Mac ten years ago, I started exploring Keynote, and ended up using it for most of my presentations, swearing at it from time to time because many tasks required too much work compared to LaTeX. And I still used LaTeX for maths-heavy presentations, swearing at it from time to time because of the endless time spent to get things exactly in the place where I wanted them.

Recently I got a new Mac with MacOS X 10.9 and the new Keynote that comes with it. It can't open my first presentations any more, telling me to convert them using the previous version of Keynote, which is not available any more. I was always afraid of vendor lock-in with Keynote, but didn't expect Apple to drop compatibility even with its own earlier formats. I decided to stop using Keynote, but I have yet to find a successor.

My colleagues around me are of little help: some have the same experience and profile as I do, and come to the same conclusions. Others, with less programming experience, don't even consider anything other than PowerPoint or Keynote, but aren't happy either. I do have some ideas of how to design and implement something better, but absolutely not the resources to do it. So for now, my best option for happiness is doing as few presentations as possible ;-)

Daniel-Mietchen commented 10 years ago

@khinsen perhaps http://slidewiki.org/ may be worth a look?

khinsen commented 10 years ago

One more idea related to the "two cultures" and transmitting the "programming culture" to scientists.

There is the famous quote by Alan J. Perlis: "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures." That's exactly the principle behind plain-text tools. There's a single data structure (plain text = a list of lines of characters), with lots of functions (i.e. programs) operating on them.

I don't expect to convince students by quoting a famous person, of course, but putting the specific situation (the use of plain text formats) into a wider context might be of help. The Perlis quote puts it into the context of programming, which again is probably not of much use for non-programmers, especially since it takes a lot of experience to see why Perlis was right. But perhaps we can come up with some analogy from somewhere else?
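
For those who do program, the Perlis point can be sketched in a few lines: one data structure (a list of lines) and several unrelated functions that all work on it. The functions below are simplified stand-ins for `wc`, `grep`, and `diff`, and the two draft texts are invented for illustration.

```python
import difflib
import re


def count_words(lines):
    """wc-like: total number of whitespace-separated words."""
    return sum(len(line.split()) for line in lines)


def grep(pattern, lines):
    """grep-like: keep the lines that match a regular expression."""
    return [line for line in lines if re.search(pattern, line)]


def diff(old, new):
    """diff-like: unified diff between two versions of a text."""
    return list(difflib.unified_diff(old, new, "old", "new", lineterm=""))


# Two versions of the same plain-text draft, each just a list of lines.
draft_v1 = ["# Methods", "We sampled 10 sites.", "TODO: add statistics"]
draft_v2 = ["# Methods", "We sampled 12 sites.", "TODO: add statistics"]

print(count_words(draft_v1))
print(grep(r"TODO", draft_v1))
print("\n".join(diff(draft_v1, draft_v2)))
```

None of these functions knows anything about papers, slides, or code; they only know about lines of text, which is why the same tools keep working whatever the document happens to be.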

khinsen commented 10 years ago

@Daniel-Mietchen Thanks for the pointer, SlideWiki does look interesting. And it's developed specifically for education, which is a welcome change compared to systems designed for business presentations.

There are three features that make me hesitate about SlideWiki, although I have to admit that I have spent only ten minutes exploring the site.

First, everything is editable by everyone, Wiki style. That's fine for some applications but not for others. I think I'd prefer a Github-style approach, where everyone has full control over his/her personal version.

Second, everything is stored on the SlideWiki server. If the server is down, or worse, if the service disappears, all the content becomes inaccessible. Again, I'd prefer a Github-like approach where people keep personal copies on their own machines and the Web service only handles the collaborative aspect.

Third, the only way I found to download a presentation is in "SCORM format", which I have never heard of before. It's just a zip file containing HTML, CSS, and some bookkeeping files, so I could probably figure out how to work with it, but I don't have an immediate and evident solution for how to play the slides on my own computer when I have no Internet connection.

rgaiacs commented 10 years ago

@khinsen About

> So, given that scientists in some domains (physics for example) end up learning LaTeX anyway for their publications, what are the advantages to be expected from using the HTML/Markdown approach, other than fancier visual effects?

SEO (Search Engine Optimization). As far as I know, it is easier for someone to write a crawler for HTML pages than for PDF files, and every researcher should want their work to be findable.
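
As a small illustration of why HTML is crawler-friendly, here is a sketch that pulls the title and `<meta>` tags out of a page using only Python's standard library; doing the same for a PDF would need a third-party tool. The class name, function name, and sample page are all made up for illustration, and a real crawler would also have to fetch pages and follow links.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attrs and "content" in attrs:
            self.meta[attrs["name"]] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html_text):
    """Return (title, meta dict) for one HTML document."""
    parser = MetadataParser()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.meta


# A made-up page, used only to exercise the parser.
page = """<html><head>
<title>Reproducible slides</title>
<meta name="author" content="A. Researcher">
<meta name="keywords" content="markdown, slides">
</head><body><h1>Hello</h1></body></html>"""

print(extract_metadata(page))
```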

asinclair commented 10 years ago

As someone who comes from the user side, working in clinical research and evidence synthesis, I would be looking first for efficiencies in connecting all the pieces together. The field has a number of mature, powerful tools for various parts of the research/authoring process, but they do not talk to each other (this was a topic that came up in recent conversations at the Cochrane Colloquium Symposium, exploring ways of increasing efficiency through automation). So that is an area that requires individual and collective development.

The ability to pull together pieces by different authors and output from different programs, using conversion scripts and automated clean-up, would allow for a more distributed authorship, production of the document in stages, and would also accommodate different levels of technical skill and interest. A document could have one or more coordinating authors, responsible for converting, compiling and managing the repository.
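
A minimal sketch of that assembly step, assuming each author contributes one plain-text section file: the coordinating author runs a script that cleans each piece and joins them in order. The function names and the clean-up rules are illustrative assumptions, not an existing tool.

```python
from pathlib import Path


def clean(text):
    """Automated clean-up: strip trailing spaces, collapse runs of blank lines."""
    cleaned = []
    for line in text.splitlines():
        line = line.rstrip()
        if line == "" and cleaned and cleaned[-1] == "":
            continue  # skip repeated blank lines
        cleaned.append(line)
    return "\n".join(cleaned).strip("\n")


def assemble(section_paths):
    """Join the cleaned sections, in the given order, into one document."""
    parts = [clean(Path(p).read_text(encoding="utf-8")) for p in section_paths]
    return "\n\n".join(parts) + "\n"
```

Note that stripping trailing spaces also removes Markdown's two-space hard line breaks, so the clean-up rules would need adapting to whatever format the pieces actually use.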

As an aside: the wedge for the introduction of writing in text/Markdown may be mobile computing. The realization that I could bounce writing in progress from OS X to an iPad and then to Windows 7, display it to suit me on each one, have the very same functionality, and not lose changes I had made, made a convert out of me. Citations will have to work a whole lot better than they do right now, though, for me to convert completely.