rOpenGov / ropengov.github.io

Get tutorial vignettes from projects #9

Closed: jlehtoma closed this issue 10 years ago.

jlehtoma commented 10 years ago

If an rOpenGov package has a GitHub repo, it is listed in the project-specific md files in the _projects subdir. The Rake task tutorial:add could extract the GitHub URLs from the md files, get the vignettes, and convert them to tutorial pages.
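A minimal Ruby sketch of the URL-extraction step, assuming the project md files contain plain `https://github.com/rOpenGov/<repo>` links. The names `GITHUB_REPO_RE`, `github_repos` and `all_project_repos` are hypothetical and not taken from the site's actual Rakefile.

```ruby
# Hypothetical helper for a tutorial:add-style Rake task.
# Assumes project md files link repos as https://github.com/rOpenGov/<repo>.
# The capture must not end in "." so trailing sentence punctuation is dropped.
GITHUB_REPO_RE = %r{https://github\.com/rOpenGov/([A-Za-z0-9_.-]*[A-Za-z0-9_-])}

# Return the unique repo names linked from one file's text.
def github_repos(markdown_text)
  markdown_text.scan(GITHUB_REPO_RE).flatten.uniq
end

# Collect repo names from every md file in the _projects directory.
def all_project_repos(dir = "_projects")
  Dir.glob(File.join(dir, "*.md"))
     .flat_map { |path| github_repos(File.read(path)) }
     .uniq
end
```

Scanning the raw text rather than a specific front matter key keeps the sketch independent of how each project file names its repository field.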

It would be good if there were a way to detect whether a package has a tutorial vignette or not. This would save the trouble of downloading and extracting the package when there is no tutorial vignette. Maybe an extra entry

tutorial: package_name_tutorial.md

could be placed in the project md file's YAML front matter. The name could then be used to place the relevant tutorial path in the projects table (http://ropengov.github.io/projects/) as well.
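Reading such an entry back is straightforward. Below is a minimal Ruby sketch, assuming standard `---`-delimited Jekyll front matter and the `tutorial` key proposed here (which does not exist yet); `front_matter` and `tutorial_file` are hypothetical names.

```ruby
require "yaml"

# Parse the YAML front matter (between the leading "---" lines) of a
# Jekyll page into a Hash; returns {} if there is none.
def front_matter(text)
  match = text.match(/\A---[ \t]*\n(.*?)\n---[ \t]*(?:\n|\z)/m)
  return {} unless match
  data = YAML.safe_load(match[1])
  data.is_a?(Hash) ? data : {}
end

# "tutorial" is the proposed (hypothetical) key; nil means no tutorial.
def tutorial_file(text)
  front_matter(text)["tutorial"]
end
```

A nil result would then mean "skip the download/extract step" for that package.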

antagomir commented 10 years ago

The existence of a vignette could probably be checked directly from the GitHub master branch, assuming all packages have the same structure and the vignette is located at /vignettes/packagename_tutorial.Rmd.

For instance, the Sotkanet vignette, if available, would be located at https://github.com/rOpenGov/sotkanet/blob/master/vignettes/sotkanet_tutorial.Rmd (right now it is at https://github.com/rOpenGov/sotkanet/blob/master/pkg/vignettes/vignette.Rmd, but I will soon change this to follow our new guidelines).

It would be simpler if the tutorial script could check whether the standard vignette file is in the GitHub master branch, and then decide based on this whether to download the package tarball or not. Then there would be no need to add extra fields to the package tables on the website. This would require less maintenance and hence be easier to keep up to date.
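A minimal Ruby sketch of such a check, assuming the `vignettes/<package>_tutorial.Rmd` convention described above and that GitHub answers a HEAD request on the blob URL with 200 when the file exists. `vignette_url` and `vignette_exists?` are hypothetical names; only the standard-library `Net::HTTP` calls are real APIs.

```ruby
require "net/http"
require "uri"

# Assumed naming convention from this thread:
# vignettes/<package>_tutorial.Rmd on the master branch.
def vignette_url(package, org: "rOpenGov")
  "https://github.com/#{org}/#{package}/blob/master/vignettes/#{package}_tutorial.Rmd"
end

# Send a HEAD request and treat "200 OK" as "tutorial exists".
# Any network error (timeout, DNS, TLS) is treated as "no tutorial".
def vignette_exists?(package)
  uri = URI(vignette_url(package))
  Net::HTTP.start(uri.host, uri.port, use_ssl: true,
                  open_timeout: 5, read_timeout: 5) do |http|
    http.head(uri.path).is_a?(Net::HTTPOK)
  end
rescue StandardError
  false
end
```

A HEAD request fetches only the status line and headers, so the tarball would be downloaded only for packages where `vignette_exists?` returns true.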

(Reply sent by email to Joona Lehtomäki's comment of Sun, Jan 12, 2014 at 11:27 AM.)

jlehtoma commented 10 years ago

OK, let's check the URL then.

With the projects table I meant something along the lines of what rOpenSci has (http://ropensci.org/tutorials/index.html), i.e. listing the available tutorials in a single table. We could either 1) add links to the tutorials to the existing projects table, or 2) create a separate tutorials table.

antagomir commented 10 years ago

Oh yes, that would be very useful. It should of course be automated as far as possible. That way, perhaps it would be possible to implement both 1 and 2. But for simplicity, my vote goes to option 1 unless there are clear arguments favoring a separate tutorials table.

(Reply sent by email to Joona Lehtomäki's comment of Sun, Jan 12, 2014 at 11:58 AM.)

jlehtoma commented 10 years ago

Jekyll can take care of the automated generation. Since the projects table is generated from the YAML front matter in the _projects/*.md-files (actually they have no content beyond the front matter), it would be cleaner to have something in the front matter indicating whether a tutorial exists (my initial suggestion).

Let's go for option 1 for now.

antagomir commented 10 years ago

So did I understand correctly that the tutorial information can be (easily) automatically filled in the YAML front matter by the scripts when the tutorial is available?

(Reply sent by email to Joona Lehtomäki's comment of Sun, Jan 12, 2014 at 12:16 PM.)

jlehtoma commented 10 years ago

Project-files in _projects are not automatically generated, although this could be done of course. Building the table is done automatically based on the content of _projects. I'll have a stab at this and see how it goes.
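If the project files were to be updated by a script, the `tutorial` entry could be written back roughly as follows. This is a hypothetical Ruby sketch: it assumes `---`-delimited front matter and the proposed `tutorial` key, and `set_tutorial` is an illustrative name, not part of the site's tooling.

```ruby
require "yaml"

FRONT_MATTER_RE = /\A---[ \t]*\n(.*?)\n---[ \t]*(?:\n|\z)/m

# Return `text` with the "tutorial" key set (or removed when `tutorial`
# is nil) in its front matter; the body after the front matter is kept.
def set_tutorial(text, tutorial)
  match = text.match(FRONT_MATTER_RE)
  data = match ? YAML.safe_load(match[1]) : {}
  data = {} unless data.is_a?(Hash)
  body = match ? match.post_match : text
  if tutorial
    data["tutorial"] = tutorial
  else
    data.delete("tutorial")
  end
  # Hash#to_yaml already begins with "---\n"; an empty hash would
  # serialize as "--- {}", so write a bare delimiter instead.
  yaml = data.empty? ? "---\n" : data.to_yaml
  "#{yaml}---\n#{body}"
end
```

Usage would be along the lines of `File.write(path, set_tutorial(File.read(path), "sorvi_tutorial.md"))`, after which Jekyll rebuilds the projects table from the updated front matter as before.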

antagomir commented 10 years ago

The only thing I'm worried about is that if/when the project scales up and independent developers join in, it may become a burden to keep these files up to date. Optimally, for scalability, everything would be fetched automatically from the package directories, so that package authors only need to care about their own package and we don't need to keep track of changes. But well, I'm not sure complete automation is realistic. Thanks for having a look.

(Reply sent by email to Joona Lehtomäki's comment of Sun, Jan 12, 2014 at 1:11 PM.)

jlehtoma commented 10 years ago

Running

rake projects:list

now produces the following output:

Looking for package tutorials in GitHub...
  No tutorial found for rSCB
  No tutorial found for osmar
  No tutorial found for Grazwahl2012
  No tutorial found for rustfare
  No tutorial found for govdat
  Tutorial found for sorvi
  No tutorial found for SmarterPoland
  No tutorial found for replicaX
  No tutorial found for sotkanet
  No tutorial found for helsinki
  No tutorial found for statfi

TITLE         | DESCRIPTION                                    | TUTORIAL
--------------|------------------------------------------------|---------
rSCB          | Statistics Sweden (SCB) R tools                | No      
osmar         | OpenStreetMap tools                            | No      
Grazwahl2012  | Austria elections                              | No      
rustfare      | Russian open welfare data                      | No      
govdat        | US Government Data                             | No      
sorvi         | Finnish open government data                   | Yes     
SmarterPoland | Poland state data                              | No      
replicaX      | Data anonymization tools                       | No      
sotkanet      | Sotkanet Finland demographic indicator R tools | No      
helsinki      | Helsinki open data R tools                     | No      
statfi        | Statistics Finland (StatFi) database R tools   | No 

i.e. the task first checks whether a file can be found at the following URL pattern

"https://github.com/rOpenGov/#{package}/blob/master/vignettes/vignette.Rmd"

and then prints out a complete list of projects and whether a tutorial was found for each.
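The table-printing half of that task could look roughly like this. It is a hypothetical Ruby sketch, not the actual Rakefile code: the tutorial lookup result is simply passed in as a boolean, and `Project` and `projects_table` are illustrative names. Padding each column to its widest cell should reproduce the layout of the table above.

```ruby
# Hypothetical record for one row of the report.
Project = Struct.new(:title, :description, :tutorial)

# Render projects as a pipe-separated table, padding every column
# to its widest cell (header included).
def projects_table(projects)
  headers = ["TITLE", "DESCRIPTION", "TUTORIAL"]
  rows = projects.map { |p| [p.title, p.description, p.tutorial ? "Yes" : "No"] }
  widths = headers.each_index.map { |i|
    ([headers[i]] + rows.map { |r| r[i] }).map(&:length).max
  }
  format_row = ->(cells) { cells.each_with_index.map { |c, i| c.ljust(widths[i]) }.join(" | ") }
  separator = widths.map { |w| "-" * w }.join("-|-")
  [format_row.call(headers), separator, *rows.map(&format_row)].join("\n")
end

puts projects_table([
  Project.new("sorvi", "Finnish open government data", true),
  Project.new("rSCB", "Statistics Sweden (SCB) R tools", false)
])
```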

antagomir commented 10 years ago

Great! I will try to fix the vignettes for some other packages soon, and then let's see how this starts rolling.

antagomir commented 10 years ago

The existence of /vignettes/package_tutorial.Rmd does not guarantee the existence of the actual vignette, which is stored in the /inst/doc/package_tutorial.md file. This is the standard location for final vignettes in R packages.

Hence my suggestion is to change the URL search pattern into: https://github.com/rOpenGov/#{package}/blob/master/inst/doc/vignette.md

By the way, should we have above "vignette.md" or "_tutorial.md"? I have previously used the former, but I understood we are changing to the latter one.

jlehtoma commented 10 years ago

> The existence of /vignettes/package_tutorial.Rmd does not guarantee the existence of the actual vignette. This is stored in /inst/doc/package_tutorial.md file. This is the standard location for final vignettes in R packages.

According to "Writing R Extensions":

> A special case is PDF documents with sources in Sweave, which we call package vignettes. (Since R 3.0.0, other vignette formats are supported; see Non-Sweave vignettes.) The preferred location for the sources is the subdirectory vignettes of the source package, but pro tem for compatibility with the layout before R 2.14.0, vignette sources will be looked for in inst/doc if subdirectory vignettes does not exist. Note that the location of the vignette sources only affects R CMD build and R CMD check: the tarball built by R CMD build includes in inst/doc the components intended to be installed

So both will work, but let's go for inst/doc/package_vignette.Rmd.

> Hence my suggestion is to change the URL search pattern into: https://github.com/rOpenGov/#{package}/blob/master/inst/doc/vignette.md

Why an md file? md files are knitted from Rmd files, so shouldn't we go straight to the source?

> By the way, should we have above "vignette.md" or "_tutorial.md"? I have previously used the former, but I understood we are changing to the latter one.

When using the full URL it doesn't make much difference, but maybe packagename_tutorial would be clearer.

antagomir commented 10 years ago

I just tested this and by default vignette sources from /vignettes/ seem to be converted into final vignettes located in /inst/doc during package build.

Hence we should look for either /vignettes/vignette.Rmd or /inst/doc/vignette.md.
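Checking both locations in order of preference could be sketched in Ruby as follows. The candidate paths are the ones named in this thread; `VIGNETTE_CANDIDATES` and `find_vignette` are hypothetical names, and the existence check is injected as a callable (for example an HTTP HEAD request) so the selection logic can be exercised without network access.

```ruby
# Candidate locations discussed in this thread, in order of preference.
VIGNETTE_CANDIDATES = ["vignettes/vignette.Rmd", "inst/doc/vignette.md"].freeze

# Return the URL of the first candidate that `exists` accepts, or nil.
# `exists` is any callable taking a URL and returning true/false,
# e.g. an HTTP HEAD check; injecting it keeps this logic testable offline.
def find_vignette(package, exists, org: "rOpenGov")
  VIGNETTE_CANDIDATES
    .map { |path| "https://github.com/#{org}/#{package}/blob/master/#{path}" }
    .find { |url| exists.call(url) }
end

# Usage with a stub in place of a real network check:
only_md = ->(url) { url.end_with?("inst/doc/vignette.md") }
puts find_vignette("sorvi", only_md)
# prints https://github.com/rOpenGov/sorvi/blob/master/inst/doc/vignette.md
```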

On second thought, I agree that using the Rmd might be better, since the .md/.html/.pdf file is produced during the package build (depending on the vignette engine), and perhaps we should not assume that all authors push these to their GitHub sources. So let's look for /vignettes/vignette.Rmd.

I am ok with either vignette.Rmd or _tutorial.Rmd but perhaps the latter would indeed be more clear.

We could also consider having separate online tutorials for the stable release (CRAN or GitHub) and devel (GitHub) versions. Most people still install from CRAN when the package is available there, but they may be browsing our tutorial pages, which might get out of sync as new updates land. I'm not sure this will really become a problem, so perhaps there is no need for changes right now.

jlehtoma commented 10 years ago

Ok, I had understood that vignette sources go to /vignettes/ and final vignettes go to /inst/doc/

In any case we're interested in the sources as the rake task will do the actual conversion using knitr.

So let's just keep the vignettes in /vignettes/_tutorial.Rmd for clarity?

OK.

On second thought, I agree that using Rmd is better, assuming authors only push polished stuff to master.

To continue with this a little: the upside of going for the Rmd files and doing the knitting ourselves is that everything must be in sync with the current state of the package. Using md files, on the other hand, would spare us the knitting and the requirement of having the package actually installed, but risks the tutorial running out of sync with the actual package. Knitting the vignettes can be part of the build, so requiring a CHECK/BUILD before each push to master would guarantee an up-to-date md file.
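If the rake task does the knitting itself, it could shell out to R. Below is a minimal Ruby sketch, assuming R, knitr and the package itself are installed locally and that file paths are plain (no quotes or `#` characters). `knit_command` and `knit!` are hypothetical names; `knitr::knit(input, output = ...)` is knitr's own function.

```ruby
# Build the command that knits one Rmd vignette into Markdown using
# knitr::knit(input, output = ...). Assumes plain file paths: Ruby's
# String#inspect is used to produce a double-quoted R string literal.
def knit_command(rmd_path, md_path)
  expr = "knitr::knit(#{rmd_path.inspect}, output = #{md_path.inspect})"
  ["Rscript", "-e", expr]
end

# Run it; requires R, knitr and the package itself installed locally.
# The array form of `system` avoids shell quoting problems.
def knit!(rmd_path, md_path)
  system(*knit_command(rmd_path, md_path)) or
    raise "knitting #{rmd_path} failed"
end
```

Failing loudly when knitting fails is the point of this design: a vignette that no longer runs against the current package surfaces as a build error instead of a stale tutorial page.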

antagomir commented 10 years ago

Requiring check/build before master pushes would be a very good practice anyway, so I think we should add this to package guidelines.

(Reply sent by email to Joona Lehtomäki's comment of Mon, Jan 13, 2014 at 12:10 PM.)

jlehtoma commented 10 years ago

Adding tutorials now works from commit 74c06134a4a97da9fd50219c78f5286cc067f4f9 onwards; testing is appreciated.