ropensci / unconf16

rOpenSci's San Francisco hackathon/unconf 2016
http://unconf16.ropensci.org

Future of maptools Task View #42

Open jhollist opened 8 years ago

jhollist commented 8 years ago

Over the last year or so I've been maintaining the https://github.com/ropensci/maptools task view. It hasn't seen much activity (until today, when a few new packages were added). The main reason is that there were some concerns about adding it as an official CRAN Task View because of the overlap between it and the existing Spatial Task View. At this point, it is not clear whether the task view has a home (even after significant editing) as a CRAN Task View.

I think there is enough that is unique about what we have (i.e. links to source repositories and/or links to packages not on CRAN) to keep it around. I am just wondering how best to keep this information and disseminate it. So, if others are looking for something to think on and discuss today or tomorrow, I'd be grateful for your thoughts. Feel free to add to the issue here or just ping me directly.

sckott commented 8 years ago

hey jeff - I vote for continuing to work on it, but not submitting as a CRAN task view - it's a great resource and people will find it when googling

jhollist commented 8 years ago

I agree that we should keep it going. It is a nice resource.

Is there any merit to redoing the format? Straight-up markdown, as opposed to .ctv, would make it a bit easier for others to contribute. We could also do it as a web page, although I'm not sure there is much benefit to that over the repo itself.

noamross commented 8 years ago

We should at minimum have links to all rOpenSci task views on https://ropensci.org/packages/, no? We could possibly ingest the task views to display them on the web site, but that seems like more work than it's worth.

leeper commented 8 years ago

I think there might be something to be said about creating a GitHub-native format similar to Task Views (perhaps like the "awesome" lists), as well. The number of packages on CRAN is too big to really be able to have up-to-date Task Views that have complete coverage while also maintaining the high visibility that comes from having a small number of Task Views. It'd be much easier to create and maintain narrower Task Views on GitHub, where there might only be ~10 or so packages on a topic. But then they're not easily discoverable unless they all live in one place/GH organization. Something to think about...

sckott commented 8 years ago

> where there might only be ~10 or so packages on a topic

does that mean you'd prefer to split up web tech & open data? or did you mean could have as little as 10?

leeper commented 8 years ago

I think we could do that, but more than that, I think it would be useful to have a TV-like format that works for smaller topics (i.e., not enough for a Task View but worth having an up-to-date page about).

mbojan commented 8 years ago

If I may drop two cents:

I would definitely keep the maptools task view. It is a great resource. I think its primary advantage over existing CRAN Task Views (CTV) and awesome lists (AL) is that it is focused. IMHO some of the CTVs are too broad (e.g. webtools), and that is even more the case with a lot of ALs.

The purpose of CRAN Task Views was to have curated lists of packages that:

  1. Have common purpose/goal.
  2. Briefly describe what the included packages do.
  3. Provide the user the possibility to install all the packages in CTV with a single function call.

As @leeper wrote, we now have so many packages for so many purposes that the principle of having a limited number of CTVs becomes impractical. R evolves and specializes: there are more and more tools addressing relatively narrow fields of application. In that sense, ad (1), it would be good for CTVs to become more fine-grained. Take interacting with resources on the Web as an example. Some years ago there were only download.file, url connections, and the XML package for that, and look where we are now. Keeping (2) up to date is more difficult as the View grows, unless you are able to mobilize package authors to keep their package descriptions current. Functionality (3) is, I think, not used by many people, and it is even less useful if a CTV contains hundreds of packages. See:

vs <- ctv::available.views()
sapply(vs, function(v) structure(nrow(v$packagelist), names=v$name))

                Bayesian                  ChemPhys            ClinicalTrials                   Cluster 
                      115                        84                        46                       102 
    DifferentialEquations             Distributions              Econometrics            Environmetrics 
                       22                       185                       121                       109 
       ExperimentalDesign                   Finance                  Genetics                  Graphics 
                       63                       141                        31                        42 
 HighPerformanceComputing           MachineLearning            MedicalImaging              MetaAnalysis 
                       83                        80                        28                        71 
             Multivariate NaturalLanguageProcessing      NumericalMathematics        OfficialStatistics 
                      124                        36                        69                        72 
             Optimization          Pharmacokinetics             Phylogenetics             Psychometrics 
                       97                         8                        76                       133 
     ReproducibleResearch                    Robust            SocialSciences                   Spatial 
                       70                        51                        83                       142 
           SpatioTemporal                  Survival                TimeSeries           WebTechnologies 
                       58                       203                       196                       155 
                       gR 
                       36 

and that does not count dependencies of packages in a CTV.

I was also thinking about the format, because writing CTV XML by hand is cumbersome, especially now that (R)Markdown is around. The main purpose of the CTV format was to be able to:

  a. Easily display it on the web.
  b. Construct the package list automatically.

Both of these goals can be accomplished with RMarkdown, which would even allow embedding images, which is not possible at the moment. Perhaps Achim Zeileis, who is responsible for CTVs, would be open to some new ideas...

sckott commented 8 years ago

@mbojan

> Provide the user the possibility to install all the packages in CTV with a single function call.

Do you think people do this anymore?

> Perhaps Achim Zeileis, who is responsible for CTVs, would be open to some new ideas...

I'd be surprised if they were open to Rmd, but doesn't hurt to ask - they did allow markdown vignettes in pkgs

cboettig commented 8 years ago

Seems like it should be possible to generate the ctv XML format from an RMarkdown source file, no?

leeper commented 8 years ago

@cboettig That is what we currently do for webservices and opendata. It's a simple markdown file that is pandoc-ed (+ a little bit of R-ed) into the CTV format.

jhollist commented 8 years ago

And maptools was modelled after those and uses the same approach.

I'd also be interested in what others think about how often people "install" a CTV? Would it be useful to have an install.views for CTVs on GitHub?

jennybc commented 8 years ago

I didn't even know it was possible to install a task view! And the thought of it makes me queasy.

But @jhollist's idea to support installation from a CTV on GitHub seems like a clever way to define and facilitate installation of a constellation of packages, e.g. for a workshop.

mbojan commented 8 years ago

@sckott, I would be surprised if anybody installs whole CTVs nowadays. While I can imagine some pharmacokineticist (?) installing 8 packages (+dependencies), I can't imagine installing 141 packages (+dependencies), even if you are a financial analyst. The Distributions CTV, which groups functions modeling different families of statistical distributions, is perhaps useful as a reference, but probably nobody will ever need to install and use all of them, even a Bayesian fundamentalist. Anyway, I don't think there is any data on CTV usage, unfortunately, from CRAN web stats or otherwise.

I was playing around with an Rmd document with some dedicated YAML fields and |package_name| syntax for package names. A piece of R code converts it to a .ctv file, collecting the package names from the |...| markers. I did not get very far implementing it though.
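The name-collecting step could look roughly like the sketch below. It is not the actual converter: the helper name `extract_view_packages` and the sample text are made up for illustration, and it assumes package names are written exactly as `|name|` with no spaces inside the pipes.

```r
# Hypothetical helper: collect package names written as |pkgname| in view text.
extract_view_packages <- function(lines) {
  # A pipe, a name made of letters/digits/dots starting with a letter, a pipe
  pattern <- "\\|[A-Za-z][A-Za-z0-9.]*\\|"
  hits <- unlist(regmatches(lines, gregexpr(pattern, lines)))
  # Strip the surrounding pipes and drop duplicates, keeping first-seen order
  unique(gsub("|", "", hits, fixed = TRUE))
}

# Made-up view text, for illustration only
view_text <- c(
  "Use |sp| and |raster| for vector and raster data.",
  "|leaflet| draws interactive maps; |sp| classes are widely supported."
)

extract_view_packages(view_text)
# [1] "sp"      "raster"  "leaflet"
```

Markdown tables also use pipes, so a real converter would probably need a less ambiguous marker or a stricter pattern.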

A GitHub-hosted, RMarkdown-based Task View that could trigger installation of packages would be useful, I think. Indeed, quite useful in the context of workshops etc. (I did not think of that!). Definitely more lightweight than creating a "metapackage" with the wanted packages as dependencies, and more "secure" than sourcing an R script from the web that might contain system("rm ~") or some similar kind of joke.

leeper commented 8 years ago

If you want to install all packages from a task view, then what you really want is for the task view itself to be a package with all of its listed packages as dependencies, right?

mbojan commented 8 years ago

@leeper I guess you could just create a package with the task view document as a vignette, and have all the packages listed in the Suggests field in DESCRIPTION. Then

  1. The user can install the package to consume the vignette (this will not install "suggested" packages)
  2. The user can install the package with install.packages(..., dependencies=TRUE) to have the suggested packages installed right away.
  3. The user can do (1) and later call some provided function that will install the suggested packages, retrieving their names with packageDescription.
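Option (3) might be sketched as follows. Both function names are hypothetical, and the sketch assumes the view package is already installed so that `packageDescription()` can read its DESCRIPTION.

```r
# Hypothetical helper: turn a DESCRIPTION dependency field into package names.
parse_dep_field <- function(field) {
  if (is.null(field) || is.na(field)) return(character(0))
  parts <- strsplit(field, ",", fixed = TRUE)[[1]]
  # Drop version requirements such as "(>= 1.2)" and surrounding whitespace
  parts <- trimws(sub("\\(.*\\)", "", parts))
  parts[nzchar(parts)]
}

# Hypothetical function that a view package could export.
install_suggested <- function(pkg, ...) {
  suggested <- parse_dep_field(utils::packageDescription(pkg)$Suggests)
  # Only install what is not already in the library
  to_install <- setdiff(suggested, rownames(utils::installed.packages()))
  if (length(to_install) > 0) utils::install.packages(to_install, ...)
  invisible(to_install)
}

# The parsing step on its own, with a made-up Suggests field
parse_dep_field("sp (>= 1.2),\n    raster, leaflet")
# [1] "sp"      "raster"  "leaflet"
```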

The view mechanism seems lighter, which I like. Just put a CTV-like file in a GH repo. Read it on GH. Install the packages by calling something like install_view("leeper/myview") that would fetch the file, get the package names and install them. That might even allow the view to include packages that are not on CRAN, but on GH etc. I think that actually might be very nice for @jennybc's case with workshop-like setups. We might also have something like update_view() that would just update the view-related packages.
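A very rough sketch of what such an install_view() could look like is below. Everything about the file layout is an assumption made for illustration: a plain-text file called `packages.txt` on the `master` branch, one entry per line, with `owner/repo` entries treated as GitHub packages. It also assumes the remotes package is available for the GitHub installs.

```r
# Hypothetical parser: split view entries into CRAN names and GitHub "owner/repo" refs.
parse_view <- function(lines) {
  entries <- trimws(lines)
  # Skip blank lines and "#" comment lines
  entries <- entries[nzchar(entries) & !startsWith(entries, "#")]
  on_github <- grepl("/", entries, fixed = TRUE)
  list(cran = entries[!on_github], github = entries[on_github])
}

# Hypothetical install_view(); the file name and branch are assumptions,
# not an existing convention.
install_view <- function(repo, file = "packages.txt", ref = "master",
                         dry_run = FALSE) {
  url <- sprintf("https://raw.githubusercontent.com/%s/%s/%s", repo, ref, file)
  view <- parse_view(readLines(url, warn = FALSE))
  if (!dry_run) {
    if (length(view$cran) > 0) utils::install.packages(view$cran)
    if (length(view$github) > 0) remotes::install_github(view$github)
  }
  invisible(view)
}

# The parsing step on a made-up view file
view <- parse_view(c("# spatial workshop", "sp", "leaflet", "", "ropensci/geojsonio"))
view$cran
view$github
```

Because the file is only ever read as a list of names, nothing in it gets evaluated as R code, which is the difference from sourcing a script off the web.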

As the view in principle does not contain any R code, wrapping it as an R package seems a bit of an overkill to me...

leeper commented 8 years ago

A CRAN-compliant package can be just a DESCRIPTION file and a (possibly empty) NAMESPACE file. Even if it contains no code, documentation, tests, or vignettes, it is still installable and therefore provides a mechanism for Depends/Imports/Suggests installations via install.packages(), install_github(), etc. without needing to write a new package to handle that.
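As a rough illustration of how little such a package needs, the sketch below writes a DESCRIPTION and an empty NAMESPACE into a temporary directory. The package name `myview`, the maintainer, and the listed packages are placeholders, not an existing package.

```r
# Hypothetical metapackage skeleton; every field value is an example only.
pkg_dir <- file.path(tempdir(), "myview")
dir.create(pkg_dir, showWarnings = FALSE)

desc <- data.frame(
  Package = "myview",
  Version = "0.0.1",
  Title = "Packages for an Example Workshop",
  Description = "Installs the packages listed in an example task view.",
  Author = "A. Maintainer",
  Maintainer = "A. Maintainer <maintainer@example.org>",
  License = "CC0",
  # Imports are installed by default; Suggests would make them opt-in
  Imports = "sp, leaflet",
  stringsAsFactors = FALSE
)

# DESCRIPTION is a DCF file, so base R can write and read it directly
write.dcf(desc, file = file.path(pkg_dir, "DESCRIPTION"))
file.create(file.path(pkg_dir, "NAMESPACE"))  # empty NAMESPACE

imports <- read.dcf(file.path(pkg_dir, "DESCRIPTION"), fields = "Imports")
imports
```

Pushed to GitHub, a directory like this could then be installed with install_github(), which would pull in the listed packages as dependencies.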

mbojan commented 8 years ago

@leeper sure, that's what I meant in the "package solution" above. In fact, there are such "metapackages" on CRAN, e.g. statnet.

Another functionality available with CTVs which would be difficult to mimic with a package is that you can mark some of the packages as "core". When installing the view you could choose whether to install all or just the core. With a package and existing functions it is all or nothing.
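The core/all choice is easy to picture with a toy package list. The data frame below only mimics, by assumption, the name-plus-core-flag shape of a view's package list, and `view_packages` is a hypothetical selector.

```r
# Toy package list; the columns (name, core) are an assumed shape, and the
# core flags are invented for illustration.
packagelist <- data.frame(
  name = c("sp", "raster", "leaflet", "geosphere"),
  core = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical selector: all packages, or only those flagged as core
view_packages <- function(packagelist, core_only = FALSE) {
  if (core_only) packagelist$name[packagelist$core] else packagelist$name
}

view_packages(packagelist, core_only = TRUE)
# [1] "sp"     "raster"

# For an official CRAN Task View, the ctv package exposes the same choice:
# ctv::install.views("Spatial", coreOnly = TRUE)
```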

But I guess we are somewhat drifting away from the original topic of this thread...