zoonproject / zoon

The zoon R package

chains in lists & lists in chains #239

Open goldingn opened 8 years ago

goldingn commented 8 years ago

Problem

It seems plausible that people would want to compare (i.e. list) analyses that use different chains of modules, e.g. to compare the following process chains:

Chain(Background(100), Crossvalidate) vs. Chain(Background(100), PartitionDisc, Crossvalidate)

It would therefore be nice to be able to create a list where some of the elements are chains:

w <- workflow(...,
              process = list(Chain(Background(100), Crossvalidate),
                             Chain(Background(100), PartitionDisc, Crossvalidate)),
              ...)

Equally, it would be nice to use a chain where some of the elements are lists:

w <- workflow(...,
              process = Chain(list(Background(100), Background(1000), Background(10000)),
                              PartitionDisc,
                              Crossvalidate),
              ...)

Examples

Neither of these currently works:

list_in_chain <- workflow(SpOcc,
                          Bioclim(extent = c(-1, 0, 51, 52)),
                          Chain(list(Background(10), Background(20)),
                                Crossvalidate),
                          RandomForest,
                          InteractiveMap)
Caught errors:
Error in do.call(processName[[p]]$func, c(list(.data = x), processName[[p]]$paras), : could not find function "Background"
...
chain_in_list <- workflow(SpOcc,
                          Bioclim(extent = c(-1, 0, 51, 52)),
                          list(Chain(Background(10), Crossvalidate),
                               Background(20)),
                          RandomForest,
                          InteractiveMap)
Caught errors:
Error in .data$df: $ operator is invalid for atomic vectors
...
timcdlucas commented 8 years ago

Yes, I was always aiming for this sort of thing to work. It might be one of the things that broke when we changed to the non-standard evaluation syntax.

However, I think we only need inner Chains, not inner lists.

This one is fine

list(Chain(Background(10), Crossvalidate),
  Background(20)),

but this one obscures what the analysis is trying to do

Chain(list(Background(10), Background(20)),
  Crossvalidate),

I think it should be written (more verbosely) as

list(Chain(Background(10), Crossvalidate),
      Chain(Background(20), Crossvalidate))

This way it is clear exactly where the analysis is split and exactly what is being compared.
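For anyone who would still rather write the compact form and get the explicit one out, here is a minimal base-R sketch of the expansion. `expand_chain` is a hypothetical helper, not part of zoon, and modules are represented as plain strings purely for illustration:

```r
# Hypothetical helper (not part of zoon): takes the steps of a chain, where any
# step may be a list of alternatives, and returns one flat chain per combination.
expand_chain <- function(...) {
  steps <- list(...)
  # Wrap single modules so that every step is a list of alternatives.
  alternatives <- lapply(steps, function(s) if (is.list(s)) s else list(s))
  # One row per combination, holding the index of the alternative at each step.
  grid <- expand.grid(lapply(alternatives, seq_along))
  lapply(seq_len(nrow(grid)), function(i) {
    vapply(seq_along(alternatives),
           function(j) alternatives[[j]][[grid[i, j]]],
           character(1))
  })
}

chains <- expand_chain(list("Background(10)", "Background(20)"),
                       "Crossvalidate")

# Render each combination as the explicit Chain(...) text.
chain_text <- vapply(chains,
                     function(ch) sprintf("Chain(%s)", paste(ch, collapse = ", ")),
                     character(1))
print(chain_text)
```

This only manipulates text; it says nothing about how zoon would need to parse such arguments internally.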

timcdlucas commented 6 years ago

I'm writing some malaria zoon modules and will be adding some example analyses to a paper we're aiming to submit soon. The analyses I would like to do encounter this issue.

I want to do something like


w <- workflow(list(
        Chain(
          SpOcc(species = 'Anopheles arabiensis', 
                extent =  c(40, 55, -30, -10)),
          SpOcc(species = 'Anopheles gambiae', 
                extent =  c(40, 55, -30, -10))
        ),
        Chain(
          SpOcc(species = 'Anopheles arabiensis', 
                extent =  c(40, 55, -30, -10)),
          SpOcc(species = 'Anopheles gambiae', 
                extent =  c(40, 55, -30, -10)),
          SpOcc(species = 'Anopheles gambiae', 
                extent =  c(40, 55, -30, -10))
        )
      ),
      Bioclim(extent = c(40, 55, -30, -10), resolution = 5, layers = 1:12),
      NoProcess,
      LogisticRegression,
      Chain(PrintOccurrenceMap, 
            PrintMap))

and at the moment I'm having to do

w1 <- workflow(
  Chain(
    SpOcc(species = 'Anopheles arabiensis', 
          extent =  c(40, 55, -30, -10)),
    SpOcc(species = 'Anopheles gambiae', 
          extent =  c(40, 55, -30, -10))
  ),
  Bioclim(extent = c(40, 55, -30, -10), resolution = 5, layers = 1:12),
  NoProcess,
  LogisticRegression,
  Chain(PrintOccurrenceMap,
        PrintMap))

w2 <- workflow(
  Chain(
    SpOcc(species = 'Anopheles arabiensis', 
          extent =  c(40, 55, -30, -10)),
    SpOcc(species = 'Anopheles gambiae', 
          extent =  c(40, 55, -30, -10)),
    SpOcc(species = 'Anopheles gambiae', 
          extent =  c(40, 55, -30, -10))
  ),
  Bioclim(extent = c(40, 55, -30, -10), resolution = 5, layers = 1:12),
  NoProcess,
  LogisticRegression,
  Chain(PrintOccurrenceMap,
        PrintMap))
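Until a list of chains is supported, the repetition in this workaround could probably be reduced by building the calls programmatically. This is a rough sketch using only base R (`quote`, `bquote`, `lapply`); `make_workflow_call` is a hypothetical helper, and whether `workflow()`'s non-standard evaluation accepts a call constructed this way is an assumption that would need checking:

```r
# The occurrence chains to compare, stored as unevaluated calls.
occurrence_chains <- list(
  quote(Chain(SpOcc(species = "Anopheles arabiensis", extent = c(40, 55, -30, -10)),
              SpOcc(species = "Anopheles gambiae", extent = c(40, 55, -30, -10)))),
  quote(Chain(SpOcc(species = "Anopheles arabiensis", extent = c(40, 55, -30, -10)),
              SpOcc(species = "Anopheles gambiae", extent = c(40, 55, -30, -10)),
              SpOcc(species = "Anopheles gambiae", extent = c(40, 55, -30, -10))))
)

# Hypothetical helper: splice one occurrence chain into the otherwise fixed call.
make_workflow_call <- function(occ) {
  bquote(workflow(.(occ),
                  Bioclim(extent = c(40, 55, -30, -10), resolution = 5, layers = 1:12),
                  NoProcess,
                  LogisticRegression,
                  Chain(PrintOccurrenceMap, PrintMap)))
}

workflow_calls <- lapply(occurrence_chains, make_workflow_call)

# With zoon attached, each call could then be run with something like:
# results <- lapply(workflow_calls, eval)
```

Nothing here is evaluated until `eval` is called, so the calls can be inspected with `print(workflow_calls[[1]])` first.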

I'd love to fix this myself but doubt I'll have time soon. So I'm making this an "official feature request".

I'm still happy with this statement, though: "However, I think we only need inner Chains, not inner lists."