reptalex / phylofactor

26 stars 9 forks source link

Error when defining colors for pf.tree #29

Closed SilasK closed 4 years ago

SilasK commented 4 years ago

I assume this is also a small internal error

pf.tree(pf,colors = factor_colors)
Error in structure(list(node = node, fill = fill, alpha = alpha, extend = extend,  : 
  object 'cols' not found
reptalex commented 4 years ago

Fixed - thanks! A work-around can use color.fcn, but I'll push the revised version shortly.

reptalex commented 4 years ago

Pushed - you can now input your own colors (will throw a right now probably ambiguous error if input colors aren't of length equal to pf$nfactors

SilasK commented 4 years ago

Thank you very much.

My goal is to emulate the output of Lefse. image I think the output of phylofactor are more robust than the LDA score.

For this I would like to summarize the effect of each factor to one value, e.g. the difference between the group mean.

I managed now to plot the factors in the color of the coefficient of the pf.model Phylofactor.pdf

However, I'm not sure what I should do if factors are overlapping. The changes are additive, right?

Should I rather take the output of predict ?

My code so far:

coefficients = vector(length = 10)
for (fofi in 1:length(coefficients))
{
  coefficients[fofi] = summary( pf$models[[fofi]])$coefficients['SourceFeces',"Estimate"]  #"Pr(>|t|)"
}

map2colors = function(values, endpoints=c("darkblue","white", "darkred"))
{

limit_value= max(abs(values))

## Use n equally spaced breaks to assign each value to n-1 equal sized bins 
ii <- cut(values, breaks = seq(-limit_value, limit_value, len = 100), 
          include.lowest = TRUE)

## Use bin indices, ii, to select color from vector of n-1 equally spaced colors
colors <- colorRampPalette(endpoints)(99)[ii]

## This call then also produces the plot below
#image(seq_along(values), 1, as.matrix(seq_along(values)), col = colors,
#      axes = F)

return(colors)
}

factor_colors= map2colors(coefficients)

pf.tree(pf,colors = factor_colors)
reptalex commented 4 years ago

I don't fully understand the underlying data & questions, but I think we can make progress without that -

You have a lot of options, and the choice is yours depending on what you're hoping to illustrate. Plotting the raw data with factors highlighted could let the data visualize itself, with clades showing that phylofactor has partitioned groups appropriately. You can also use the pf.predict output for a cleaner version, and even a side-by-side to see how phylofactor has classified vs. how the data look. However, pf.predict can't use phyca or method='max.var' objects - the predict function uses a glm format. One possibility for your own phyca.predict function is to just take the means within groups summarized (go through your pf.tree()$legend, pick the factors-x-groups, and take means, geometric, arithmetic, mode or however else you'd like to summarize/aggregate).

Does this help?