helgasoft / echarty

Minimal R/Shiny Interface to ECharts.js
https://helgasoft.github.io/echarty/

Possible features: Including more information #36

Closed MrMisc closed 8 months ago

MrMisc commented 9 months ago

Hi, I was wondering whether it is possible to include more information on, say, a scatterplot for a time-series dataset. I have two specific questions in mind.


library(dplyr); library(echarty)
df_ <- read.csv("example.csv")
df_ %>% mutate(value = round(value,1))%>%
  group_by(Zone) |> 
  ec.init(
    title= list(text= 'Temporal Trends: Contamination/Infection/Colonization Rates Across Hosts, Eggs, and Faeces '),
    xAxis = list(name = 'Time',nameLocation = 'start',
                 nameTextStyle = list(fontWeight ='bolder'),
                 axisLabel = list(rotate = 346,width = 65,
                                  overflow = 'truncate')),
    yAxis = list(max = 100,name = "% compromised",nameLocation = 'start',
                 nameTextStyle = list(fontWeight ='bolder')),
    dataZoom= list(list(type= 'slider',orient = 'vertical'
                   ,left = '2%'),list(type= 'slider',orient = 'horizontal'
                                      ,right = '2%',top='1%', width = '20%')),
    tl.series = list(type  ='line',
                     encode = list(x = 'TimeUnit',y = 'value'), groupBy= 'variable',
                     emphasis= list(focus= 'series',
                                    itemStyle=list(shadowBlur=10,
                                                   shadowColor='rgba(0,0,0,0.5)'),
                                    label= list(position= 'right',
                                                rotate = 350,
                                                show=TRUE))),
    tooltip = list(show = T, trigger = 'axis'))|>
  ec.upd({legend<-setting
  options <- lapply(seq_along(options), \(i) {  
    tita<-title
    tita$text <- paste(tita$text, options[[i]]$title$text)
    options[[i]]$title <- tita   # here we set a title for each timeline step    
    options[[i]]$legend$data <- cns[[i]]  # fine-tune legends: data by continent
    options[[i]] 
  })
  })

There is additional information, such as Total Hosts, No contaminated, No infected etc., that is not currently reflected in my graph. I have been wondering whether there is a way to incorporate it to give more context to the percentage values shown on the graph. Sometimes a rise in the percentage of hosts infected from hour X to hour X+1, say in Zone 2, could be due to the number of hosts having decreased rather than more infected hosts having surfaced. The same goes for contaminated hosts.

The same applies for faeces and eggs.

  1. How would I incorporate say, "Total Hosts" as the size of the points in all of the lines?
  2. Is it possible to incorporate "Total Hosts" for the %Contaminated, %Infected and % Colonized lines (which are all for hosts), and "Total faeces" for the %faeces line and "Total eggs" for the %eggs line? Is that possible?

I made an uglier, rather unwieldy version in plotly a while back and was wondering whether I could do something similar with echarty.

I am open to suggestions on another way to present this additional information, say in another graph in a side-by-side comparison, if you believe there is such a method. I understand if this issue is a bit too troublesome. Thank you for your consideration and time.

helgasoft commented 9 months ago
  1. How would I incorporate say, "Total Hosts" as the size of the points in all of the lines?

yes, easy - symbolSize= ec.clmn('Total.Hosts', scale=0.01). But Total.Hosts values are not very suitable for symbol sizes.
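
As a minimal sketch of the column-based form: this assumes a recent echarty version that supports series.param (the argument used later in this thread) and uses a small made-up data frame. Only the column name Total.Hosts mirrors the thread; the values are illustrative and not taken from example.csv.

```r
library(echarty)

# Made-up illustration data (assumption, not from example.csv)
toy <- data.frame(
  TimeUnit    = 1:5,
  value       = c(10, 40, 25, 60, 35),
  Total.Hosts = c(2000, 1500, 1200, 800, 50)
)

# scale multiplies the column value: 2000 hosts -> size 20, 50 hosts -> size 0.5
p <- toy |> ec.init(
  series.param = list(
    type       = 'line',
    encode     = list(x = 'TimeUnit', y = 'value'),
    symbolSize = ec.clmn('Total.Hosts', scale = 0.01)
  )
)
# print p in an interactive session to render the chart
```

With raw counts the scaled sizes span a wide range (0.5 to 20 in this toy data), which is one reason such a column may be awkward to use directly as a symbol size.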

  2. Is it possible to incorporate "Total Hosts" for the %Contaminated, %Infected and % Colonized lines (which are all for hosts), and "Total faeces" for the %faeces line and "Total eggs" for the %eggs line?

yes again, but with some JS code. However there are no "Total faeces" or "Total eggs" columns in the CSV data...

symbolSize= ec.clmn("function(v,pp) {
if (['%Contaminated','%Infected','%Colonized'].includes(pp.seriesName))
  return pp.data[1]*0.01;   // pp.data[1] is Total.Hosts
else
  return v[13];}  //  v[13] is the value, could be a constant like 5
")


MrMisc commented 9 months ago

Thank you so much! I misnamed the columns, my apologies. I was referring to "Eggs Amt" and "Faeces Amt".

Could I get clarification as to where I should place the symbolSize JavaScript you kindly showed, though? It is sensitive to the frame (zone) that we are in, correct?

I went with the following, but I think the circles should logically be large in Zone 0 at the start of time, since that is where all the hosts are at the beginning, before they are migrated to the following zones.

df_ %>% mutate(value = round(value,1))%>%
  group_by(Zone) |> 
  ec.init(
    title= list(text= 'Temporal Trends: Contamination/Infection/Colonization Rates Across Hosts, Eggs, and Faeces '),
    xAxis = list(name = 'Time',nameLocation = 'start',
                 nameTextStyle = list(fontWeight ='bolder'),
                 axisLabel = list(rotate = 346,width = 65,
                                  overflow = 'truncate')),
    yAxis = list(max = 100,name = "% compromised",nameLocation = 'start',
                 nameTextStyle = list(fontWeight ='bolder')),
    dataZoom= list(list(type= 'slider',orient = 'vertical'
                   ,left = '2%'),list(type= 'slider',orient = 'horizontal'
                                      ,right = '2%',top='1%', width = '20%')),
    tl.series = list(type  ='line',
                     encode = list(x = 'TimeUnit',y = 'value'), groupBy= 'variable',
                     symbolSize= ec.clmn("function(v,pp) {
          if (['%Contaminated','%Infected','%Colonized'].includes(pp.seriesName))
            return pp.data[1]*0.051;   // pp.data[1] is Total.Hosts
          else
            return v[13];}  //  v[13] is the value, could be a constant like 5
          "),
                     emphasis= list(focus= 'series',
                                    itemStyle=list(shadowBlur=10,
                                                   shadowColor='rgba(0,0,0,0.5)'),
                                    label= list(position= 'right',
                                                rotate = 350,
                                                show=TRUE))),
    tooltip = list(show = T, trigger = 'axis'))|>
  ec.upd({legend<-setting
  options <- lapply(seq_along(options), \(i) {  
    tita<-title
    tita$text <- paste(tita$text, options[[i]]$title$text)
    options[[i]]$title <- tita   # here we set a title for each timeline step    
    options[[i]]$legend$data <- cns[[i]]  # fine-tune legends: data by continent
    options[[i]] 
  })
  })

helgasoft commented 9 months ago

Please replace max=100 in yAxis with scale=T for a better-looking chart.

clarification as to where I should be placing the symbolsize javascript you kindly showed though?

yes, it should be in tl.series.

It (symbolsize) is such that it is sensitive to the frame(zone) that we are in, correct?

no, all attributes in tl.series are common to all options in the timeline (Zones here).

the circles should logically be large in Zone 0 at the start of time since...

you need to look at your data: df_ |> dplyr::count(Zone, Total.Hosts)
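
For instance, a quick check along these lines shows which Total.Hosts values occur in each Zone. The data frame below is made up for illustration, and the summarise() step is an extra that was not suggested in the thread.

```r
library(dplyr)

# Made-up illustration data (assumption, not from example.csv)
toy <- data.frame(
  Zone        = c(0, 0, 0, 1, 1, 1),
  TimeUnit    = c(1, 2, 3, 1, 2, 3),
  Total.Hosts = c(2000, 1500, 900, 0, 500, 1100)
)

# One row per distinct (Zone, Total.Hosts) pair, with n = number of data rows
counts <- toy |> count(Zone, Total.Hosts)

# Smallest and largest host totals per Zone, to judge a sensible size scale
ranges <- toy |>
  group_by(Zone) |>
  summarise(min_hosts = min(Total.Hosts), max_hosts = max(Total.Hosts))

print(counts)
print(ranges)
```

If the largest totals do not fall in the Zone and time steps you expect, the symbol sizes will not either.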

MrMisc commented 9 months ago

I appear to have some sensible circle sizes for total hosts if I go up by one index.

df_ %>% mutate(value = round(value,1))%>%
  group_by(Zone) |> 
  ec.init(
    title= list(text= 'Temporal Trends: Contamination/Infection/Colonization Rates Across Hosts, Eggs, and Faeces '),
    xAxis = list(name = 'Time',nameLocation = 'start',
                 nameTextStyle = list(fontWeight ='bolder'),
                 axisLabel = list(rotate = 346,width = 65,
                                  overflow = 'truncate')),
    yAxis = list(max = 100,name = "% compromised",nameLocation = 'start',
                 nameTextStyle = list(fontWeight ='bolder')),
    dataZoom= list(list(type= 'slider',orient = 'vertical'
                   ,left = '2%'),list(type= 'slider',orient = 'horizontal'
                                      ,right = '2%',top='1%', width = '20%')),
    tl.series = list(type  ='line',
                     encode = list(x = 'TimeUnit',y = 'value'), groupBy= 'variable',
                     symbolSize= ec.clmn("function(v,pp) {
          if (['%Contaminated','%Infected','%Colonized'].includes(pp.seriesName))
            return pp.data[2]*0.01;   // pp.data[2] appears to be Total.Hosts (index raised by one)
          else if (['%Eggs Infected'].includes(pp.seriesName))
            return pp.data[6]*0.01;
          else if (['%Faeces infected'].includes(pp.seriesName))
            return pp.data[9]*0.01;
          else
            return 1;}  //  fallback: constant size for any other series
          "),
                     emphasis= list(focus= 'series',
                                    itemStyle=list(shadowBlur=10,
                                                   shadowColor='rgba(0,0,0,0.5)'),
                                    label= list(position= 'right',
                                                rotate = 350,
                                                show=TRUE))),
    tooltip = list(show = T, trigger = 'axis'))|>
  ec.upd({legend<-setting
  options <- lapply(seq_along(options), \(i) {  
    tita<-title
    tita$text <- paste(tita$text, options[[i]]$title$text)
    options[[i]]$title <- tita   # here we set a title for each timeline step    
    options[[i]]$legend$data <- cns[[i]]  # fine-tune legends: data by continent
    options[[i]] 
  })
  })

This produced the following result for Zone 0, which makes sense to me.

[screenshot: rstudio_aHYbBRgGFL]

The total hosts, faeces amount and eggs amount across zones are as follows, respectively:

Hosts

[screenshot: rstudio_ICupoIUdnx]

Faeces

[screenshot: rstudio_InqKA3Kfkn]

Eggs

[screenshot: rstudio_4gRSKmLEsD]

Faeces Amt and Eggs Amt might not be selected

If we look at the collection zone frame, however...

[screenshot: rstudio_q0zQXjGpte]

The large number of faeces in the collection zone at the end, however, leads me to believe that the faeces and eggs columns are somehow not being selected, since there is no visible sign of large faeces amounts in the collection zone frame. (The hosts seem to be showing up fine, I think, since the size appears to be roughly the same across frames (zones), which is reflected in the simpler plots.)

It also doesn't make sense that the points could be too small to see, since the amount of faeces collected becomes much larger than 2000 hosts. The problem likely has something to do with how my dataset is being indexed. I don't really know which column is being selected; I have just been going off the order of columns in names(df). Adding 1 to that index appears to correctly select the Total Hosts column that I wanted, but not the Faeces Amt and Eggs Amt columns.

helgasoft commented 9 months ago

Instead of coding it in (unfamiliar) JavaScript, one can build the symbol size as a data column with mutate, then just reference that column in symbolSize.

library(dplyr); library(echarty)
setting <- list(show = T,type= "scroll",orient= "horizontal", pageButtonPosition= 'start',
                right= 5,top = 30, icon = 'circle', align= 'right', height='85%')
df <- read.csv("example.csv")

# build symbol sizes in df
df <- df |> rowwise() |> mutate(value= round(value,1),
    ssize= 
    ifelse(variable %in% c('%Contaminated','%Infected','%Colonized'), 
             max(Total.Hosts*0.01, 1),
    ifelse(variable =='%Eggs Infected', max(Eggs.Amt*0.01, 1),
    ifelse(variable =='%Faeces Infected', max(Faeces.Amt*0.01, 1), 2))) 
) |> ungroup()

df %>% group_by(Zone) |> 
ec.init(
    title= list(text= 'Temporal Trends: Contamination/Infection/Colonization Rates Across Hosts, Eggs, and Faeces '),
    xAxis = list(name = 'Time',nameLocation = 'start', scale=T,
        nameTextStyle = list(fontWeight ='bolder'),
        axisLabel = list(rotate = 346,width = 65, overflow = 'truncate')),
    yAxis = list(name= "% compromised",nameLocation = 'start', scale=T,
                 nameTextStyle = list(fontWeight ='bolder')),
    timeline= list(show=T),
    series.param=  list(type= 'line',
         encode= list(x= 'TimeUnit', y= 'value'), groupBy= 'variable',
         symbolSize= ec.clmn('ssize'),
         emphasis= list(focus= 'series',
            itemStyle=list(shadowBlur=10,
                           shadowColor='rgba(0,0,0,0.5)'),
            label= list(position= 'right', rotate= 350, show=TRUE))),
    tooltip = list(show= T, trigger= 'axis'), legend=setting
)
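
The rowwise() plus nested ifelse() step above could also be written in vectorized form. The following is only a sketch of that alternative, not the author's code: it swaps in dplyr::case_when() and base pmax() (the element-wise counterpart of max()), and it runs on a made-up data frame whose column names mirror the thread but whose values are invented.

```r
library(dplyr)

# Made-up illustration data (assumption, not from example.csv)
toy <- data.frame(
  variable    = c('%Contaminated', '%Eggs Infected', '%Faeces Infected', '%Other'),
  Total.Hosts = c(2000, 2000, 2000, 2000),
  Eggs.Amt    = c(50, 50, 50, 50),
  Faeces.Amt  = c(4000, 4000, 4000, 4000)
)

host_vars <- c('%Contaminated', '%Infected', '%Colonized')

# pmax(x, 1) floors each size at 1 so small counts stay visible
toy <- toy |> mutate(
  ssize = case_when(
    variable %in% host_vars        ~ pmax(Total.Hosts * 0.01, 1),
    variable == '%Eggs Infected'   ~ pmax(Eggs.Amt * 0.01, 1),
    variable == '%Faeces Infected' ~ pmax(Faeces.Amt * 0.01, 1),
    TRUE                           ~ 2   # default size for any other series
  )
)

print(toy[, c('variable', 'ssize')])
```

Because the matching is done on exact strings, the spelling and capitalization of each series name in the variable column must agree with the names used in the conditions.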
