statist7 / LMSgrowth2

MIT License

Add plots in multiple measurements tab #41

Closed giordano closed 4 years ago

giordano commented 4 years ago

Preview:

image

The data in the table can be filtered and the plots will show only the filtered rows in the table.

This pull request is based on #40.

statist7 commented 4 years ago

Sorry for slow response - and Happy New Year!

The new plots look very good. I think the data are from the Cambridge Infant Growth Study, but I get an error about embedded nulls when I try to read the 'CIGS data.xlsx' file, which I sent on 2 April last. Did you edit the file first, and if so how? I'm not clear which file formats the Multiple tab can handle.

statist7 commented 4 years ago

I note that age is in units of days in the sidebar but in years in the plots... Also the plot scales are not labelled, and two of the x axis lines are missing. It's probably better to label the y scale than use the legend on the right, and age needs just one x scale at the bottom.

statist7 commented 4 years ago

Please can you point me to a file I can use to test this?

giordano commented 4 years ago

Hello and Happy New Year to you as well!

Please can you point me to a file I can use to test this?

I'm using the CSV file resources/data.csv. This Shiny app doesn't support XLS(X) files.

statist7 commented 4 years ago

Sorry to be dense, but how do I select that URL in the Multiple tab?

giordano commented 4 years ago

The file is in your local copy of the repository, in the resources directory.

statist7 commented 4 years ago

Aaah! thank you

giordano commented 4 years ago

Updated previews: when all children are selected, with both sexes. Each colour is a different ID, in this case taken from the "ID" column: image

If you filter the children using the filters in the table so that all remaining children are of the same sex (in this example I directly filtered all females in the "Sex" column), centiles are automatically plotted under the data points: image

You can choose a different column as ID, for example "Sex", so that the colour would indicate the sex of the children: image
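The rule described above (centiles appear only when every filtered row has the same sex) can be sketched as follows. This is an illustration in Python, not the app's R/Shiny code; the function name `sex_for_centiles` and the row layout are assumptions made for the example.

```python
def sex_for_centiles(rows, sex_column="Sex"):
    """Return the single sex shared by all rows, or None if mixed or empty.

    Hypothetical helper: the caller would draw centiles only when a sex
    is returned, because the reference curves differ by sex.
    """
    sexes = {row[sex_column] for row in rows if row.get(sex_column)}
    return next(iter(sexes)) if len(sexes) == 1 else None


# Made-up rows standing in for the filtered table.
rows = [
    {"ID": "1", "Sex": "F"},
    {"ID": "2", "Sex": "M"},
    {"ID": "3", "Sex": "F"},
]
females = [r for r in rows if r["Sex"] == "F"]

print(sex_for_centiles(rows))     # None: both sexes present, no centiles
print(sex_for_centiles(females))  # F: one sex only, centiles can be drawn
```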

statist7 commented 4 years ago

Looking good, and linking the centiles to sex works well.

I'm having trouble with the age axis. If I change the variable from days to age (i.e. age in years), and at the same time change the units from days to years, it takes a long time to replot with the correct x-axis. So it's possible to have numbers 1 to 6 on the axis but labelled days, or numbers corresponding to days but labelled years. Is it just very slow, or is there something wrong? During the process it gives a series of warnings: Ignoring 274 observations or similar.
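One way to rule out the mismatch between axis numbers and axis label is to produce both in a single step, so they can never be updated separately. A minimal sketch in Python (the app itself is R/Shiny); the name `age_axis` and the 365.25 days-per-year factor are assumptions, not taken from the app.

```python
DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years

# Length of each supported unit, expressed in days.
UNIT_IN_DAYS = {"days": 1.0, "years": DAYS_PER_YEAR}


def age_axis(values, from_unit, to_unit):
    """Convert ages between units and return them with the matching label."""
    converted = [v * UNIT_IN_DAYS[from_unit] / UNIT_IN_DAYS[to_unit] for v in values]
    return converted, f"Age ({to_unit})"


ages, label = age_axis([365.25, 730.5], "days", "years")
print(ages, label)
```

Because the label is derived from the same `to_unit` that drives the conversion, a plot built from the returned pair cannot show years on the axis while being labelled days.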

Related to this, I can't get plotly to do any of the options above each plot, for example box select or lasso select. It puts the dotted line around the region but doesn't do anything with it. Any idea why?

It would be good to have the option to join up points for individuals. I'm not sure though how it would know that the grouping variable was genuinely the ID rather than say sex or some other variable. Do you have thoughts on how to do it?

Also, would it be possible to add ID to the plotly hover display box, which at the moment just has x and y?
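For the two requests above (joining up points per individual, and showing the ID on hover), the underlying data handling might look like this sketch. It is written in Python for illustration only; the column names `ID`, `age` and `height` and both function names are hypothetical, and the user still has to say which column is the genuine ID.

```python
from collections import defaultdict


def lines_by_individual(rows, id_column="ID", x="age", y="height"):
    """Group points by the chosen ID column and sort each group by x.

    Each value in the result is one polyline: the points of a single
    individual in age order, ready to be joined up.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[id_column]].append((row[x], row[y]))
    return {key: sorted(points) for key, points in groups.items()}


def hover_text(row, id_column="ID", x="age", y="height"):
    """Build a hover label that includes the ID as well as x and y."""
    return f"{id_column}: {row[id_column]}, {x}: {row[x]}, {y}: {row[y]}"


# Made-up measurements, deliberately out of age order.
rows = [
    {"ID": "a", "age": 2.0, "height": 86.0},
    {"ID": "b", "age": 1.0, "height": 75.0},
    {"ID": "a", "age": 1.0, "height": 74.0},
]
print(lines_by_individual(rows))
print(hover_text(rows[0]))
```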

statist7 commented 4 years ago

Looks good - I like the plotting options, and they could be extended to handle other aspects of the plot - e.g. number and spacing of centiles, colour/size/type of data points and centile lines.

statist7 commented 4 years ago

... and even the form of the plot itself, i.e. the y axis as measurement, centile or z-score as we have discussed.

giordano commented 4 years ago

I'm having trouble with the age axis. If I change the variable from days to age (i.e. age in years), and at the same time change the units from days to years, it takes a long time to replot with the correct x-axis. So it's possible to have numbers 1 to 6 on the axis but labelled days, or numbers corresponding to days but labelled years. Is it just very slow, or is there something wrong? During the process it gives a series of warnings: Ignoring 274 observations or similar.

It's rather slow for a couple of reasons: the dataset is very large (almost 5000 data points for 269 different individuals), and we're doing some computationally intensive processing of the data before plotting. I think Asif is working on improving the user experience.

However, note that parents would likely work with much smaller datasets, probably several measurements (on the order of tens?) of a single individual. For example, in the resources/ directory I've added a file called short-data.csv, which is an excerpt of the full data.csv file and has 202 data points for 10 individuals. Plotting the data of this file should be reasonably quick.

Related to this, I can't get plotly to do any of the options above each plot, for example box select or lasso select. It puts the dotted line around the region but doesn't do anything with it. Any idea why?

Honestly it's not clear to me what those two options are supposed to do.

It would be good to have the option to join up points for individuals. I'm not sure though how it would know that the grouping variable was genuinely the ID rather than say sex or some other variable. Do you have thoughts on how to do it?

Also, would it be possible to add ID to the plotly hover display box, which at the moment just has x and y?

I think I already addressed these points with my changes yesterday, except that the last commit yesterday had an extra comma that upset Shiny. The plotting options need to be improved and we'll probably add more options, see https://github.com/UCL/LMSgrowth2/issues/7#issuecomment-579170048.

giordano commented 4 years ago

Looks good - I like the plotting options, and they could be extended to handle other aspects of the plot - e.g. number and spacing of centiles, colour/size/type of data points and centile lines.

Regarding centile options, this is probably not clear, but they're controlled in the "Centile" tab: changes there are automatically propagated to the centiles shown in the "Multiple" tab.

... and even the form of the plot itself, i.e. the y axis as measurement, centile or z-score as we have discussed.

Yes, this is in our TODO list in https://github.com/UCL/LMSgrowth2/issues/7#issuecomment-579170048.

statist7 commented 4 years ago

Thanks for the shorter file - it's much more responsive.

Related to this, I can't get plotly to do any of the options above each plot, for example box select or lasso select. It puts the dotted line around the region but doesn't do anything with it. Any idea why?

Honestly it's not clear to me what those two options are supposed to do.

I think it's meant to rescale the plot based on the selected points. The selection bit seems to work but not the rescale bit.

Regarding centile options, this is probably not clear, but they're controlled in the "Centile" tab:

Yes I was aware of this, but those options would fit well together with your new ones.

giordano commented 4 years ago

I think it's meant to rescale the plot based on the selected points. The selection bit seems to work but not the rescale bit.

My understanding is that lasso and box selections offer a way to interact with the data: you can select a region and execute some actions on the selected data points, see for example https://plot.ly/javascript/lasso-selection/. For this to happen, however, we need to write some specific code, but we also need to decide what kind of actions we may want to execute on the selected points. I currently don't see any use for this. We can see if we can just hide those buttons if we don't find any use for them.

For the purpose of zooming in, you can use the "Zoom" button (the one with the icon of a magnifying glass), which should be the default active button when a plot is shown.

statist7 commented 4 years ago

That's quite a powerful example. The one obvious use for it I can see is to select a region of one plot and then zoom in to display just the selected points in all the separate plots.

giordano commented 4 years ago

That's quite a powerful example. The one obvious use for it I can see is to select a region of one plot and then zoom in to display just the selected points in all the separate plots.

If all charts were plotted together, sharing the same x-axis, this would happen automatically with the standard zooming functionality, without writing special code for it. This should be doable: I played with it a little last week but had some problems, and I'll hopefully come back to it later. I don't think adding this feature to lasso and box selections is worth the effort.

During the process it gives a series of warnings: Ignoring 274 observations or similar.

I had a look at these warnings: they are shown because in the full dataset there are 274 missing measurements, marked as NA in the CSV file, for the head circumference and 277 missing measurements for the height. Nothing worrying, and it's good that plotly and DT can gracefully handle this situation without problems.
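A quick way to confirm counts like these outside the app is to tally the `NA` cells per column directly from the CSV. The sketch below is Python rather than the app's R, and the column names `height` and `head` are placeholders: the real headers in data.csv may differ.

```python
import csv
import io


def count_missing(csv_text, columns, missing_marker="NA"):
    """Count cells that are empty or equal to the missing marker, per column."""
    counts = {column: 0 for column in columns}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in columns:
            if (row.get(column) or "").strip() in ("", missing_marker):
                counts[column] += 1
    return counts


# Tiny made-up sample; pass the contents of the real file instead.
sample = "ID,age,height,head\n1,0.5,65.2,NA\n1,1.0,NA,46.1\n2,0.5,NA,NA\n"
print(count_missing(sample, ["height", "head"]))  # {'height': 2, 'head': 2}
```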

giordano commented 4 years ago

New update.

Plotting of measurements (what we already had so far): image

centiles: image

and SDSs: image

statist7 commented 4 years ago

I played with the plot selection and generated lots of messages:

Warning in RColorBrewer::brewer.pal(N, "Set2") :
  n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors

Please can you explain what the subplots are?

giordano commented 4 years ago

I played with the plot selection and generated lots of messages:

I know. I need to investigate more, but my understanding is that this is a bug in plotly. If so, there is little we can do here and we have to keep the warning.
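For context, the warning arises because more distinct IDs are being coloured than the 8 colours Set2 provides. One generic way to cope with that, independent of whether plotly allows it here, is to recycle the palette. The sketch below is Python, not the app's R code; the hex values are the Set2 colours quoted from memory and should be treated as placeholders, and `colours_for_ids` is a hypothetical helper.

```python
from itertools import cycle

# Assumed Set2 hex values (8 colours); treat as placeholders.
SET2 = [
    "#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3",
    "#A6D854", "#FFD92F", "#E5C494", "#B3B3B3",
]


def colours_for_ids(ids, palette=SET2):
    """Map each distinct ID to a colour, reusing the palette when it runs out."""
    unique_ids = list(dict.fromkeys(ids))  # de-duplicate, keeping first-seen order
    return dict(zip(unique_ids, cycle(palette)))


mapping = colours_for_ids([f"child{i}" for i in range(10)])
print(mapping["child0"], mapping["child8"])  # same colour: the palette wrapped
```

Recycling means some individuals share a colour, which is usually acceptable when there are hundreds of IDs and colour only needs to separate neighbouring traces.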

There is also another warning saying that resizing the subplot with the layout function is deprecated. This is a bug in the R interface to plotly; there is a comment in the code pointing to the issue on GitHub.

Please can you explain what the subplots are?

This is a feature of plotly that allows merging multiple plots into a single object, see for example https://plot.ly/r/subplots/. This is what allows having a shared x-axis for all plots.

statist7 commented 4 years ago

I've just updated packrat, and see that the sitar library is still 1.1.1 so it doesn't have the new growth references. Could sitar be pushed to CRAN 1.1.2 and the new references added?

In terms of the names for the tabs, I suggest 'One child' for the Calculator and 'Multiple children' for Multiple. Then Centiles rather than Centile, lose the Example tab, and rename Density as 'LMS density'.