jaybee84 / ml-in-rd

Manuscript for perspective on machine learning in rare disease

Revisions: [New] Figure 1 – Demonstrating the concept of combining datasets #186

Closed jaclyn-taroni closed 2 years ago

jaclyn-taroni commented 3 years ago

In the figure previously known as figure 1 (#179), we've removed a lot of the "extra" information we were trying to get across, namely that we're often going to need to combine multiple datasets to set up our ML experiments in rare disease contexts. We've discussed adding a new figure that covers combining datasets. This figure might also include some panels that cover supervised vs. unsupervised ML and the concepts of training and test sets that will be covered in a box (#185).

dvenprasad commented 2 years ago

All right, I took a first pass at it. I think the captioning needs some work, especially differentiating the normalized vs. not normalized combined datasets.

Would the terms combined and aggregated help differentiate them better?

[Image: figure-1-combining-datasets]
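To make the normalized vs. not normalized distinction concrete, here is a minimal sketch. It assumes "normalized" means each dataset is standardized on its own before the datasets are combined; the figure may use a different method. The `zscore` helper and the toy values are hypothetical and not taken from the manuscript.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one dataset's values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no variation; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical measurements of one feature from two datasets on different scales.
dataset_a = [1.0, 2.0, 3.0, 4.0]
dataset_b = [101.0, 102.0, 103.0, 104.0]

# Combined without normalization: dataset of origin dominates the variation.
combined_raw = dataset_a + dataset_b

# Normalized within each dataset, then combined: the between-dataset offset is removed.
combined_normalized = zscore(dataset_a) + zscore(dataset_b)

# Distance between the two dataset means, before and after normalization.
gap_raw = abs(mean(dataset_a) - mean(dataset_b))
gap_normalized = abs(mean(zscore(dataset_a)) - mean(zscore(dataset_b)))
print(gap_raw, gap_normalized)
```

One caveat with this particular choice: standardizing each dataset separately also removes real differences between datasets, for example when one dataset contains mostly one class.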

jaybee84 commented 2 years ago

My first thoughts:

  1. It may be easier for the reader if we only have 2 classes (i.e. colors).
  2. The size of shapes on the PCA plots can be made smaller to show greater distance between each cluster.

dvenprasad commented 2 years ago

Re: k-fold cross-validation. We can do two things: represent it on an abstract level, or show the classes in the training and validation sets.

1) Option One: Represent it on a somewhat abstract level. I googled a whole bunch, and the rectangle representation seems to be popular for showing the windows. We can do that. I tried sketching out a circular representation below:

[Image: Screen Shot 2021-10-21 at 3 51 32 PM]

2) Option Two: Include classes. We can add an additional row as hold-out, but the "10% of the entire dataset" math does not add up.

[Image: Screen Shot 2021-10-21 at 3 53 31 PM]
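For reference while settling the numbers in the panel, here is a rough sketch of a hold-out set plus k-fold cross-validation. Everything in it is an assumption for illustration: the `holdout_then_kfold` helper, the 100-sample two-class dataset, the 10% hold-out fraction, and k = 5 are placeholders, not the figure's actual values. The split is also not stratified, so class counts per fold can vary.

```python
import random


def holdout_then_kfold(labels, holdout_fraction=0.1, k=5, seed=0):
    """Set aside a hold-out set, then split the remaining samples into k folds."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)

    # The hold-out set is taken from the entire dataset first.
    n_holdout = round(len(indices) * holdout_fraction)
    holdout, remaining = indices[:n_holdout], indices[n_holdout:]

    # Deal the remaining indices into k folds of near-equal size.
    folds = [remaining[i::k] for i in range(k)]

    # Each fold serves as the validation set once; the other folds form the training set.
    splits = []
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, validation))
    return holdout, splits


# Hypothetical two-class dataset of 100 samples.
labels = ["A"] * 50 + ["B"] * 50
holdout, splits = holdout_then_kfold(labels)

for fold_number, (training, validation) in enumerate(splits, start=1):
    counts = {c: sum(labels[i] == c for i in validation) for c in ("A", "B")}
    print(f"fold {fold_number}: {len(training)} training, "
          f"{len(validation)} validation, classes {counts}")
print(f"hold-out: {len(holdout)} samples")
```

With these placeholder numbers, the 90 samples left after the hold-out divide evenly into five folds of 18; other dataset sizes or fractions will not divide as cleanly, and the folds will then differ in size by one sample.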

jaybee84 commented 2 years ago

@dvenprasad can you please add in the updated figure here?

dvenprasad commented 2 years ago

[Image: figure-1-combining-datasets]