wvictor14 / team_Methylation-Badassays

STAT 540 Spring 2017 team repository

Project progress report for team_Methylation-Badassays #5

Open nivretta opened 7 years ago

nivretta commented 7 years ago

@rbalshaw @farnushfarhadi

farnushfarhadi commented 7 years ago

Hi @STAT540-UBC/team-badassays

Thank you for submitting the progress report.

A few comments:

@rbalshaw your thoughtful comments would be very welcome.

Good luck team! :)

rbalshaw commented 7 years ago

It looks like you have made good progress getting your data into R, reviewing it for quality, and conducting normalization, etc.

Your plan, laid out in S.1.2, looks pretty solid. A regularized logistic regression model seems a sensible thing to try. Cross-validation as you describe (which packages like caret should make fairly straightforward) will help you understand how well the model identifies the Caucasian vs. Asian samples, and will help reduce overfitting.
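A minimal sketch of that idea, written in plain Python rather than R/caret and run on synthetic data, might look like the following. Everything here is an assumption for illustration: the function names, the L2 penalty (the team's plan may use a different penalty), the gradient-descent fit, and the made-up data with 5 informative features out of 20. It only shows the mechanics of fitting on k-1 folds and scoring on the held-out fold.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_l2(X, y, lam=0.1, lr=0.1, epochs=300):
    """Fit L2-penalised logistic regression by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        # The penalty shrinks the weights; the intercept is not penalised.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, X):
    # Class 1 when the linear score is positive (probability above 0.5).
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]

def cross_validate(X, y, k=5, lam=0.1, seed=0):
    """Return the mean held-out accuracy over k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w, b = fit_logistic_l2([X[i] for i in train], [y[i] for i in train], lam=lam)
        preds = predict(w, b, [X[i] for i in held_out])
        correct = sum(pred == y[i] for pred, i in zip(preds, held_out))
        accs.append(correct / len(held_out))
    return sum(accs) / k

# Synthetic stand-in data (not methylation data): 60 samples, 20 features,
# of which the first 5 differ between the two classes.
rng = random.Random(1)
n_samples, n_features, n_informative = 60, 20, 5
y = [i % 2 for i in range(n_samples)]
X = [[rng.gauss((2 * yi - 1) if j < n_informative else 0.0, 1.0)
      for j in range(n_features)] for yi in y]

cv_accuracy = cross_validate(X, y, k=5, lam=0.1)
print(f"mean cross-validated accuracy: {cv_accuracy:.2f}")
```

In practice the penalty strength (`lam` here) would itself be chosen by cross-validation, which is the kind of tuning caret is designed to handle.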

You next plan (step 3) to do unsupervised analyses of these data (PCA) and hope to see that some of the PCs are associated with self-reported ethnicity in the training data. This is a sensible idea, but I tend to think of this type of analysis as a precursor to the logistic regression (a supervised technique). Not a big deal, though. Plotting these PC values for the test data -- where you cannot confirm the ethnicity -- will be very interesting.

I would suggest that you could also do a PCA using only the features selected by the regularized logistic regression. This plot will almost certainly show some differences between the ethnicities in the training data (you should think about this and make sure it's clear why this is so) -- and if you are lucky, and your hypothesis is valid, you may see similar structures when you plot these PCs for the test data.
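One way to see why the training plot will almost certainly show separation is to try it on pure noise. The sketch below is in plain Python on synthetic data and rests on stated assumptions: a simple ranking by difference in group means stands in for the feature selection that regularized logistic regression would perform, the function names are hypothetical, and the "test" labels are arbitrary. Because the features are chosen using the training labels, the first PC separates the training groups even though no real signal exists, while fresh data projected onto the same PC shows little or no separation.

```python
import random

def select_features(X, y, k):
    """Keep the k features with the largest absolute difference in group means."""
    n1 = sum(y)
    n0 = len(y) - n1
    def gap(j):
        m1 = sum(x[j] for x, yi in zip(X, y) if yi == 1) / n1
        m0 = sum(x[j] for x, yi in zip(X, y) if yi == 0) / n0
        return abs(m1 - m0)
    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]

def first_pc(X):
    """Return column means and the leading principal axis (power iteration)."""
    n, p = len(X), len(X[0])
    means = [sum(x[j] for x in X) / n for j in range(p)]
    centred = [[x[j] - means[j] for j in range(p)] for x in X]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-zero starting vector
    for _ in range(500):
        v = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]
    return means, v

def pc1_gap(X, y, means, axis):
    """Absolute difference between the two groups' mean PC1 scores."""
    scores = [sum((xj - m) * a for xj, m, a in zip(x, means, axis)) for x in X]
    s1 = [s for s, yi in zip(scores, y) if yi == 1]
    s0 = [s for s, yi in zip(scores, y) if yi == 0]
    return abs(sum(s1) / len(s1) - sum(s0) / len(s0))

# Pure noise: no feature is truly associated with the labels.
rng = random.Random(0)
n_samples, n_features, n_keep = 30, 2000, 20
labels = [i % 2 for i in range(n_samples)]
X_train = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
X_test = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]

# Select features using the training labels, then fit the PCA on training data only.
selected = select_features(X_train, labels, n_keep)
train_sub = [[x[j] for j in selected] for x in X_train]
test_sub = [[x[j] for j in selected] for x in X_test]
means, axis = first_pc(train_sub)

train_gap = pc1_gap(train_sub, labels, means, axis)
test_gap = pc1_gap(test_sub, labels, means, axis)  # test projected onto the training PC
print(f"PC1 group gap, training data: {train_gap:.2f}")
print(f"PC1 group gap, fresh data:    {test_gap:.2f}")
```

The practical reading is that separation in the training plot is not evidence by itself; similar structure in the test data, which played no part in selecting the features, is the more informative result.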

You have a bit of a hurdle to clear with getting your processed data back into R, but that seems like something we might be able to look at over the phone with screen sharing (Webex or Skype?).

Please let me know if anyone on the team would like to chat. Best would be to contact me by email: robert.balshaw@bccdc.ca

farnushfarhadi commented 7 years ago

Hey @STAT540-UBC/team-badassays

Please make sure most of your team members are coming to seminar tomorrow :) Rob will be there as well! We can discuss things in your project together.