mdetels1 / qbb2023-answers

Week 6 Feedback #6

Open dtaylo95 opened 10 months ago

dtaylo95 commented 10 months ago

README.md with commands and analyses

1/2

| Exercise | Points Possible | Grade |
| --- | --- | --- |
| Commands for Step 1.1 | 0.33 | 0.33 |
| Commands for Step 2.1 | 0.33 | 0.33 |
| Commands for Step 3.1 | 0.33 | 0.33 |
| Answer to Step 3.4 | 1 | 0 |

Missing your answer to 3.4.

plotting.py script to produce plots

3/4

| Exercise | Points Possible | Grade |
| --- | --- | --- |
| Code to produce step 1.2 PC plot | 1 | 1 |
| Code to produce step 2.2 AFS plot | 1 | 1 |
| Code to produce step 3.2 Manhattan plots | 1 | 0.75 |
| Code to produce step 3.3 effect size boxplot | 1 | 0.25 |

While the plot itself is missing, the code for your Manhattan plot looks good. Very minor issue, but it looks like you're plotting ALL of your associations in your Manhattan plots, rather than just the genotype associations. To clarify: when you run your GWAS, you include the top PCs as covariates in the regression (this is correct). But this means that you also get regression results for the covariates, not just for the variants you're testing. Take a look at the TEST column in the .assoc.linear output file(s) of the plink --linear command to figure out which results you want to keep/plot.
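As a rough illustration of that filtering step, here is a minimal sketch using only the standard library. It assumes the genotype rows are the ones labelled `ADD` in the TEST column and that covariate rows carry labels such as `COV1`; check this against your own file. The excerpt, SNP IDs, and numbers below are made up, and `genotype_rows` is a hypothetical helper name.

```python
import io

# Tiny made-up excerpt in the whitespace-delimited layout of an
# .assoc.linear file; SNP IDs and numbers are illustrative only.
EXAMPLE = """\
 CHR   SNP      BP  A1  TEST  NMISS   BETA    STAT       P
   1  rsA    1000   A   ADD    100   0.50   2.100  0.0380
   1  rsA    1000   A  COV1    100   0.10   0.400  0.6900
   1  rsB    2000   G   ADD    100  -0.20  -0.900  0.3700
   1  rsB    2000   G  COV1    100   0.11   0.410  0.6800
"""


def genotype_rows(handle, test_label="ADD"):
    """Return one dict per row whose TEST column equals test_label.

    Assumption: "ADD" marks the per-variant (genotype) test; every other
    label is treated as a covariate result and skipped.
    """
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        if row["TEST"] == test_label:
            rows.append(row)
    return rows


snps = genotype_rows(io.StringIO(EXAMPLE))
p_values = [float(row["P"]) for row in snps]
```

With a real file you would pass an open file handle instead of the `io.StringIO` wrapper. Real output may also contain `NA` in the P column, which `float` cannot parse, so those rows would need to be skipped before converting.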

For your boxplot, it looks like you're picking the correct top SNP (for CB1908), but I... have no idea what you're plotting. It looks like you're plotting some kind of correlation between genotypes? Really not sure. What you want to be plotting is the relationship between the genotypes of your top SNP (rs10876043) and the values of your phenotype (namely, the IC50 of the CB1908 drug). So you should have a series of 3 boxplots--one for each genotype of rs10876043--where for each genotype, you plot the IC50 values of all individuals with that genotype. Also make sure you label everything all pretty.
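One possible way to structure that boxplot is sketched below. It assumes genotypes are coded as 0, 1, or 2 copies of the alternate allele, with `None` for a missing call, and that phenotypes are floats with `nan` for missing values; your own encoding may differ. The data, the function names, and the output path are all hypothetical.

```python
import math
from collections import defaultdict


def group_by_genotype(genotypes, phenotypes):
    """Map each genotype code to the phenotype values of individuals carrying it.

    Individuals with a missing genotype (None) or a missing phenotype (nan)
    are skipped, so each pair is dropped as a unit.
    """
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, phenotypes):
        if genotype is None or math.isnan(value):
            continue
        groups[genotype].append(value)
    return dict(groups)


def plot_boxplots(groups, snp_id, phenotype_label, out_path):
    """Draw one box per genotype and save the figure to out_path."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    labels = sorted(groups)
    fig, ax = plt.subplots()
    ax.boxplot([groups[label] for label in labels])
    # boxplot places the boxes at x = 1..n by default
    ax.set_xticks(range(1, len(labels) + 1))
    ax.set_xticklabels([str(label) for label in labels])
    ax.set_xlabel(f"Genotype at {snp_id}")
    ax.set_ylabel(phenotype_label)
    ax.set_title(f"{phenotype_label} by genotype at {snp_id}")
    fig.savefig(out_path)
    plt.close(fig)


# Made-up example data: one entry per individual, in the same order.
genotypes = [0, 1, 2, 1, 0, None]
ic50 = [1.2, 0.8, 0.3, float("nan"), 1.0, 0.9]
groups = group_by_genotype(genotypes, ic50)
```

A call such as `plot_boxplots(groups, "rs10876043", "CB1908 IC50", "boxplot.png")` would then write the figure. Only the SNP ID comes from this feedback; the label text and file name are placeholders.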

image

Pretty plots

3/4

| Exercise | Points Possible | Grade |
| --- | --- | --- |
| Step 1.2 PC plot | 1 | 1 |
| Step 2.2 AFS plot | 1 | 1 |
| Step 3.2 Manhattan plots | 1 | 0 |
| Step 3.3 effect size boxplot | 1 | 1 |

Missing your Manhattan plot.

Grade

Total: 7/10

dtaylo95 commented 9 months ago

Almost there! There's just one verrrrrry small issue:

When you are grabbing the phenotype values (i.e. the CB1908 IC50 values) to make the boxplot, you're incrementing your index before you grab each phenotype, rather than after, which means all your phenotypes are shifted by one index (i.e. you're pairing each phenotype with the wrong genotype):

https://github.com/mdetels1/qbb2023-answers/blob/b66d10212f3f226534a1f3fb68d8b0e7607349c6/week6_homework/plotting.py#L112-L119

That's why your plot doesn't look the same as mine. Note that when you fix this, you will also need to change/update the way you're dropping the nan values from the heterozygous phenotypes:

https://github.com/mdetels1/qbb2023-answers/blob/b66d10212f3f226534a1f3fb68d8b0e7607349c6/week6_homework/plotting.py#L135-L136
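To make the index shift concrete, here is a small sketch with made-up values. It is not a copy of the linked code: the function names and data are hypothetical, and the real script may be structured differently. It contrasts incrementing before versus after the lookup, then drops missing phenotypes only once each one is paired with its genotype.

```python
import math

# Made-up data: one genotype code and one IC50 value per individual.
genotypes = [0, 1, 1, 2]
phenotypes = [10.0, 20.0, float("nan"), 40.0]


def pair_increment_first(genotypes, phenotypes):
    """Shifted pattern: the index moves BEFORE the lookup, so each
    genotype is paired with the next individual's phenotype."""
    pairs = []
    i = 0
    for genotype in genotypes:
        i += 1
        if i < len(phenotypes):  # the last genotype has nothing left to pair with
            pairs.append((genotype, phenotypes[i]))
    return pairs


def pair_increment_after(genotypes, phenotypes):
    """Aligned pattern: look up the phenotype first, then move the index."""
    pairs = []
    i = 0
    for genotype in genotypes:
        pairs.append((genotype, phenotypes[i]))
        i += 1
    return pairs


def drop_missing(pairs):
    """Drop pairs whose phenotype is nan, keeping genotype and phenotype together."""
    return [(genotype, value) for genotype, value in pairs if not math.isnan(value)]


shifted = pair_increment_first(genotypes, phenotypes)
aligned = drop_missing(pair_increment_after(genotypes, phenotypes))
```

Iterating with `zip(genotypes, phenotypes)` avoids the manual index altogether. Filtering nan values on the paired data, rather than on one genotype's list afterwards, keeps the filter correct whichever genotype the missing values belong to.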

This is a super minor issue though, so I'm happy to give you a 10/10. I just want you to be aware of why your plot isn't quite working right.

dtaylo95 commented 9 months ago

Whether or not you decide to make that change, feel free to close this issue.

Current grade: 10/10