mitreac / b575f19

UM DCMB BIOINF 575 Fall 2019 class repo
BSD 3-Clause "New" or "Revised" License

HW_6 Rubric inconsistent with assignment #23

Open CooperStansbury opened 5 years ago

CooperStansbury commented 5 years ago

Hello!

The final result is not impacted in this case, but which operator is correct for assignment 6: > or >=?

From the assignment in problem 1:

Create a function computeDEgenes that processes a file with the file name given as a parameter and computes a data frame with the differentially expressed genes (genes with a log2 fold change > 1, where the fold change is the ratio between the average disease RPKM values and the average of the control RPKM values). The data frame contains only one column, which holds the log2 ratio values, and the row names are the gene symbols.

From the rubric for problem 1:

Reads the file content and processes the data, computing the means of the control RPKM columns and the AD RPKM columns and then the log2 ratio. Selects genes with absolute log2 ratios (AD/C >= 1). Returns the result as a data frame with the selected values as a column and the respective gene symbols as row names.

Thanks,

betteridiot commented 5 years ago

Let's stick with what was covered in Homework 5: AD/C >= 1
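For reference, the two operators only disagree for a gene whose log2 ratio is exactly 1, that is, whose mean AD RPKM is exactly twice its mean control RPKM. A minimal sketch of that boundary case, using made-up gene names and values and plain dictionaries instead of a data frame (none of this comes from the homework data):

```python
import math

# Hypothetical (control mean, AD mean) RPKM pairs; illustrative values only.
mean_rpkm = {
    "GENE_A": (2.0, 8.0),  # log2(8/2) = 2, passes either operator
    "GENE_B": (3.0, 6.0),  # log2(6/3) = 1, sits exactly on the threshold
    "GENE_C": (4.0, 5.0),  # log2(5/4) is about 0.32, fails either operator
}

# log2 fold change: mean AD RPKM divided by mean control RPKM.
log2_ratio = {
    gene: math.log2(ad / control) for gene, (control, ad) in mean_rpkm.items()
}

strict = sorted(g for g, v in log2_ratio.items() if v > 1)      # assignment wording
inclusive = sorted(g for g, v in log2_ratio.items() if v >= 1)  # rubric wording

print(strict)     # ['GENE_A']
print(inclusive)  # ['GENE_A', 'GENE_B']
```

With real RPKM averages an exact value of 1 is unlikely, which is consistent with the final result not changing here.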

CooperStansbury commented 5 years ago

I have a similar question about thresholding the log ratios.

In the rubric, the absolute value should be used to threshold differentially expressed genes: abs(log_mean_ratio) >= 1, which makes intuitive sense. But the assignment seems to ask for only those values where log_mean_ratio >= 1. Which is correct here?

From the assignment in problem 1:

Create a function computeDEgenes that processes a file with the file name given as a parameter and computes a data frame with the differentially expressed genes (genes with a log2 fold change > 1, where the fold change is the ratio between the average disease RPKM values and the average of the control RPKM values). The data frame contains only one column, which holds the log2 ratio values, and the row names are the gene symbols.

From the rubric for problem 1:

Reads the file content and processes the data, computing the means of the control RPKM columns and the AD RPKM columns and then the log2 ratio. Selects genes with absolute log2 ratios (AD/C >= 1). Returns the result as a data frame with the selected values as a column and the respective gene symbols as row names.

Thanks,

betteridiot commented 5 years ago

Assume abs() since that is what we have done on past homework
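To make the difference concrete: thresholding on the absolute value keeps down-regulated genes (log2 ratio <= -1) as well as up-regulated ones, while the signed comparison drops them. The sketch below is only an illustration; `select_de_genes`, its `use_abs` flag, and the gene names and values are hypothetical and are not the assignment's `computeDEgenes` or its data.

```python
import math


def select_de_genes(log2_ratios, threshold=1.0, use_abs=True):
    """Return {gene: log2 ratio} for genes passing the threshold (>=).

    Hypothetical helper: with use_abs=True the test is abs(value) >= threshold,
    otherwise it is the signed test value >= threshold.
    """
    selected = {}
    for gene, value in log2_ratios.items():
        score = abs(value) if use_abs else value
        if score >= threshold:
            selected[gene] = value  # keep the signed value, not the absolute one
    return selected


# Made-up log2(AD mean / control mean) values for three genes.
log2_ratios = {
    "UP": math.log2(8.0 / 2.0),    # 2.0, up-regulated in AD
    "DOWN": math.log2(2.0 / 8.0),  # -2.0, down-regulated in AD
    "FLAT": math.log2(5.0 / 4.0),  # about 0.32, below the threshold either way
}

print(select_de_genes(log2_ratios))                 # {'UP': 2.0, 'DOWN': -2.0}
print(select_de_genes(log2_ratios, use_abs=False))  # {'UP': 2.0}
```

The selected genes keep their signed log2 ratios, so the direction of change is still visible in the result.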