markziemann / Gene-function-imputation

Gene function imputation by coexpression

Why are imputed GOs -1? #12

Closed megan-soria closed 3 years ago

megan-soria commented 4 years ago

Clust1_subtract <- cluster1_filtered_imputed - cluster1_inputMat

megan-soria commented 4 years ago

Using the changes made to the weighted correlation function for cluster 1 (detailed in issue #17), I was able to see an adjustment in the values of the resulting data frame, although the shape of the data stays the same.

Most of the values fall between 0 and 0.2 because of the matrix multiplication we did after merging the correlation values per gene in cluster 1.

I followed my original hunch about the threshold values and found that, using the new function, I was able to get non-negative values at threshold > 0.01.

The which(Clust1_subtract == -1) test gives no result.
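As a rough sketch of the check described above, the subtraction and the which() test can be reproduced on toy data. The matrices and the names input_mat, imputed_mat, diff_mat and lost are hypothetical stand-ins for cluster1_inputMat, cluster1_filtered_imputed and Clust1_subtract, not the project's real data.

```r
# Toy stand-in for cluster1_inputMat: 1 = gene already annotated with the GO term
input_mat <- matrix(c(1, 0, 0,
                      0, 1, 0),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("geneA", "geneB"), c("GO1", "GO2", "GO3")))

# Toy stand-in for cluster1_filtered_imputed: existing annotations kept, two new ones added
imputed_mat <- matrix(c(1, 1, 0,
                        0, 1, 1),
                      nrow = 2, byrow = TRUE,
                      dimnames = dimnames(input_mat))

# 1 = newly imputed, 0 = unchanged, -1 = an existing annotation was lost
diff_mat <- imputed_mat - input_mat

# arr.ind = TRUE reports row/column positions instead of linear indices
lost <- which(diff_mat == -1, arr.ind = TRUE)
nrow(lost) == 0   # TRUE when no existing annotation was dropped

# Counting each value gives a quick overview of the whole matrix
table(diff_mat)
```

Using arr.ind = TRUE makes it easier to trace any -1 back to a specific gene and GO term than the default linear index.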

See Chunk 13: Using wcorr2_cluster

[Image: No-negatives]

markziemann commented 4 years ago

Great job. If you've solved this -1 problem feel free to close the issue

megan-soria commented 4 years ago

Source of issue: the data shape of the imputed data frame produced by the following code:

imputed_Clust1_df <- as.data.frame(sapply(cluster1_wGO_df, function(x){ as.numeric(x > Clust1_threshold) }))
row.names(imputed_Clust1_df) <- row.names(cluster1_wGO_df)

Solution: [commit e1adda1646c5e5b28add829ff6d292b71c9680ba]

imputed_Clust1_df_v2 <- (as.matrix(cluster1_wGO_df) > Clust1_threshold)*1
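To illustrate why the vectorized comparison is a safe replacement, here is a minimal sketch on made-up scores. The names scores_df, threshold and binary_mat and all values are hypothetical. One way a sapply() version can change shape is when its input is a matrix rather than a data frame, because sapply() then iterates over individual elements; the thread does not say whether that was the cause here.

```r
# Hypothetical weighted scores: rows are genes, columns are GO terms
scores_df <- data.frame(GO1 = c(0.005, 0.20),
                        GO2 = c(0.15, 0.001),
                        row.names = c("geneA", "geneB"))
threshold <- 0.01

# The comparison returns a logical matrix; multiplying by 1 converts it to 0/1
# while keeping dimensions and dimnames intact
binary_mat <- (as.matrix(scores_df) > threshold) * 1

dim(binary_mat)        # same dimensions as scores_df
rownames(binary_mat)   # gene IDs carried over, no manual row.names() step needed
binary_mat
```

Because the result keeps the same row and column names as the input annotation matrix, the later subtraction lines up element by element.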

megan-soria commented 3 years ago

-1 values are still present, according to a heatmap visualization check of the output-minus-input data frames.

megan-soria commented 3 years ago

commit a6dd79c939c54fb46553a41722a8daaf717c746c

markziemann commented 3 years ago

I think the mystery is solved for now :) For future reference, the -1 imputations arose because the wcorr function set any genes with NA values to zero. This is normally okay, but some genes with existing GO annotations came up as NA and were turned to zero. This propagated through the pipeline and resulted in -1 appearing in the subtracted matrix!
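The mechanism can be reproduced in a few lines. This is only a sketch under assumptions: the data, the threshold and every variable name below are hypothetical, and the NA-to-zero step is written out by hand rather than taken from the actual wcorr code.

```r
# Existing annotations: both genes already carry GO1 (hypothetical data)
input_mat <- matrix(c(1, 0,
                      1, 0),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("geneA", "geneB"), c("GO1", "GO2")))

# Hypothetical weighted scores; geneB's score for GO1 came back as NA
scores <- matrix(c(0.30, 0.02,
                   NA,   0.05),
                 nrow = 2, byrow = TRUE,
                 dimnames = dimnames(input_mat))
threshold <- 0.01

# The NA-to-zero step described in the comment above
scores_zeroed <- scores
scores_zeroed[is.na(scores_zeroed)] <- 0

# Threshold to 0/1, then subtract the original annotations
imputed <- (scores_zeroed > threshold) * 1
diff_mat <- imputed - input_mat

# geneB/GO1 was annotated (1) but is now 0, so the difference is -1
which(diff_mat == -1, arr.ind = TRUE)

# Diagnostic: positions where an existing annotation coincides with an NA score
which(is.na(scores) & input_mat == 1, arr.ind = TRUE)
```

Running the last diagnostic before the NA replacement flags exactly the gene and GO term pairs that would later show up as -1.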