menchelab / BioProfiling.jl

A flexible Julia toolkit for high-dimensional cellular profiles
MIT License

Handle NA in decorrelate #9

Closed: koalive closed this issue 3 years ago

koalive commented 4 years ago

The `decorrelate` function does not handle missing values properly: when the first column is filled with missing values, it seems that all other columns are counted as correlated with it.

bednarsky commented 3 years ago

The same issue occurs if there are `Inf` values in the data frame.
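
To illustrate why both kinds of values are a problem for any correlation-based filter, here is a minimal sketch in plain Julia. The `pearson` helper is hypothetical and written only for this example; it is not how BioProfiling.jl computes correlations, and it does not reproduce the exact behavior reported above. It only shows that a single `missing` or `Inf` entry is enough to make the correlation coefficient undefined.

```julia
# Hypothetical helper for illustration only (not part of BioProfiling.jl):
# a hand-written Pearson correlation between two vectors.
function pearson(x, y)
    mx = sum(x) / length(x)          # mean of x
    my = sum(y) / length(y)          # mean of y
    dx = x .- mx                     # centered values
    dy = y .- my
    return sum(dx .* dy) / sqrt(sum(dx .^ 2) * sum(dy .^ 2))
end

y = [1.0, 2.0, 3.0]

# A single missing entry propagates through every arithmetic step.
r_missing = pearson([1.0, missing, 3.0], y)

# A single Inf makes the mean infinite, and Inf - Inf yields NaN.
r_inf = pearson([1.0, Inf, 3.0], y)

println(ismissing(r_missing))  # the coefficient is missing
println(isnan(r_inf))          # the coefficient is NaN
```

Neither result can be compared meaningfully against a correlation threshold: a comparison with `missing` returns `missing` rather than a Boolean, and every ordered comparison with `NaN` is `false`.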

koalive commented 3 years ago

The right way to handle these cases seems to depend a lot on why such values are in the data. 9101e0c9d7fcad62365bb65cc212198f06852af5 adds some tests that will throw a more explicit `AssertionError` when this is the case, so that the user can decide what to do (e.g. using a `MissingFilter` to remove entries with missing values). I'm closing this issue for now, but it could be re-opened if there is a solution that would work in most cases and wouldn't mislead users into ignoring prior issues with their data.
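
As a rough sketch of the two options described above (fail loudly, or let the user remove the offending entries first), the following uses only base Julia on a plain matrix. The names `isinvalid`, `assert_clean`, and `drop_invalid_rows` are hypothetical and are not part of BioProfiling.jl; the checks added in the commit and the package's `MissingFilter` may work differently.

```julia
# Hypothetical helpers for illustration only (not the BioProfiling.jl API).

# True if an entry is missing, NaN, or infinite.
isinvalid(x) = ismissing(x) || !isfinite(x)

# Option 1: refuse to continue and tell the user why.
function assert_clean(data::AbstractMatrix)
    @assert !any(isinvalid, data) "data contains missing, NaN, or Inf values"
    return data
end

# Option 2: keep only the rows (entries) in which every value is valid.
function drop_invalid_rows(data::AbstractMatrix)
    keep = [!any(isinvalid, row) for row in eachrow(data)]
    return data[keep, :]
end

data = [1.0 2.0; missing 3.0; 4.0 Inf; 5.0 6.0]

clean = drop_invalid_rows(data)   # rows 2 and 3 are removed
assert_clean(clean)               # passes on the filtered data

# On the raw data the check fails with an explicit error.
failed = try
    assert_clean(data)
    false
catch err
    err isa AssertionError
end
```

Dropping rows is only one possible choice. Depending on why the values are there, removing whole columns or fixing the upstream measurement may be more appropriate, which is why the decision is left to the user.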