mooreryan / featuretable

MIT License

Features with all 0 after subsetting #5

Open AlexaBennett opened 1 year ago

AlexaBennett commented 1 year ago

After filtering to remove my controls, several features now contain all zeros. I believe the default behavior should be to then remove all columns that sum to 0. If there is a reason to keep these features, maybe make that a flag?

mooreryan commented 1 year ago

I assume you mean something like this?

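library(featuretable)

# Count table with features (a, b, c) in rows and samples (s1, s2, s3) in columns.
ct <- data.frame(
  s1 = c(1, 1, 1),
  s2 = c(100, 0, 100),
  s3 = c(50, 0, 50),
  row.names = letters[1:3]
)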

FeatureTable$
  new(t(ct))$
  keep_samples(function(x) sum(x) > 10)$
  data

     a b   c
s2 100 0 100
s3  50 0  50

If so, then I normally add one more function call like keep_features(function(x) sum(x) > 0) to remove any features with zero counts. There are also some "wordy" helpers for this: keep_features(that_are_present).

FeatureTable$
  new(t(ct))$
  keep_samples(function(x) sum(x) > 10)$
  keep_features(that_are_present)$
  data

     a   c
s2 100 100
s3  50  50
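
For readers who want to see what the two steps amount to, here is a minimal base R sketch on the same counts. It only illustrates the row and column filtering logic and makes no claim about how featuretable implements keep_samples or keep_features internally; the names mat, kept_samples, and kept_features are made up for this example.

```r
# Base R sketch of the two filtering steps; not featuretable's internals.
ct <- data.frame(
  s1 = c(1, 1, 1),
  s2 = c(100, 0, 100),
  s3 = c(50, 0, 50),
  row.names = letters[1:3]
)

# Samples in rows, features in columns, matching t(ct) above.
mat <- t(ct)

# Step 1: keep samples (rows) whose total count exceeds 10.
kept_samples <- mat[rowSums(mat) > 10, , drop = FALSE]

# Step 2: keep features (columns) that still have a nonzero total.
kept_features <- kept_samples[, colSums(kept_samples) > 0, drop = FALSE]

print(kept_features)
```

Feature b only becomes all-zero once sample s1 is gone, which is why the second step has to run after the first.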

As to the reasoning behind it, I generally try to avoid magic or implicit behavior. In my opinion, it would be surprising for keep_samples to also drop features. Obviously, that is a pretty subjective criterion, and I'm not saying that I'm 100% right about it, so I would be willing to listen to an argument in favor of implicitly dropping features with zero counts after filtering by samples.

AlexaBennett commented 1 year ago

I can understand and appreciate that mindset. After I realized what was happening, I implemented a second step to remove all offending features. Maybe a more conservative approach is to throw a warning that one or more features now have a sum() = 0?

mooreryan commented 1 year ago

Yeah, a warning message would be a good idea, I think.
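
As a rough illustration of the warning idea, the check could look something like the base R sketch below. The function warn_if_empty_features is hypothetical and is not part of featuretable; it assumes a numeric matrix with samples in rows, features in columns, and column names set.

```r
# Hypothetical helper, not part of featuretable: warn when any feature
# (column) has a total count of zero. Assumes `mat` has column names.
warn_if_empty_features <- function(mat) {
  empty <- colnames(mat)[colSums(mat) == 0]
  if (length(empty) > 0) {
    warning(
      length(empty), " feature(s) now sum to 0: ",
      paste(empty, collapse = ", "),
      call. = FALSE
    )
  }
  # Return the names of the empty features invisibly so callers can use them.
  invisible(empty)
}

# Same values as the table left after keep_samples in the example above.
filtered <- matrix(
  c(100, 0, 100,
     50, 0,  50),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("s2", "s3"), c("a", "b", "c"))
)

# Capture the warning message instead of letting it print to the console.
msg <- tryCatch(
  warn_if_empty_features(filtered),
  warning = function(w) conditionMessage(w)
)
print(msg)
```

A warning keeps keep_samples from silently changing the feature set while still flagging that a follow-up keep_features call may be needed.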