greenelab / tybalt

Training and evaluating a variational autoencoder for pan-cancer gene expression data
BSD 3-Clause "New" or "Revised" License

Methods threshold to 2.5 std dev #67

gwaybio closed 6 years ago

gwaybio commented 6 years ago

This PR updates the threshold for calling high weight genes to match what is specified in the methods (raised from 2 to 2.5 standard deviations).

The notebooks and their nbconverted scripts (extract_tybalt_weights and hgsc_subtypes_tybalt) are updated, as are the .tsv files that reflect the new threshold.

The first two commits (a1391c9 and 76736d1) are the code changes. The other commits (576ce01 and f76b16d) are the associated .tsv changes.

gwaybio commented 6 years ago

Nothing to report code-wise, but I am curious to know why such a strict threshold?

This is mostly because this is how the field has typically assigned high weight genes. From @tj8901nm's Cell Systems paper:

We defined positive HW genes as those that were more than 2.5 standard deviations from the mean on the positive side, and negative HW genes as those that were more than 2.5 standard deviations from the mean on the negative side.

It's strict mainly because we are interested in reducing false positives as much as we can. It would be interesting to vary this value at some point, but that is beyond the scope of the current assignment.
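
For reference, the rule quoted above can be sketched roughly as follows. This is a minimal illustration, not the code in the notebooks: the function name `high_weight_genes`, the dict input, the toy gene names, and the use of the sample standard deviation are all assumptions made for the example.

```python
from statistics import mean, stdev


def high_weight_genes(weights, n_std=2.5):
    """Return (positive, negative) high weight gene names for one feature.

    `weights` maps gene name -> weight; `n_std` is the cutoff in standard
    deviations from the mean (2.5 in this PR, 2 before it).
    """
    center = mean(weights.values())
    spread = stdev(weights.values())  # sample standard deviation (an assumption)
    positive = [g for g, w in weights.items() if w > center + n_std * spread]
    negative = [g for g, w in weights.items() if w < center - n_std * spread]
    return positive, negative


# Toy weights (hypothetical gene names): 98 near-zero genes plus two outliers.
toy_weights = {f"gene_{i}": ((i % 7) - 3) * 0.1 for i in range(98)}
toy_weights["GENE_POS"] = 5.0
toy_weights["GENE_NEG"] = -5.0

positive, negative = high_weight_genes(toy_weights, n_std=2.5)
print(positive, negative)
```

Keeping the cutoff as a parameter (`n_std`) would make it easy to vary the value later and compare the gene sets at 2 versus 2.5 standard deviations.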

gwaybio commented 6 years ago

Good to merge @danich1 ?