greenelab / tybalt

Training and evaluating a variational autoencoder for pan-cancer gene expression data
BSD 3-Clause "New" or "Revised" License

Push To Master: HGSC latent feature arithmetic #63

Closed gwaybio closed 6 years ago

gwaybio commented 6 years ago

Notebook and results from a latent feature analysis using TCGA HGSC subtype labels

gwaybio commented 6 years ago

coming right up :pizza:

gwaybio commented 6 years ago

Thanks @danich1 - I updated the notebook, nbconverted a .py, and fixed the .tsv files

gwaybio commented 6 years ago

Thanks @jaclyn-taroni - I added a function here and I agree that it clears up some redundancy

gwaybio commented 6 years ago

Thanks @danich1 and @jaclyn-taroni for your comments and suggestions. I believe I have addressed them all.

Most importantly, while wrapping up my responses to your comments, I encountered a critical indexing bug when loading the weight matrix (13b3dcc). Specifically, I was one node off when determining important genes. For example, instead of considering genes for node 56 (when nodes are numbered from 1 to 100), I was analyzing genes for node 55 (when nodes are numbered from 0 to 99). @danich1 pointed out this indexing issue in pull request #56 (discussion here), and @jaclyn-taroni suggested using a previously output weight matrix rather than generating one specifically for each node, which may have avoided this issue in the first place.
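To illustrate the kind of off-by-one error described above, here is a minimal sketch, not the code from the notebook. It assumes a hypothetical weight matrix stored as a pandas DataFrame with genes as rows and node labels 1 to 100 as columns; the names `weight_df` and `top_genes_for_node` are made up for this example, and the direction of the shift in the actual bug may differ.

```python
import numpy as np
import pandas as pd


def top_genes_for_node(weight_df, node, n_genes=5):
    """Return the highest-weight genes for a node, selected by column label."""
    # .loc selects by label, so node 56 means the column named 56
    node_weights = weight_df.loc[:, node]
    return node_weights.sort_values(ascending=False).head(n_genes).index.tolist()


# Hypothetical weight matrix: 20 genes by 100 nodes, nodes labeled 1..100
rng = np.random.default_rng(0)
genes = [f"gene_{i}" for i in range(20)]
weight_df = pd.DataFrame(
    rng.normal(size=(20, 100)), index=genes, columns=range(1, 101)
)

# Label-based lookup returns the intended node
by_label = weight_df.loc[:, 56]

# Position-based lookup is 0-based, so using the node label as a position
# silently returns the neighboring node instead
by_position = weight_df.iloc[:, 56]

print(by_label.name, by_position.name)
print(top_genes_for_node(weight_df, 56))
```

Selecting by label from a single, previously written weight matrix keeps the node numbering in one place, which is in the spirit of the suggestion to reuse a saved matrix rather than regenerating weights per node.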

This will require a completely updated interpretation of HGSC features, including updated pathway analyses, and an updated bioRxiv submission. I also need to check the pathway analyses for the sex-specific and melanoma-specific nodes. However, because the sex-specific genes were obvious, I do not suspect that analysis is affected.

Luckily this bug was caught before the PSB resubmission. The importance of code review in action!

gwaybio commented 6 years ago

cc @cgreene for this issue :point_up: we will need to update the biorxiv once all the PSB reviewer comments are addressed.

cgreene commented 6 years ago

I am definitely glad that you found this before publication. That would have put things into a retraction situation. We will need to update bioRxiv. It seems this is likely to result in substantial changes.

We will need to get this back to PSB as soon as possible so that they can look carefully at it to determine whether or not they can still publish it.

Even if they don't publish it, the outcome is still far better than a retraction. Definitely a win - though potentially a painful one.

gwaybio commented 6 years ago

> We will need to update bioRxiv. It seems this is likely to result in substantial changes.

> Definitely a win - though potentially a painful one.

I agree that finding this now rather than later is much preferred. But the update will not result in substantial changes and should not be as painful as initially perceived. It will definitely result in a correction, but only to interpretation of findings in one subsection. I confirmed that this error does not persist in any other analysis.

> We will need to get this back to PSB as soon as possible so that they can look carefully at it to determine whether or not they can still publish it.

I will continue with the response to reviewers and will draft a cover letter including the responses and explanation of the correction. I do believe now that the correction will be minor.

I will merge this in and move forward on the response. Thanks!