Closed: jjc2718 closed this pull request 10 months ago.
nrosed commented on 2023-08-04T20:24:36Z ----------------------------------------------------------------
The legend is covering the bars; maybe you could put it horizontally below the plot, unless this is a just-for-you plot that isn't needed for the paper.
jjc2718 commented on 2023-08-15T18:12:44Z ----------------------------------------------------------------
Yeah, this is a "just for me" plot. The ones that show the train/valid/test breakdown are what we'll show in the paper, so I spent more time making them pretty.
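For reference, a minimal matplotlib sketch of the horizontal-legend-below-the-plot suggestion. The grouped-bar data here is made up, and the bbox_to_anchor offset would need tuning for the real figure:

```python
import matplotlib.pyplot as plt
import numpy as np

# Toy grouped-bar data standing in for the real per-gene results
genes = ["TP53", "KRAS", "ARID2", "VHL"]
x = np.arange(len(genes))
width = 0.25
rng = np.random.default_rng(42)

fig, ax = plt.subplots()
for i, split in enumerate(["train", "valid", "test"]):
    ax.bar(x + (i - 1) * width, rng.uniform(0.5, 0.9, len(genes)),
           width, label=split)

ax.set_xticks(x)
ax.set_xticklabels(genes)
ax.set_ylabel("AUPR")
# Lay the legend out horizontally just below the x-axis so it can't
# overlap the bars; the anchor offset may need tuning per figure.
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.1), ncol=3)
fig.tight_layout()
plt.show()
```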
nrosed commented on 2023-08-04T20:24:39Z ----------------------------------------------------------------
It's really interesting how ARID2 and VHL perform so much better on the test (CCLE) dataset. Could this be caused by a cell-line-specific effect, like a mismatch between the cancer type proportions in the tumor data and the cell line data? Not sure if it's important for your paper; it's just very striking.
jjc2718 commented on 2023-08-15T18:14:39Z ----------------------------------------------------------------
Yeah, I'm not sure why there are so many genes that perform better on CCLE. It could be something technical or dataset-specific like what you're describing, or it could be something biological, like the cell line data just being cleaner/better behaved in these cases than the tumor samples from TCGA. If I had more time maybe I'd try to disentangle the two, but I have to start writing my thesis at some point, haha.
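A quick way to check for the proportion mismatch discussed above, as a sketch: the `tcga_labels` and `ccle_labels` Series here are hypothetical stand-ins, and in the real analysis the cancer-type labels would come from the TCGA and CCLE sample metadata:

```python
import pandas as pd

# Hypothetical cancer-type labels per sample in the two datasets
tcga_labels = pd.Series(["BRCA", "BRCA", "KIRC", "LUAD", "KIRC", "KIRC"])
ccle_labels = pd.Series(["BRCA", "LUAD", "LUAD", "LUAD", "KIRC", "BRCA"])

# Compare cancer-type proportions side by side; large gaps for the
# cancer types a gene is mutated in would support the mismatch idea.
props = pd.DataFrame({
    "tcga": tcga_labels.value_counts(normalize=True),
    "ccle": ccle_labels.value_counts(normalize=True),
}).fillna(0)
props["abs_diff"] = (props["tcga"] - props["ccle"]).abs()
print(props.sort_values("abs_diff", ascending=False))
```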
nrosed commented on 2023-08-04T20:24:40Z ----------------------------------------------------------------
I don't fully understand this plot. Why is the test error always at 0.6 even though the CV and train errors are going down?
nrosed commented on 2023-08-04T20:24:41Z ----------------------------------------------------------------
Yeah, I'm confused about how the training error is far below the test error and why the test error is constant. To me this seems like a bias in the test set?
nrosed commented on 2023-08-04T20:24:43Z ----------------------------------------------------------------
Ignore this if it doesn't make sense, but maybe adding another comparator (like a dummy classifier that predicts the most common label) would convince readers that your model is actually learning something. I think I'm just hung up on the test AUPR never changing over the epochs. Maybe it is changing, but it happens in the first few epochs and I can't see it in your plot? I might also be completely missing the point of these plots. It might also be that your batch size is small enough that there isn't much change per epoch?
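For what it's worth, scikit-learn's DummyClassifier gives exactly this kind of majority-class baseline. A minimal sketch on synthetic data; the real features and mutation labels would come from the actual pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real expression features / mutation labels
X, y = make_classification(n_samples=500, n_features=20, weights=[0.8, 0.2],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Majority-class baseline: always predicts the most common label
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
probs = baseline.predict_proba(X_test)[:, 1]

# With constant scores, AUPR reduces to roughly the positive-class
# prevalence, which is the floor a real model should clear
print("baseline AUPR:", average_precision_score(y_test, probs))
```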
nrosed commented on 2023-08-04T20:24:43Z ----------------------------------------------------------------
I guess if there is a difference in the mean AUPR based on layer size, that shows it is learning a better model; I just couldn't really see it from the learning curves.
nrosed commented on 2023-08-04T20:31:09Z ----------------------------------------------------------------
Main header and section headers would help readers understand what this notebook is doing and how to interpret the plots.
jjc2718 commented on 2023-08-15T18:22:59Z ----------------------------------------------------------------
I'll add them! These scripts started as just a model diagnostic thing, but we're using a few of these figures in the supplement, so I'll add some documentation.
Looks good to me; the main thing that would help future readers is some main- and sub-section headers for some of the scripts.
Other comments were just conceptual questions I had on some of your plots. I might have just missed the intention of the plots, so ignore them if they don't make any sense.
jjc2718 commented ----------------------------------------------------------------
(replying to nrosed's question above about the test error staying flat at 0.6)
For whatever reason, these models seem to saturate really fast rather than improving slowly across epochs. Here's a comparison of learning rates I did a while ago for KRAS mutation prediction:
For the lower learning rates, you can see that CV/test performance improves a bit more gradually, but the resulting CV and test performance after 200 epochs ends up being about the same as it is for the higher learning rates (other than the obviously bad ones like 0.01).
In the plots you're looking at for hidden layer size, I was doing a grid search (I think over the same range shown here) and choosing the best learning rate. So what that ends up picking happens to be one of the models that saturates really fast, at least for KRAS; I haven't looked too much at other genes.
We could add some kind of baseline, but since what we ultimately care about is the "best vs. smallest good" model selection comparison, I don't think it's that important in this case. Like you mentioned, because of the considerable variability between hidden layer sizes I'm fairly confident that the model is learning something. Our goal isn't really to find the absolute best-performing NN model here, just a reasonable one that allows us to think about model complexity by comparing models with different performances across many genes.
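To illustrate the kind of search described above, a sketch rather than the actual pipeline: it assumes scikit-learn's MLPClassifier on synthetic data, with hypothetical parameter ranges, crossing hidden layer sizes with learning rates and letting CV pick the best combination by AUPR:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the real expression data / mutation labels
X, y = make_classification(n_samples=500, n_features=50, random_state=0)

# Hypothetical grid: for each hidden layer size, cross-validation
# chooses the best learning rate
param_grid = {
    "hidden_layer_sizes": [(10,), (100,), (1000,)],
    "learning_rate_init": [1e-4, 1e-3, 1e-2],
}
search = GridSearchCV(
    MLPClassifier(max_iter=200, random_state=0),
    param_grid,
    scoring="average_precision",  # AUPR
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```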
Also cleaning up some other figures for the paper draft. This PR touches a lot of files, but the changes aren't that substantial; most of them are just cosmetic.