meyer-lab / DDMC

Clusters phosphoproteomics data based on a combination of the sequence information and abundance changes over conditions.
https://asmlab.org

Various AXL updates #528

Closed mcreixell closed 2 years ago

codecov[bot] commented 2 years ago

Codecov Report

Merging #528 (7c40e4e) into master (bdd5244) will not change coverage. The diff coverage is 0.00%.

@@           Coverage Diff           @@
##           master     #528   +/-   ##
=======================================
  Coverage   38.68%   38.68%           
=======================================
  Files          16       16           
  Lines        1357     1357           
=======================================
  Hits          525      525           
  Misses        832      832           
Flag Coverage Δ
unittests 38.68% <0.00%> (ø)

Impacted Files Coverage Δ
msresist/pca.py 0.00% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update bdd5244...7c40e4e.

mcreixell commented 2 years ago

@aarmey The notebook with the code to generate the tensor is downstream_dynamics.ipynb. Let me know whether this looks right. I also haven't added labels for each axis yet, but these are easy to extract from the original 2D data set.

aarmey commented 2 years ago

Ok, I added the parafac call and some example plots of the output. The "2" in the parafac line sets the number of components.

First question: how is this data normalized? From the results, I wasn't sure that it has been.

mcreixell commented 2 years ago

Thank you. This is not normalized; it's the raw signal.

aarmey commented 2 years ago

Ok, you probably want to z-score each phosphosite across conditions. Right now I think one bead region has a lot more signal, and that's pretty much all parafac is explaining.
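A minimal sketch of the suggested normalization, assuming the tensor is a NumPy array and that one axis indexes phosphosites (the axis position, the helper name `zscore_per_phosphosite`, and the random stand-in data are all hypothetical, not taken from the notebook). Each phosphosite is centered and scaled over every other axis, so a single high-signal feature can no longer dominate the decomposition:

```python
import numpy as np


def zscore_per_phosphosite(tensor, site_axis):
    """Z-score each phosphosite over all other axes (the conditions)."""
    # Every axis except the phosphosite axis is treated as a condition axis.
    other_axes = tuple(ax for ax in range(tensor.ndim) if ax != site_axis)
    mean = tensor.mean(axis=other_axes, keepdims=True)
    std = tensor.std(axis=other_axes, keepdims=True)
    # Guard against constant phosphosites to avoid dividing by zero.
    std = np.where(std == 0, 1.0, std)
    return (tensor - mean) / std


# Stand-in data: 10 x 7 conditions, 12 phosphosites on the last axis (assumed layout).
rng = np.random.default_rng(0)
raw = rng.lognormal(size=(10, 7, 12))
raw[:, :, 0] *= 100.0  # one feature with far more raw signal than the rest

scaled = zscore_per_phosphosite(raw, site_axis=2)
```

After scaling, every phosphosite has zero mean and unit variance across conditions, so the factorization has to explain patterns shared across features rather than one feature's magnitude.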

mcreixell commented 2 years ago

I've z-scored the data and added AXL and YAP, which I had missed in Scott's spreadsheet. Is an R2X of 0.315 acceptable? I also uncommented the PCA analysis, but it is still a bit hard to interpret.

aarmey commented 2 years ago

This works like PCA: you can vary the number of components, which will change R2X.
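To illustrate how R2X depends on the number of components, here is a small sketch that assumes R2X is defined as one minus the ratio of the squared residual norm to the squared data norm. It uses a truncated, uncentered SVD of a random matrix as a stand-in for PCA; the helper names and the data are hypothetical. The same `r2x` function could be applied to a tensor and its parafac reconstruction:

```python
import numpy as np


def r2x(original, reconstruction):
    """Fraction of the data's squared norm captured by the reconstruction."""
    residual = np.linalg.norm(original - reconstruction) ** 2
    total = np.linalg.norm(original) ** 2
    return 1.0 - residual / total


def svd_reconstruction(matrix, n_components):
    """Best rank-k approximation of a matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :n_components] * s[:n_components]) @ vt[:n_components]


# Stand-in data; the real analysis would use the z-scored measurements.
rng = np.random.default_rng(0)
data = rng.standard_normal((20, 12))

# R2X for 1 through 6 components.
scores = [r2x(data, svd_reconstruction(data, k)) for k in range(1, 7)]
```

For the SVD, R2X never decreases as components are added, so the useful question is usually where the gain from each extra component flattens out, not whether a given value is acceptable in isolation.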

mcreixell commented 2 years ago

Perfect, with 6 components we get an R2X of ~62%. Is there a notebook I can use to replicate an analysis and interpretation of the parafac output?

aarmey commented 2 years ago

Brian has a draft of an example notebook.

aarmey commented 2 years ago

6 components is a lot given the size of the dataset. How much variance do 2 components explain with PCA?

mcreixell commented 2 years ago

55%.

mcreixell commented 2 years ago

I'm going to include p-AXL and p-EGFR now. This might change things.

mcreixell commented 2 years ago

Now R2X is ~20% with 2 components, and PCA explains ~48%. Any suggestions?

aarmey commented 2 years ago

What is the tensor size? You've added things.

mcreixell commented 2 years ago

(2, 10, 7, 12)
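One way to put these numbers in context is to count fitted values. The sketch below assumes a standard CP (parafac) model, which fits one vector per mode per component, and compares it with PCA on an unfolded matrix. The helper names are hypothetical, and the unfolding used for the PCA count (last axis as columns) is only an assumption, since the thread does not say how the tensor was flattened:

```python
from math import prod

shape = (2, 10, 7, 12)  # tensor size reported above


def cp_parameter_count(shape, rank):
    """CP/parafac fits one factor vector per mode for each component."""
    return rank * sum(shape)


def pca_parameter_count(n_rows, n_cols, n_components):
    """PCA fits a score vector and a loading vector per component."""
    return n_components * (n_rows + n_cols)


n_values = prod(shape)                # 1680 entries in the tensor
cp_2 = cp_parameter_count(shape, 2)   # 62 fitted values
cp_6 = cp_parameter_count(shape, 6)   # 186 fitted values

# Hypothetical unfolding: first three axes stacked as rows, last axis as columns.
pca_2 = pca_parameter_count(prod(shape[:-1]), shape[-1], 2)  # 304 fitted values
```

Under these assumptions, a 2-component parafac model uses far fewer fitted values than a 2-component PCA on the unfolded data, so a lower R2X at the same component count is expected and the two percentages are not directly comparable.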