meyer-lab / systemsSerology

Dissecting systems serology with a tensor factorization
https://asmlab.org

Implement imputations for individual measurements #270

Closed murphymadeleine21 closed 3 years ago

murphymadeleine21 commented 3 years ago

This should take care of removing individual values for imputation. However, I'm not sure how to compare it to PCA: the indexing is different, so I don't know how to remove exactly the same measurements from the matrix. Any ideas?

Also, for Fig. 3c, as we increase the fraction of missing values, do we want to remove individual values (to compare to PCA) or remove entire tensor chords again? And are we evaluating the total missingness (i.e., counting the already-missing values) or just the newly imputed values? I'm assuming total, since the axis is labeled "percent missing".
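To make the two readings of "percent missing" concrete, here is a minimal sketch. The function name `missing_fractions` and the use of NaN to mark missing entries are assumptions for illustration, not something taken from this repository.

```python
import numpy as np


def missing_fractions(original, masked):
    """Return (total, newly_removed) missing fractions.

    `original` is the data before masking and `masked` is the same array
    after extra values were removed. Missing entries are assumed to be NaN.
    """
    original = np.asarray(original, dtype=float)
    masked = np.asarray(masked, dtype=float)

    # Total missingness: everything that is NaN after masking.
    total = np.isnan(masked).mean()
    # Newly removed: NaN after masking but observed in the original data.
    newly_removed = (np.isnan(masked) & ~np.isnan(original)).mean()
    return total, newly_removed


original = np.array([[1.0, np.nan], [3.0, 4.0]])
masked = np.array([[np.nan, np.nan], [3.0, 4.0]])
total, newly_removed = missing_fractions(original, masked)
```

In this toy case one of four entries was already missing and one more was removed, so the total fraction is 0.5 while the newly removed fraction is 0.25.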

codecov[bot] commented 3 years ago

Codecov Report

Merging #270 (abde0b6) into master (8728557) will increase coverage by 5.93%. The diff coverage is n/a.

Current head abde0b6 differs from pull request most recent head 5ab6421. Consider uploading reports for the commit 5ab6421 to get more accurate results.

@@            Coverage Diff             @@
##           master     #270      +/-   ##
==========================================
+ Coverage   92.01%   97.94%   +5.93%     
==========================================
  Files           7        6       -1     
  Lines         363      341      -22     
==========================================
  Hits          334      334              
+ Misses         29        7      -22     
Flag Coverage Δ
unittests 97.94% <ø> (+5.93%) :arrow_up:


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 8728557...5ab6421.

aarmey commented 3 years ago

@cyrillustan can look at this first.

You can remove the values, then flatten the tensor. Figure 2 has an example of doing this. In fact, we could just write a reusable function that takes the tensor and matrix and flattens them into one matrix.
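A rough sketch of that idea, assuming the tensor is subjects x receptors x antigens, the matrix is subjects x features, and missing values are NaN. The names `flatten_cube` and `remove_values` are hypothetical and are not the repository's functions. Because values are removed before flattening, the same measurements end up missing in the flattened matrix that PCA sees.

```python
import numpy as np


def flatten_cube(tensor, matrix):
    """Unfold a 3-way tensor along its first mode and append the matrix columns."""
    n_rows = tensor.shape[0]
    if matrix.shape[0] != n_rows:
        raise ValueError("tensor and matrix must share their first dimension")
    return np.hstack((tensor.reshape(n_rows, -1), matrix))


def remove_values(tensor, n_remove, rng):
    """Set `n_remove` randomly chosen observed entries to NaN.

    Returns the masked copy and a boolean mask marking the removed entries.
    """
    masked = np.array(tensor, dtype=float)  # copy, so the input is untouched
    observed = np.flatnonzero(np.isfinite(masked))
    dropped = rng.choice(observed, size=n_remove, replace=False)
    mask = np.zeros(masked.shape, dtype=bool)
    mask.flat[dropped] = True
    masked[mask] = np.nan
    return masked, mask


rng = np.random.default_rng(0)
tensor = np.arange(24, dtype=float).reshape(2, 3, 4)
matrix = np.ones((2, 5))

masked_tensor, mask = remove_values(tensor, 5, rng)
flat = flatten_cube(masked_tensor, matrix)
# Flattening the mask the same way gives the removed positions in matrix form.
flat_mask = flatten_cube(mask, np.zeros(matrix.shape, dtype=bool))
```

Passing the mask through the same flattening function is what keeps the indexing consistent between the tensor method and PCA.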

cyrillustan commented 3 years ago

One thing to double-check is whether the PCA R2X calculation is right: it comes out either very low or, when read as 1 - current_value, a bit higher than CMTF.
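For reference, a minimal sketch of one common R2X definition, $1 - \lVert X - \hat{X} \rVert^2 / \lVert X \rVert^2$ over the observed entries, using a truncated SVD as the PCA reconstruction. The helper names `r2x` and `pca_reconstruct` are hypothetical, the data is assumed to be complete and already centered, and this is not necessarily how the repository computes it.

```python
import numpy as np


def r2x(original, reconstructed):
    """Fraction of variance explained, computed over finite entries of `original`."""
    observed = np.isfinite(original)
    residual = original[observed] - reconstructed[observed]
    return 1.0 - np.sum(residual**2) / np.sum(original[observed] ** 2)


def pca_reconstruct(matrix, n_comp):
    """Rank-`n_comp` reconstruction of a complete matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :n_comp] * s[:n_comp]) @ vt[:n_comp]


rng = np.random.default_rng(0)
data = rng.normal(size=(20, 8))
data -= data.mean(axis=0)  # center columns, as PCA would

scores = [r2x(data, pca_reconstruct(data, k)) for k in range(1, 9)]
```

With this definition the value should rise toward 1 as components are added. If a function instead returns the residual fraction (the second term alone), its output would be the complement of R2X, which is one way a "1 - current_value" reading could arise.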

murphymadeleine21 commented 3 years ago

When I run it in a notebook with 10 components, I get a PCA R2X of 0.92. That seems to make sense?