Open karlrohe opened 9 months ago
`im` objects could also be passed to other matrix decompositions!
`im` looks too much like `lm`? perhaps `io` for "interaction object"? other names??
at some point, we will want to "append" variables to the `row_universe`, `column_universe`, or "values inside the matrix". These values could be useful for interpreting the pcs. Moreover, they might be used to fit some models. For example,
1) we might have a "treatment variable" on values inside the matrix or
2) perhaps if it is a hyper-linked text corpus, then we have text on the `row_universe` and we might do something like pairGraphText

so, this is an operation that we will want to be able to perform on the `interaction_model` object.
another operation we might want to perform... in the case of low-rank matrix completion, we might want to fit a "fixed effect model" (using row/column id's as the factors) to center the data before fitting.
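A minimal base R sketch of that centering step, just to make the idea concrete. The `ratings` data frame and its column names are made up for illustration, and none of this is the package's API; it only fits additive row and column effects with `lm` and keeps the residuals.

```r
# Hypothetical long-format data: one line per observed (row, col) entry.
ratings <- data.frame(
  user  = c("a", "a", "b", "b", "c", "c"),
  movie = c("x", "y", "x", "z", "y", "z"),
  score = c(5, 3, 4, 2, 1, 2)
)

# Additive "fixed effect model" using the row and column ids as factors.
fe_fit <- lm(score ~ factor(user) + factor(movie), data = ratings)

# The residuals are the centered values a low-rank fit would then see.
ratings$score_centered <- residuals(fe_fit)
```

The centered values sum to zero within every user and within every movie, so whatever structure is left is interaction rather than main effects.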
If our nodes are journal articles, sometimes it makes sense to "block model" by journal. We did this in the example journal graph used in cv_eigen + vsp + tsg.
empirically, i think this is a super powerful data operation. not that interesting theoretically or methodologically. So, the literature never talks about it.
is this something that we want to enable after make_interaction_model? Or, is this something that... if you want to do that blocking... you should do it in your tibble, then make a new formula?
if it is an operation performed on an `interaction_model`, then we would need to first `im = append(im, tibble_giving_paper_journal)`. Then... would it be an argument to `pca`? or would there be another function like `im = block(im, journal)`, and it might just change something like `im$setting$rowxxxx = journal`?
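For comparison, here is a rough sketch of the "do it in your tibble" route in base R. The `citations` and `paper_journal` tables are invented for illustration, and `merge` plus `xtabs` stand in for whatever the package would do internally. The join plays the role of the "append" step, and putting `journal` in the row position plays the role of "block".

```r
# Hypothetical citation edge list: one line per (citing paper, cited paper).
citations <- data.frame(
  from_paper = c("p1", "p1", "p2", "p3", "p3", "p4"),
  to_paper   = c("p2", "p3", "p3", "p1", "p4", "p1")
)

# Hypothetical lookup giving the journal of each paper.
paper_journal <- data.frame(
  paper   = c("p1", "p2", "p3", "p4"),
  journal = c("J1", "J1", "J2", "J2")
)

# "append": attach the journal of the citing paper to every edge.
blocked <- merge(citations, paper_journal,
                 by.x = "from_paper", by.y = "paper")

# "block": rows are now journals instead of papers; entries count citations.
journal_by_paper <- xtabs(~ journal + to_paper, data = blocked)
```

Done this way, nothing new is needed on the `interaction_model` side: the blocked table just goes through the same formula interface with `journal` as the row variable.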
What diagnostics/operations do we want to have access to before calling pca_?
Right now,
1) `diagnose` looks at degree stuff
2) `pick_dim` computes cross-validated eigenvalues
3) coming soon-ish: ability to chop off low degree nodes (maybe compute k-core), pre-pca.

All of this makes it seem like there would be some speed and convenience (for some) to build a new class ("interaction model"?). So, you could still use `pca_sum(outcome~row*col, data, k)`. But also, you could precompute the `im` object (for diagnostics) and even edit the data... Then, `pca_sum` also accepts these `im` objects.

What are other popular diagnostics that folks run before computing pca? Perhaps:
1) `sqrt` or `log(x + 1)` the elements of the matrix?
2) drop nodes?
3) weight the vertices?
4) others... please suggest!
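
To make the first two options concrete, here is a small base R sketch on a made-up long-format `edges` table. The column names and the `min_degree` threshold are assumptions for illustration only. Note that it does a single pass of degree trimming, which is not a full k-core; a k-core would repeat the trimming until no node falls below the threshold.

```r
# Hypothetical long-format data: one line per nonzero matrix entry.
edges <- data.frame(
  row   = c("r1", "r1", "r1", "r2", "r2", "r3"),
  col   = c("c1", "c2", "c3", "c1", "c2", "c1"),
  value = c(99, 3, 1, 8, 3, 15)
)

# 1) Transform the elements to tame heavy tails.
edges$value_log  <- log1p(edges$value)  # log(x + 1)
edges$value_sqrt <- sqrt(edges$value)

# 2) Drop low-degree nodes: degree = number of nonzero entries per row / col.
edges$row_degree <- ave(edges$value, edges$row, FUN = length)
edges$col_degree <- ave(edges$value, edges$col, FUN = length)

min_degree <- 2  # assumed threshold
trimmed <- edges[edges$row_degree >= min_degree &
                 edges$col_degree >= min_degree, ]
```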