Closed: snotskie closed this issue 12 months ago
A possible memory-efficient implementation might be to:
in linearmodeling's `accumulate!`,
make a closure helper function that increments as a side effect.
Its args should include the current convo id,
so memoizing it should prevent the side-effect incrementation from happening more than once, and in effect it only eats minimal extra memory.
Memoization would only be enabled when `binarizeConvoCounts` is set to true, and its cache could be reset between convos to reclaim memory.
This would be a hackish, non-explicit solution to the problem, though.
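The memoized-closure idea above could be sketched roughly like this. This is a hypothetical Python illustration, not the package's actual (Julia) API; the names `make_increment`, `increment_once`, and the `counts` dict are all invented for the example:

```python
from functools import lru_cache

# Shared edge-count accumulator (stand-in for the model's real storage).
counts = {}

def make_increment():
    @lru_cache(maxsize=None)
    def increment_once(convo_id, edge):
        # The side effect runs only on a cache miss, i.e. at most once
        # per (convo_id, edge) pair; repeat calls hit the cache and do
        # nothing, which binarizes the per-conversation contribution.
        counts[edge] = counts.get(edge, 0) + 1
    return increment_once

increment_once = make_increment()
increment_once("convo1", ("A", "B"))
increment_once("convo1", ("A", "B"))  # memoized: no second increment
increment_once("convo2", ("A", "B"))  # new convo id: increments again
# counts[("A", "B")] is now 2, one per conversation

# Resetting the cache between conversations would reclaim the memory:
increment_once.cache_clear()
```

As the comment says, this is hackish: the binarization is an invisible side effect of memoization rather than an explicit normalization step.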
Chatting with David and Jenn about this, it seems that the practical solution in most cases is not to do "whole conversation" models at all, but instead to do a simpler `BiplotENAModel`,
which does "code and count." That'll probably capture the same information as the "whole convo" model would have, for cheaper.
wontfix
A question from Jenn
WebENA supports a "whole conversation" model that is like Infinite window (the default in this package), except it normalizes so that connection counts are either 0 (never occurred) or 1 (occurred at least once). This way, the codes on the first few lines don't cause a disproportionate effect compared to codes on the last few lines, and one really long conversation doesn't have a disproportionate effect compared to really short conversations.
We could do this with a new parameter to `ENAModel` and friends, named `binarizeConversationCounts`, `thresholdNorm`, or something like that. After each conversation has been counted, it would (more or less) map that conversation's edge counts to 0 or 1.

Note, an Undirected Infinite Window Binarized Conversations model (aka, "whole conversation") with a single conversation is explained totally by a simple code-and-count model. So, `ENAModel` would capture the same information as the unconnected `BiplotENAModel`, but with a needlessly complicated plot.
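The binarization map described above could be sketched as follows. This is a hedged Python illustration of the logic only; `binarize_convo_counts` and the dict-of-edges representation are assumptions for the example, not the package's actual implementation:

```python
def binarize_convo_counts(edge_counts):
    """Map each conversation-level edge count to 1 if the connection
    occurred at least once in the conversation, else 0."""
    return {edge: (1 if count > 0 else 0)
            for edge, count in edge_counts.items()}

# One conversation's raw edge counts under an infinite window:
convo_counts = {("A", "B"): 7, ("A", "C"): 1, ("B", "C"): 0}
binarized = binarize_convo_counts(convo_counts)
# binarized == {("A", "B"): 1, ("A", "C"): 1, ("B", "C"): 0}

# Summing these binarized counts across conversations then gives, per
# edge, the number of conversations in which that connection occurred
# at least once, so long conversations no longer dominate short ones.
```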