Closed: snotskie closed this issue 12 months ago
A possible memory-efficient implementation might be to:
in linearmodeling's `accumulate!`,
make a closure helper function that increments as a side effect.
Its args should include the current convo id,
so memoizing it should prevent the side-effect incrementation from happening more than once, and in effect it only eats minimal extra memory.
Memoization would only be enabled when `binarizeConvoCounts` is set to true, and its cache could be reset between convos to reclaim memory.
This would be a hackish, non-explicit solution to the problem, though.
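The memoized-closure idea above could be sketched roughly like this. This is a hypothetical Python illustration, not the package's actual (Julia) API; the names `make_increment`, `increment_once`, and the `counts` dict are all invented for the example:

```python
from functools import lru_cache

# Shared edge-count accumulator (stand-in for the model's real storage).
counts = {}

def make_increment():
    @lru_cache(maxsize=None)
    def increment_once(convo_id, edge):
        # The side effect runs only on a cache miss, i.e. at most once
        # per (convo_id, edge) pair; repeat calls hit the cache and do
        # nothing, which binarizes the per-conversation contribution.
        counts[edge] = counts.get(edge, 0) + 1
    return increment_once

increment_once = make_increment()
increment_once("convo1", ("A", "B"))
increment_once("convo1", ("A", "B"))  # memoized: no second increment
increment_once("convo2", ("A", "B"))  # new convo id: increments again
# counts[("A", "B")] is now 2, one per conversation

# Resetting the cache between conversations would reclaim the memory:
increment_once.cache_clear()
```

As the comment says, this is hackish: the binarization is an invisible side effect of memoization rather than an explicit normalization step.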
Chatting with David and Jenn about this, it seems that the practical solution in most cases is not to do "whole conversation" models at all, but instead to do a simpler `BiplotENAModel`,
which does "code and count." That'll probably capture the same information as the "whole convo" model would have, for cheaper.
wontfix
A question from Jenn
WebENA supports a "whole conversation" model that is like Infinite window (the default in this package), except it normalizes so that connection counts are either 0 (never occurred) or 1 (occurred at least once). This way, the codes on the first few lines don't cause a disproportionate effect compared to codes on the last few lines, and one really long conversation doesn't have a disproportionate effect compared to really short conversations.
We could do this with a new parameter to `ENAModel` and friends, named `binarizeConversationCounts`, `thresholdNorm`, or something like that. After each conversation has been counted, it would (more or less) map that conversation's edge counts to 0 or 1.

Note, an Undirected Infinite Window Binarized Conversations model (aka, "whole conversation") with a single conversation is explained totally by a simple code-and-count model. So, `ENAModel` would capture the same information as the unconnected `BiplotENAModel`, but with a needlessly complicated plot.
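The binarization map described above could be sketched as follows. This is a hedged Python illustration of the logic only; `binarize_convo_counts` and the dict-of-edges representation are assumptions for the example, not the package's actual implementation:

```python
def binarize_convo_counts(edge_counts):
    """Map each conversation-level edge count to 1 if the connection
    occurred at least once in the conversation, else 0."""
    return {edge: (1 if count > 0 else 0)
            for edge, count in edge_counts.items()}

# One conversation's raw edge counts under an infinite window:
convo_counts = {("A", "B"): 7, ("A", "C"): 1, ("B", "C"): 0}
binarized = binarize_convo_counts(convo_counts)
# binarized == {("A", "B"): 1, ("A", "C"): 1, ("B", "C"): 0}

# Summing these binarized counts across conversations then gives, per
# edge, the number of conversations in which that connection occurred
# at least once, so long conversations no longer dominate short ones.
```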