Closed: andychu closed this 9 years ago.
OK, thanks for the review. I addressed all but the last one. I'll add something to compare the results; I think the problem is that Decode() is fairly noisy with a small number of reports.
DONE: Updated docstring for bit_indices
I kept both params and params_list mainly because the unit tests already use a single set of params.
I removed the num_variables > 4 check, because it was arbitrary. The unit tests test 3 variables, so 4 should work.
I only used mclapply() in cases where we're iterating over the number of reports, but not, say, the number of cohorts. mclapply() forks processes, so it has significant memory overhead as well as startup overhead.
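A hedged sketch of that tradeoff (the function and variable names here are illustrative, not the ones in the repo): forking is only worth it when each task is heavy, so a short list of expensive per-report-count runs goes through mclapply(), while many cheap per-cohort iterations stay on plain lapply().

```r
library(parallel)

# Hypothetical stand-in for the expensive per-report-count work
# (in the real code this would be a full decode/simulation run).
SimulateOne <- function(num_reports) {
  sum(rnorm(num_reports))  # placeholder for real work
}

num_reports_list <- c(1e4, 1e5, 1e6)

# Fork one child per element: fine for a few heavy tasks.
results <- mclapply(num_reports_list, SimulateOne, mc.cores = 2)

# For many cheap tasks (e.g. a per-cohort loop), plain lapply
# avoids the per-fork memory and startup cost entirely.
cohort_results <- lapply(1:64, function(cohort) cohort * 2)
```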
UpdatePij: This is being called inside for (i in 1:max_em_iters), not mclapply?
Added the check that dimensions == 2 in ConstructFastEM.
Wrapping: I changed this variable's type to size_t. num_entries and entry_size are uint32_t, but their product could be more than 4 GB, which works fine on a 64-bit machine. I don't want to hard-code any assumptions about the machine's memory size.
analysis/cpp/run.sh -- removed extra comments
OK, in association_test.R I added TestTwoImplementations(), which runs both algorithms. The L1 difference between the matrices is 9.7e-17 (curiously, not exactly 0).
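A tiny illustration of why a result like that is ~1e-16 rather than exactly 0 (the matrix here is made up): two mathematically equivalent computations accumulate floating-point roundoff differently, so their L1 distance lands near double-precision epsilon (~2.2e-16) instead of zero.

```r
set.seed(1)
m <- matrix(runif(9), nrow = 3)

# Two mathematically identical results that round differently:
a <- (m / 3) * 3
b <- m

# L1 distance between the two matrices; typically tiny but nonzero.
l1_diff <- sum(abs(a - b))
stopifnot(l1_diff < 1e-12)
```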
I also renamed the ugly $rmap and $map stuff, and did a few other cleanups.
PTAL.
OK, good point. I changed this, and the CSV now looks like this:
(I ended up using something a little different from melt(), since that was doing weird things)
"flag..HTTPS","domain","proportion" "TRUE","bing.com",0.263930065623489 "FALSE","bing.com",0.147970233934715 "TRUE","yahoo.com",0.182032391377609 "FALSE","yahoo.com",0.114596102564608 "TRUE","google.com",0.190706050320775 "FALSE","google.com",0.095164303098314 "TRUE","Other",0.00340437927526885 "FALSE","Other",0.00219647380522421
Any more comments on this?
[LGTM]
Thanks Andy! Curious to know what problems you had with melt, but anything that flattens the assoc results before printing is sufficient for now.