mitodl / edx2bigquery

Tool to convert & load data from edX platform into BigQuery
GNU General Public License v2.0

Added stats_show_ans_before and filtered sybils3 #16

CGNx closed 9 years ago

CGNx commented 9 years ago

show_ans_before is now computed for all pairs and selects the best shadow for a given candidate cameo. I also added algorithms that compute the following within show_ans_before: Pearson correlations, median max_dt, and an optimal score for deciding which shadow is best for a cameo.

Pearson Correlations - computed for all, rather than only above a cutoff (.99, for example), and normalized correctly (verified).

Median Max Dt - the average max_dt was terribly skewed by outliers (e.g. someone going to sleep in the middle of cheating). Using the median fixed all of these issues.

Optimal Scoring - chooses the best shadow for a cameo candidate by combining z-score-normalized scores from median_max_dt, norm_pearson_corr, percent_show_ans_before, and num_show_ans_before. Many hours went into verifying by hand, across ~10 courses, that this selects good choices.
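To illustrate the scoring idea, here is a minimal Python sketch; it is not the code in this PR. It z-score normalizes each of the four metrics across all pairs, combines them into one score, and keeps the highest-scoring shadow per cameo. The function names, the dict layout, the equal weights, and the choice to count a smaller median_max_dt as better are all assumptions made for the example.

```python
from statistics import mean, pstdev


def zscores(values):
    """Return the z-score of each value; all zeros if there is no spread."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def best_shadow_per_cameo(pairs):
    """Pick the highest-scoring shadow for each cameo.

    `pairs` is a list of dicts (hypothetical layout) with the keys
    cameo, shadow, median_max_dt, norm_pearson_corr,
    percent_show_ans_before, and num_show_ans_before.
    """
    z_dt = zscores([p["median_max_dt"] for p in pairs])
    z_corr = zscores([p["norm_pearson_corr"] for p in pairs])
    z_pct = zscores([p["percent_show_ans_before"] for p in pairs])
    z_num = zscores([p["num_show_ans_before"] for p in pairs])

    best = {}
    for i, p in enumerate(pairs):
        # Assumption: equal weights, and a smaller median_max_dt is
        # treated as better, so its z-score is subtracted.
        score = -z_dt[i] + z_corr[i] + z_pct[i] + z_num[i]
        current = best.get(p["cameo"])
        if current is None or score > current[1]:
            best[p["cameo"]] = (p["shadow"], score)
    return {cameo: shadow for cameo, (shadow, _) in best.items()}
```

Normalizing first puts metrics with very different units (seconds, correlations, percentages, counts) on a comparable scale before they are summed.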

Note: show_ans_before now MUST be run before sybils3. (I added a call to make_show_ans_before as the first line of sybils3.)