jaak-s / BayesianDataFusion.jl

Bayesian multi-tensor factorization methods, with side information

Computing out-of-sample stdev #11

Open suchow opened 4 years ago

suchow commented 4 years ago

Was #6 closed because it was implemented?

I see that predictions on the test set include both the predicted value pred and the standard deviation stdev.

julia> result["predictions"]
500000×5 DataFrames.DataFrame
│ Row    │ E1   │ E2   │ values │ pred    │ stdev     │
├────────┼──────┼──────┼────────┼─────────┼───────────┤
│ 1      │ 5121 │ 1923 │ 4.0    │ 4.17571 │ 0.106283  │
│ 2      │ 481  │ 3528 │ 5.0    │ 3.80128 │ 0.309201  │
│ 3      │ 1279 │ 3175 │ 4.0    │ 2.9776  │ 0.237935  │
│ 4      │ 5364 │ 1172 │ 5.0    │ 4.29892 │ 0.143759  │
│ 5      │ 424  │ 1356 │ 4.0    │ 3.91691 │ 0.103391  │
│ 6      │ 258  │ 457  │ 5.0    │ 4.4669  │ 0.0985462 │
│ 7      │ 1978 │ 2555 │ 1.0    │ 1.36788 │ 0.290181  │
│ 8      │ 1150 │ 193  │ 1.0    │ 1.55493 │ 0.160465  │
│ 9      │ 2279 │ 1097 │ 5.0    │ 4.00714 │ 0.184425  │
⋮

However, the full predictions include only the predicted values themselves, and not the stdev.

julia> result["predictions_full"]
6040x3952 Array{Float64,2}:
 4.39071  3.91885  3.64339  3.41556  3.72048  3.98654  …

I'm looking to assess prediction uncertainty for out-of-sample entity pairs.

My current understanding is that implementing this would mean modifying macau.jl so that, in addition to accumulating the sum of the sampled predictions as it does on line 146 (https://github.com/jaak-s/BayesianDataFusion.jl/blob/master/src/macau.jl#L146), it also accumulates what is needed to compute the variance: essentially the sum of squares, though with some adjustments for numerical precision (see https://dl.acm.org/doi/10.1145/3221269.3223036).
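A minimal sketch of that idea, assuming each sampling iteration yields a full prediction matrix: it keeps a running mean and a running sum of squared deviations (Welford's update) instead of a raw sum of squares, which avoids the cancellation problem. The type `RunningMoments` and the functions `update!` and `stdev` are hypothetical names for illustration and are not part of BayesianDataFusion.jl; using the n - 1 denominator is also an assumption about which estimator is wanted.

```julia
# Hypothetical accumulator; none of these names come from BayesianDataFusion.jl.
mutable struct RunningMoments
    n::Int                  # number of samples seen so far
    mean::Matrix{Float64}   # running mean of the sampled predictions
    m2::Matrix{Float64}     # running sum of squared deviations from the mean
end

# Start with zero samples and zero-filled accumulators.
RunningMoments(nrow::Int, ncol::Int) =
    RunningMoments(0, zeros(nrow, ncol), zeros(nrow, ncol))

# Welford's update, applied elementwise to one sampled prediction matrix.
function update!(acc::RunningMoments, sample::AbstractMatrix{<:Real})
    acc.n += 1
    delta = sample .- acc.mean                # deviation from the old mean
    acc.mean .+= delta ./ acc.n               # move the mean toward the sample
    acc.m2 .+= delta .* (sample .- acc.mean)  # uses old and new deviation
    return acc
end

# Sample standard deviation (n - 1 denominator); NaN until two samples are seen.
function stdev(acc::RunningMoments)
    acc.n > 1 || return fill(NaN, size(acc.mean))
    return sqrt.(acc.m2 ./ (acc.n - 1))
end

# Usage with deterministic stand-ins for 50 sampled prediction matrices.
samples = [[sin(i * j * k) for i in 1:3, j in 1:4] for k in 1:50]
acc = RunningMoments(3, 4)
for s in samples
    update!(acc, s)
end
```

After the loop, `acc.mean` plays the role of the averaged predictions and `stdev(acc)` gives a matrix of the same shape with the per-entry standard deviation, at the cost of two extra matrices of memory rather than storing every sample.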

Is this something you would be interested in including, if it isn't already available?

jaak-s commented 4 years ago

Yes, we would be interested in including that functionality. For example, we could add another field called result["predictions_full_stdev"] to store the matrix (a tensor in the general case) of standard deviations.
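As an illustration of what such a field could hold, here is a naive reference sketch that computes both matrices from a retained list of sampled prediction matrices. The helper `summarize_samples` is hypothetical and not part of BayesianDataFusion.jl, and keeping every sample in memory is assumed only for clarity; it would mainly be useful as a check against a streaming implementation inside the sampler.

```julia
using Statistics

# Hypothetical helper, not part of BayesianDataFusion.jl.
# Takes one prediction matrix per retained sample and returns a result
# dictionary with the per-entry mean and sample standard deviation.
function summarize_samples(samples::Vector{Matrix{Float64}})
    stacked = cat(samples...; dims=3)   # rows × cols × number of samples
    result = Dict{String,Any}()
    result["predictions_full"] = dropdims(mean(stacked; dims=3); dims=3)
    result["predictions_full_stdev"] = dropdims(std(stacked; dims=3); dims=3)
    return result
end

# Two constant "samples": every entry has mean 2.0 and sample stdev sqrt(2).
demo = summarize_samples([fill(1.0, 2, 2), fill(3.0, 2, 2)])
```

Both entries have the same shape as a single prediction matrix, so the new field would line up entry for entry with result["predictions_full"].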

Would you be interested in making a pull request?

suchow commented 4 years ago

I'll give it a try! I'm new to Julia, so this may take a while.

jaak-s commented 4 years ago

Cool! Let me know if you need any help.