Alternating Least Squares (ALS): $R_{m\times n}=P_{m\times k}\times Q^T_{n\times k}$
easy to parallelize
faster than SGD when the data is not too sparse
Weighted-ALS (one-class): $$\min_{q^*,p^*}\Sigma_{(u,i)\in \kappa}c_{ui}(r_{ui}-p_uq_i^T)^2+\lambda(\lVert q_i\rVert ^2+\lVert p_u\rVert ^2)$$ where $c_{ui}=1+\alpha n$ is the confidence, $n$ is the interaction frequency, and $\alpha=40$.
negative sampling: draw negatives from hot (popular) items
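The weighted objective above can be sketched in NumPy as follows. This is a minimal illustration, not the notes' own implementation: the function name `weighted_als`, the dense count matrix, and the defaults `k`, `lam`, `iters` are assumptions. It also assumes every unobserved cell is a negative with confidence 1 rather than sampling negatives from hot items.

```python
import numpy as np

def weighted_als(counts, k=2, alpha=40.0, lam=0.1, iters=10, seed=0):
    """Weighted ALS on a dense (users x items) interaction-count matrix."""
    rng = np.random.default_rng(seed)
    m, n = counts.shape
    r = (counts > 0).astype(float)   # one-class preference r_ui
    c = 1.0 + alpha * counts         # confidence c_ui = 1 + alpha * n
    P = rng.normal(scale=0.1, size=(m, k))
    Q = rng.normal(scale=0.1, size=(n, k))
    reg = lam * np.eye(k)
    for _ in range(iters):
        # Fix Q and solve one k x k weighted ridge system per user.
        for u in range(m):
            weighted_q = Q * c[u][:, None]
            P[u] = np.linalg.solve(weighted_q.T @ Q + reg, weighted_q.T @ r[u])
        # Fix P and solve one k x k weighted ridge system per item.
        for i in range(n):
            weighted_p = P * c[:, i][:, None]
            Q[i] = np.linalg.solve(weighted_p.T @ P + reg, weighted_p.T @ r[:, i])
    return P, Q
```

Each user row and each item row is solved independently of the others, which is why the alternating scheme parallelizes easily.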
Tools for similar-item search: Faiss (ball tree), Annoy, NMSLIB, KGraph
Ranking
methods: point-wise, pair-wise, list-wise
Bayesian Personalized Ranking (BPR)
sample: (user, item1, item2, True/False)
posterior to maximize: $p(\theta)\Pi_{u,i,j}p(i>_uj\mid \theta)$
Mini-batch Stochastic Gradient Descent(MSGD)
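A single gradient step for BPR might look like the sketch below. It assumes a matrix-factorization scorer $x_{ui}=p_u\cdot q_i$, a Gaussian prior on $\theta$ (which becomes L2 regularization), and a triple where user `u` prefers item `i` over item `j`. The name `bpr_step` and the `lr`/`reg` defaults are illustrative. A mini-batch version would average this update over several sampled triples.

```python
import numpy as np

def bpr_step(P, Q, u, i, j, lr=0.05, reg=0.01):
    """One ascent step on ln sigmoid(x_ui - x_uj) minus L2 penalty; updates P, Q in place."""
    pu, qi, qj = P[u].copy(), Q[i].copy(), Q[j].copy()
    x_uij = pu @ (qi - qj)
    # d/dx ln sigmoid(x) = 1 - sigmoid(x) = 1 / (1 + e^x)
    g = 1.0 / (1.0 + np.exp(x_uij))
    P[u] += lr * (g * (qi - qj) - reg * pu)
    Q[i] += lr * (g * pu - reg * qi)
    Q[j] += lr * (-g * pu - reg * qj)
    return x_uij  # preference margin before the update
```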
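Area Under Curve (AUC): $AUC=\frac{\Sigma_{i\in positives}r_i-\frac{1}{2}M\times (M+1)}{M\times N}$, where $r_i$ is the rank of positive sample $i$ when all samples are sorted by ascending score (ranks start at 1), $M$ is the number of positive samples and $N$ the number of negative samples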
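The rank-based AUC can be checked with a few lines of Python. This sketch assumes ranks are counted from 1 in ascending score order, so the subtracted term is $\frac{1}{2}M(M+1)$ (with ranks counted from 0 it would be $\frac{1}{2}M(M-1)$). It also assumes there are no tied scores. The function name is illustrative.

```python
def auc_from_ranks(scores, labels):
    """AUC from the rank sum of the positive samples (labels are 0 or 1)."""
    order = sorted(range(len(scores)), key=lambda idx: scores[idx])
    # Rank 1 goes to the lowest score; ties are not handled in this sketch.
    rank_sum = sum(rank for rank, idx in enumerate(order, start=1) if labels[idx] == 1)
    m = sum(labels)          # number of positive samples
    n = len(labels) - m      # number of negative samples
    return (rank_sum - m * (m + 1) / 2) / (m * n)

print(auc_from_ranks([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```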
Wilson score interval: $\frac{\hat{p}+\frac{1}{2n}z^2_{1-\frac{\alpha}{2}}\pm z_{1-\frac{\alpha}2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}+\frac{z^2_{1-\frac{\alpha}2}}{4n^2}}}{1+\frac{1}nz^2_{1-\frac{\alpha}2}}$
$\hat{p}$: observed positive-rating rate, $n$: number of ratings, $z_{1-\frac{\alpha}2}$: standard normal quantile for confidence level $1-\alpha$
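As a worked example, the interval can be computed directly from the formula. The function name and the default `z=1.96` (roughly a 95% confidence level) are assumptions for illustration. Ranking items by the lower bound is one common use.

```python
import math

def wilson_interval(positives, n, z=1.96):
    """Return (lower, upper) bounds of the Wilson score interval."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = positives / n
    center = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (center - margin) / denom, (center + margin) / denom

# Same 90% positive rate, but more ratings give a tighter, higher lower bound.
print(wilson_interval(9, 10))
print(wilson_interval(90, 100))
```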
Bayesian average: $\frac{v}{v+m}R+\frac{m}{v+m}C$
$R$: the item's own average score, $v$: the item's number of votes, $m$: average number of votes per item, $C$: average score over all items
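A short arithmetic sketch of the Bayesian average, with illustrative parameter names and made-up example numbers, shows how items with few votes are pulled toward the global mean:

```python
def bayesian_average(item_mean, item_votes, prior_votes, global_mean):
    """v/(v+m) * R + m/(v+m) * C, written over a common denominator."""
    return (item_votes * item_mean + prior_votes * global_mean) / (item_votes + prior_votes)

# Two votes at 5.0 barely move the score away from the global mean of 3.5,
# while 2000 votes at 4.5 dominate the prior.
few = bayesian_average(5.0, 2, 50, 3.5)
many = bayesian_average(4.5, 2000, 50, 3.5)
```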
algorithm level: add user quality, restrict user weight
Deploy
Real Time
Test Platform
sample size: $N\ge 10.5(\frac{s}{\theta})^2$, where $s$ is the standard deviation and $\theta$ is the sensitivity (90% confidence)
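The bound can be turned into a small helper. The function name and the example numbers are illustrative only.

```python
import math

def min_samples(std_dev, sensitivity):
    """Smallest integer N with N >= 10.5 * (s / theta)^2."""
    return math.ceil(10.5 * (std_dev / sensitivity) ** 2)

# Detecting a change of 0.5 in a metric whose standard deviation is 2.0:
print(min_samples(2.0, 0.5))  # 168
```

Halving the sensitivity quadruples the required sample size, since $N$ grows with $(s/\theta)^2$.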
Google platform:
A domain is a segmentation of traffic
A layer corresponds to a subset of the system parameters
An experiment is a segmentation of traffic where zero or more system parameters can be given alternate values that change how the incoming request is processed.
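One way to picture layers and experiments is the sketch below. It is not Google's implementation. The layer names, experiment names, bucket count, and salted-hash scheme are all hypothetical, and domains are left out for brevity. Each layer hashes the user with its own salt, so assignments in different layers are independent and a request can be in at most one experiment per layer.

```python
import hashlib

def bucket(user_id, layer, buckets=100):
    """Deterministically map a user to a bucket, salted by the layer name."""
    digest = hashlib.sha256(f"{layer}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % buckets

# Hypothetical configuration: layer -> [(experiment, bucket range, parameter overrides)]
LAYERS = {
    "ranking": [("rank_model_b", range(0, 10), {"model": "b"})],
    "ui": [("blue_button", range(0, 50), {"button_color": "blue"})],
}

def assign(user_id, layers=LAYERS):
    """Collect the parameter overrides from at most one experiment per layer."""
    params = {}
    for layer, experiments in layers.items():
        b = bucket(user_id, layer)
        for _name, bucket_range, overrides in experiments:
            if b in bucket_range:
                params.update(overrides)
                break
    return params
```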
Recommendation System Checklist
https://ift.tt/SfGTlJu
Notes on "36 strokes of recommended system"
Basic
When to use?
$\frac{N_{connection}}{N_{user} \times N_{item}}$
Stage
Forecast
Problem
User profile
Models
Content-based
Collaborative filtering
Similarity
Euclidean distance: $d(p, q)=\sqrt{\Sigma_{i=1}^n(q_i-p_i)^2}$
Cosine similarity: $cos(\theta)=\frac{A\cdot B}{\lVert A\rVert \lVert B\rVert}$
Pearson correlation: $\rho_{X,Y}=\frac{\Sigma^n_{i=1}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\Sigma^n_{i=1}(x_i-\bar{x})^2}\sqrt{\Sigma^n_{i=1}(y_i-\bar{y})^2}}$
Jaccard index: $J(A, B)=\frac{\lvert A\bigcap B\rvert}{\lvert A\bigcup B\rvert}$
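The four measures above translate almost directly into NumPy. This is a minimal sketch with illustrative function names. It assumes non-zero, non-constant vectors for cosine and Pearson, and defines the Jaccard index of two empty sets as 0.

```python
import numpy as np

def euclidean(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum((q - p) ** 2)))

def cosine(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / (np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2))))

def jaccard(a, b):
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Pearson correlation is the cosine similarity of the mean-centered vectors, which is why it is less sensitive to users who rate systematically high or low.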
Optimization
Ranking
Ensemble
Bandit
choice = numpy.argmax(numpy.random.beta(1 + self.wins, 1 + self.trials - self.wins))
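The one-liner above is the selection step of Thompson sampling with a Beta posterior per arm. A self-contained sketch around it could look like the following. The class name `BetaBandit`, the `simulate` helper, and the click-through rates are hypothetical, and sampling uses NumPy's random generator.

```python
import numpy as np

class BetaBandit:
    """Thompson sampling for Bernoulli rewards with a Beta(1, 1) prior per arm."""

    def __init__(self, n_arms, seed=None):
        self.wins = np.zeros(n_arms)
        self.trials = np.zeros(n_arms)
        self.rng = np.random.default_rng(seed)

    def choose(self):
        # Draw one sample from each arm's posterior and play the best draw.
        samples = self.rng.beta(1 + self.wins, 1 + self.trials - self.wins)
        return int(np.argmax(samples))

    def update(self, arm, reward):
        self.trials[arm] += 1
        self.wins[arm] += reward

def simulate(true_rates, rounds=2000, seed=0):
    """Run the bandit against hypothetical arms with fixed success rates."""
    env_rng = np.random.default_rng(seed + 1)
    bandit = BetaBandit(len(true_rates), seed=seed)
    for _ in range(rounds):
        arm = bandit.choose()
        bandit.update(arm, int(env_rng.random() < true_rates[arm]))
    return bandit
```

As evidence accumulates, the posterior of the best arm concentrates, so most of the traffic shifts to it while weaker arms still get occasional exploratory pulls.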
Deep learning
Leaderboard
Weighted sampling
Deduplicated
Data
Collect data
Defence
Attack methods
Defence
Deploy
Real Time
Test Platform
Database
API
CF and MF
Ensemble
All in one
via Cyanide
October 14, 2024 at 02:28PM