Question on Using D2Co-defined Interest Labels

Thank you very much for your interest in our work. The interest label we defined in D2Co is:

$$
r_{\mathbf{x}} = \left\{
\begin{aligned}
    & 1,\quad \mathrm{if}~~ (d \leq 18s \land w = d) \lor (d > 18s \land w > 18s) \mathrm{;} \\
    & 0,\quad \mathrm{else;}
\end{aligned}
\right.
$$

The label we defined analogously in CWM is:

$$
r_{\mathbf{x}} = \left\{
\begin{aligned}
    & 1,\quad \mathrm{if}~~ (d \leq w_{0.7} \land w = d) \lor (d > w_{0.7} \land w > w_{0.7}) \mathrm{;} \\
    & 0,\quad \mathrm{else;}
\end{aligned}
\right.
$$

The only difference between these two definitions is that the $18s$ threshold in D2Co is replaced by $w_{0.7}$ in CWM, where $w_{0.7}$ denotes the 70% quantile of watch time. In both the KuaiRand and WeChat datasets, $w_{0.7}$ is $17s$, which is close to the $18s$ we used in D2Co.
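For readers who want to reproduce the labels, the two definitions can be sketched as a single function with a configurable threshold. This is only a minimal illustration, not the code used in D2Co or CWM: the names `interest_label` and `quantile` are hypothetical, the quantile uses linear interpolation (the exact convention used in CWM is an assumption here), and `w == d` is taken literally from the formula, so real logs may need a tolerance if watch time and duration are recorded at different precisions.

```python
from typing import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between order statistics.

    Assumption: this interpolation rule is chosen for illustration only.
    """
    xs = sorted(values)
    if not xs:
        raise ValueError("values must be non-empty")
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def interest_label(w: float, d: float, threshold: float) -> int:
    """Return 1 if (d <= threshold and w == d) or (d > threshold and w > threshold).

    w is the watch time and d the video duration, both in seconds.
    threshold = 18 gives the D2Co label; threshold = w_{0.7} gives the CWM label.
    """
    short_and_finished = d <= threshold and w == d
    long_and_watched_enough = d > threshold and w > threshold
    return int(short_and_finished or long_and_watched_enough)


# Toy (made-up) interactions as (watch_time, duration) pairs in seconds.
interactions = [(5, 5), (10, 15), (20, 40), (30, 30), (12, 12)]

# D2Co-style labels with the fixed 18s threshold.
d2co_labels = [interest_label(w, d, 18.0) for w, d in interactions]

# CWM-style labels with the 70% quantile of the observed watch times.
w_07 = quantile([w for w, _ in interactions], 0.7)
cwm_labels = [interest_label(w, d, w_07) for w, d in interactions]
```

With a threshold of $18s$, the second toy interaction is labeled 0 because a 15-second video was watched for only 10 seconds, while the other four are labeled 1.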

The $18s$ threshold is the definition of Long_view in the original KuaiRand dataset, whose authors considered this label a suitable indicator of user interest. However, during the review process of D2Co, we found that some reviewers did not approve of this explicit definition, so we replaced $18s$ with $w_{0.7}$ in CWM.

As you can see, unbiased evaluation of video watching is an open question. The interest label definition we have adopted may not be perfect either, and we look forward to follow-up work that addresses this question.

