In Section 4.2 (Reward) of the paper, you say that you developed a reward model that assesses performance by calculating the similarity between the final UI page and the object UI page.
Could you explain how the reward model is designed and trained? Also, will the reward model be released?