Hi, thanks for sharing this very interesting work. I have a question about finetuning on downstream datasets: the paper mentions that you use 60% of the tokens for finetuning, selected based on the motion heatmap. Do you also process only 60% of the tokens during inference, or do you still use all the tokens at inference time? Thanks in advance.