Hi~ I've read your paper and the code, and I'm a little confused about some details.
Firstly, the coordinates of the bounding boxes are discretized, and the layout vocabulary contains the 8-bit uniform quantization tokens (token indices 0-127 in your code), the categorical tokens, a padding token, and the BOS and EOS tokens. But what if, at inference time, a coordinate predicted by the trained model does not fall within the 0-127 token range? Do you apply any post-processing to correct such mispredicted layouts?
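For reference, here is a minimal sketch of one way this could be handled at decode time; it is only my assumption (the constant `COORD_TOKENS`, the 0-127 range, and the function name are mine, not taken from your repo): mask the logits at coordinate positions so that only valid coordinate tokens can be sampled.

```python
import torch

# Assumed setup: token ids 0..127 are the coordinate bins.
COORD_TOKENS = 128

def sample_coordinate_token(logits: torch.Tensor) -> torch.Tensor:
    """logits: (vocab_size,) raw decoder output at a coordinate slot.

    Mask every non-coordinate token to -inf so the sampled id is
    guaranteed to lie in 0..127, i.e. no out-of-range post-correction
    is needed afterwards.
    """
    mask = torch.full_like(logits, float("-inf"))
    mask[:COORD_TOKENS] = 0.0
    probs = torch.softmax(logits + mask, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```

Is something like this (or a simple clamp of out-of-range ids back into 0-127) what you do, or is the raw prediction kept as-is?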
Secondly, the paper says that 'we minimize KL-Divergence between soft-max predictions and output one-hot distribution with Label Smoothing', but why does the code still use a cross-entropy loss?
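My current guess (not a quote from the repo) is that the two are equivalent for optimization: the KL divergence between the label-smoothed target and the softmax predictions equals the label-smoothed cross entropy minus the entropy of the target, which is constant with respect to the model parameters, so the gradients are identical. A small sketch with assumed smoothing value `eps=0.1`:

```python
import torch
import torch.nn.functional as F

def smoothed_targets(labels, vocab_size, eps=0.1):
    # Distribute eps over the non-target classes, 1 - eps on the target class.
    t = torch.full((labels.size(0), vocab_size), eps / (vocab_size - 1))
    t.scatter_(1, labels.unsqueeze(1), 1.0 - eps)
    return t

logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
target = smoothed_targets(labels, vocab_size=10)

log_probs = F.log_softmax(logits, dim=-1)
kl = F.kl_div(log_probs, target, reduction="batchmean")       # KL(target || pred)
ce = -(target * log_probs).sum(dim=-1).mean()                  # smoothed cross entropy
entropy = -(target * target.log()).sum(dim=-1).mean()          # constant in the logits

assert torch.allclose(kl, ce - entropy, atol=1e-6)
```

Is that the reason the code sticks with cross entropy, or is label smoothing dropped entirely in the released implementation?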
I also have a question. The paper describes the input order of the elements as a random sequence, but from the code it looks like they are fed in a specific sorted order.