Hi,
Congrats on your amazing paper! I have a question about the GVAE loss function. It's not clear to me why you use binary cross-entropy (BCE) rather than categorical cross-entropy (CCE). At each decoding step, after invalid rules are masked out, the model should select exactly one of the valid production rules, so CCE seems the more natural choice, right? Or is there a reason I'm not seeing?
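To make the question concrete, here is a minimal NumPy sketch of the two losses for a single decoding step. It is only my reading of the setup, not your actual code: the function names, the toy logits, and the mask are all made up for illustration, and I'm assuming BCE is applied element-wise to the masked softmax output and averaged over the rules.

```python
import numpy as np

def masked_softmax(logits, mask):
    # Invalid rules get a logit of -inf, so they receive zero probability.
    masked = np.where(mask, logits, -np.inf)
    exp = np.exp(masked - masked.max())  # subtract the max for numerical stability
    return exp / exp.sum()

def categorical_cross_entropy(probs, target):
    # Only the probability assigned to the true rule contributes.
    return -np.log(probs[target])

def binary_cross_entropy(probs, target, eps=1e-7):
    # Every rule is treated as an independent yes/no prediction (assumption).
    onehot = np.zeros_like(probs)
    onehot[target] = 1.0
    p = np.clip(probs, eps, 1.0 - eps)
    return -np.mean(onehot * np.log(p) + (1.0 - onehot) * np.log(1.0 - p))

# Hypothetical decoding step: 4 production rules, only rules 0 and 2 are valid.
logits = np.array([2.0, 1.0, 0.5, -1.0])
mask = np.array([True, False, True, False])
target = 0  # index of the true production rule

probs = masked_softmax(logits, mask)
cce_loss = categorical_cross_entropy(probs, target)
bce_loss = binary_cross_entropy(probs, target)
print("CCE:", cce_loss, "BCE:", bce_loss)
```

In this sketch the two losses give different values for the same prediction, because BCE also adds a penalty term for each non-target rule, which is why I'm wondering whether the choice was deliberate.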
Best, Mohsen