Closed: MineSelf2016 closed this issue 1 month ago
May I ask why the model inference sample output is executed instead of predicting numerical values?
Did you recover our model weights by following the two steps in the readme? https://github.com/microsoft/Industrial-Foundation-Models?tab=readme-ov-file#prepare-the-model-checkpoint
Hello, while reading the paper, I couldn't find details on how the loss function is calculated for the feature and prediction tokens. In Section 3.3, it mentions that 'While GTL employs a next-token prediction loss similar to that used in LLMs, it distinguishes itself from auto-regressive pre-training on language data.' Does this imply that we can apply the next-token strategy to pretrain the model without any modifications to the CausalLM loss function? Thanks.
While the type of loss function (cross-entropy) remains the same as in a conventional CausalLM, our method computes this loss only on the tokens that correspond to features and labels within the training data. This targeted approach guides the model to focus on, and improve, its predictions for the task-specific features and outcomes.
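For reference, here is a minimal sketch of how such a masked next-token loss could be computed in PyTorch. This is not the repository's actual code: the `target_mask` tensor (marking which positions are feature/label tokens) and the function name are assumptions for illustration, and it relies on the standard `-100` ignore-index convention used by `cross_entropy`.

```python
import torch
import torch.nn.functional as F

def masked_causal_lm_loss(logits, input_ids, target_mask):
    """Next-token cross-entropy restricted to feature/label positions.

    logits:      (batch, seq_len, vocab_size) model outputs
    input_ids:   (batch, seq_len) token ids of the serialized table prompt
    target_mask: (batch, seq_len) True where the token is a feature/label token
    """
    labels = input_ids.clone()
    labels[~target_mask] = -100  # positions with -100 are ignored by cross_entropy

    # Standard causal shift: the logits at position t predict the token at t+1.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()

    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,
    )
```

In other words, the optimization objective is unchanged relative to ordinary causal language modeling; only the set of positions that contribute to the loss is restricted.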
Got it. Thank you!