microsoft / Industrial-Foundation-Models

Dedicated to building industrial foundation models for universal data intelligence across industries.
MIT License

What is the training loss? #10

Closed: MineSelf2016 closed this issue 1 month ago

MineSelf2016 commented 2 months ago

Hello, while reading the paper, I couldn't find details on how the loss function is calculated for the feature and prediction tokens. In Section 3.3, it mentions that 'While GTL employs a next-token prediction loss similar to that used in LLMs, it distinguishes itself from auto-regressive pre-training on language data.' Does this imply that we can apply the next-token strategy to pretrain the model without any modifications to the CausalLM loss function? Thanks.

xinyani commented 1 month ago

May I ask why, when the model inference sample is executed, the output is not the predicted numerical values?

xumwen commented 1 month ago

> Hello, while reading the paper, I couldn't find details on how the loss function is calculated for the feature and prediction tokens. In Section 3.3, it mentions that 'While GTL employs a next-token prediction loss similar to that used in LLMs, it distinguishes itself from auto-regressive pre-training on language data.' Does this imply that we can apply the next-token strategy to pretrain the model without any modifications to the CausalLM loss function? Thanks.

While the loss function itself (cross-entropy) is the same as in a conventional CausalLM, our method computes it only on the tokens that correspond to features and labels in the training data. This targeted masking guides the model to focus on, and improve, its predictions for the task-specific features and outcomes.
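
For anyone wanting to reproduce this, here is a minimal sketch of how such selective loss masking is commonly implemented in PyTorch: positions outside the feature/label spans get the label `-100`, which `cross_entropy` ignores by default. The `loss_mask` argument is a hypothetical input; how the repo's actual data pipeline marks feature/label token spans may differ.

```python
import torch
from torch.nn import functional as F

def masked_next_token_loss(logits, input_ids, loss_mask):
    """Next-token cross-entropy computed only on masked positions.

    logits:    (batch, seq_len, vocab) model outputs
    input_ids: (batch, seq_len) token ids
    loss_mask: (batch, seq_len) bool, True where the token belongs to a
               feature/label span and should contribute to the loss
    """
    # Standard causal shift: position t predicts token t+1
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = input_ids[:, 1:].clone()
    shift_mask = loss_mask[:, 1:]

    # Ignore all tokens outside feature/label spans (-100 is the
    # default ignore_index of PyTorch's cross_entropy)
    shift_labels[~shift_mask] = -100

    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,
    )
```

Using `ignore_index=-100` means no extra normalization is needed: with the default `mean` reduction, `cross_entropy` averages only over the unmasked positions.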

xumwen commented 1 month ago

> May I ask why, when the model inference sample is executed, the output is not the predicted numerical values?

Did you recover our model weights by following the two steps in the README? https://github.com/microsoft/Industrial-Foundation-Models?tab=readme-ov-file#prepare-the-model-checkpoint

MineSelf2016 commented 1 month ago

> Hello, while reading the paper, I couldn't find details on how the loss function is calculated for the feature and prediction tokens. In Section 3.3, it mentions that 'While GTL employs a next-token prediction loss similar to that used in LLMs, it distinguishes itself from auto-regressive pre-training on language data.' Does this imply that we can apply the next-token strategy to pretrain the model without any modifications to the CausalLM loss function? Thanks.

> While the loss function itself (cross-entropy) is the same as in a conventional CausalLM, our method computes it only on the tokens that correspond to features and labels in the training data. This targeted masking guides the model to focus on, and improve, its predictions for the task-specific features and outcomes.

Got it. Thank you!