torvalds-dev / llama_index

LlamaIndex (formerly GPT Index) is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Feature Request]: Can I check the loss when finetuning Embedding model? #20

Open torvalds-dev opened 10 months ago

torvalds-dev commented 10 months ago

Feature Description

I was using finetune_embedding.ipynb to fine-tune an embedding model, but I found no way to get the loss curve back; it should include both the training loss and the validation loss. How can I get them? Thanks.

Reason

No response

Value of Feature

No response

torvalds-dev commented 10 months ago

torvalds.dev is analyzing the ticket

torvalds-dev commented 10 months ago

The user is requesting a feature to check the loss when fine-tuning an Embedding model. They were using the finetune_embedding.ipynb notebook but found no way to return a loss curve, which should include both training loss and validation loss. They are asking how they can get these values.
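In the meantime, here is a minimal workaround sketch, assuming the notebook fine-tunes through sentence-transformers (which SentenceTransformersFinetuneEngine uses under the hood): wrap the loss module so that each training step's loss value is recorded. The base model name and the training examples below are placeholders, not taken from the notebook.

```python
import torch
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses


class LossRecorder(torch.nn.Module):
    """Wraps a sentence-transformers loss module and records each step's loss."""

    def __init__(self, loss_fn: torch.nn.Module, history: list) -> None:
        super().__init__()
        self.loss_fn = loss_fn
        self.history = history

    def forward(self, *args, **kwargs):
        loss = self.loss_fn(*args, **kwargs)
        self.history.append(loss.item())  # scalar training loss for this step
        return loss


model = SentenceTransformer("BAAI/bge-small-en")  # placeholder base model
train_examples = [
    InputExample(texts=["what is llama_index?", "a data framework for LLM apps"]),
    InputExample(texts=["how to fine-tune embeddings?", "use the finetuning module"]),
]
train_dataloader = DataLoader(train_examples, batch_size=2, shuffle=True)

train_losses = []
loss = LossRecorder(losses.MultipleNegativesRankingLoss(model), train_losses)

model.fit(train_objectives=[(train_dataloader, loss)], epochs=1)
print(train_losses)  # per-step training loss values
```

Validation loss could be tracked in the same spirit by evaluating on a held-out set between epochs and appending the result to a second list.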

The relevant files to this issue are:

  1. llama_index/embeddings/__init__.py: This file contains the initialization of various embedding models. The user might be using one of these models for fine-tuning.

  2. llama_index/finetuning/gradient/base.py: This file contains the GradientFinetuneEngine class, which is used for fine-tuning models. It might be possible to modify this class to return the loss values during fine-tuning.

  3. llama_index/token_counter/mock_embed_model.py: This file contains a mock embedding model. It's unclear if this is directly related to the user's issue, but it might be useful for testing any changes.

  4. llama_index/callbacks/finetuning_handler.py: This file contains callback handlers for fine-tuning, including OpenAIFineTuningHandler and GradientAIFineTuningHandler. These handlers could be extended to track and return loss values during fine-tuning (see the sketch after this list).
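As a purely illustrative sketch of what such a handler could look like (the class and method names below are hypothetical, not existing llama_index APIs), it could simply accumulate loss values and expose them after training:

```python
from typing import Dict, List


class EmbeddingFinetuneLossHandler:
    """Hypothetical handler that accumulates loss values during fine-tuning."""

    def __init__(self) -> None:
        self.train_loss: List[float] = []
        self.val_loss: List[float] = []

    def on_train_step(self, step: int, loss: float) -> None:
        # Would be called by the finetune engine after each optimizer step.
        self.train_loss.append(loss)

    def on_eval(self, epoch: int, loss: float) -> None:
        # Would be called after each validation pass.
        self.val_loss.append(loss)

    def get_loss_curves(self) -> Dict[str, List[float]]:
        return {"train": self.train_loss, "val": self.val_loss}
```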

Based on these files, the following actions could be taken:

  1. Investigate the GradientFinetuneEngine class in llama_index/finetuning/gradient/base.py to see if it's possible to modify it to return loss values during fine-tuning.

  2. Look into the callback handlers in llama_index/callbacks/finetuning_handler.py to see if they can be modified to track and return loss values.

  3. If necessary, create a new callback handler specifically for tracking loss during fine-tuning.

  4. Test any changes using the mock embedding model in llama_index/token_counter/mock_embed_model.py.

  5. Update the finetune_embedding.ipynb notebook to demonstrate how to use the new feature (see the plotting sketch below).
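For step 5, the notebook could plot the recorded curves once they are exposed; a minimal sketch, with placeholder values standing in for losses collected during fine-tuning:

```python
import matplotlib.pyplot as plt

# Placeholder values standing in for losses collected during fine-tuning.
train_losses = [0.92, 0.61, 0.43, 0.35, 0.31]
val_losses = [0.88, 0.64, 0.51, 0.46, 0.44]

plt.plot(train_losses, label="train loss")
plt.plot(val_losses, label="val loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.title("Embedding fine-tuning loss")
plt.legend()
plt.show()
```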