Closed by muniefht 5 days ago
Hey @muniefht, can you try passing the whole path? `my.path.to.abuse_detection.AbusiveLanguageDetectionTemplate`
Not sure if that will solve it, but I think it's an easy one to try.
I tried that, but it did not work. Also, I am on a shared server with multiple users and I am not a root user, though that should not be a problem. I did find a workaround: I placed my template inside the installed torchtune package (under site-packages) and wrote the path as `torchtune.abuse_detection.AbusiveLanguageDetectionTemplate`, and that worked.
For others with this problem, I found the following workaround. Let's say you are currently in some directory we'll call `cwd`, and your file with the custom function `my_function` is at `cwd/custom/pyfile.py`. Then, in your recipe, put: `_component_: custom.pyfile.my_function`. And when you use `tune`, prepend the following: `PYTHONPATH=$PWD:$PYTHONPATH tune ...`. This tells Python to also look in `cwd`, which is what the shell variable `$PWD` expands to.
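A minimal, self-contained demonstration of that workaround (the directory and function names are illustrative, matching the hypothetical `custom/pyfile.py` layout above):

```shell
# Recreate the layout described above (all names illustrative):
mkdir -p demo/custom
cat > demo/custom/pyfile.py <<'EOF'
def my_function():
    return "resolved custom.pyfile.my_function"
EOF

# With the current directory prepended to PYTHONPATH, the dotted path
# "custom.pyfile.my_function" from the recipe becomes importable:
cd demo
PYTHONPATH=$PWD:$PYTHONPATH python3 -c \
    "from custom.pyfile import my_function; print(my_function())"
```

The same `PYTHONPATH=$PWD:$PYTHONPATH` prefix works in front of `tune run ...`.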
This should've been fixed in #1760 and #1731. When you run the same command without modifying PYTHONPATH, do you still run into issues? @zjost
Hi, I am new to the field and trying to fine-tune a model for the first time. I am working with torchtune's lora_finetune_single_device recipe. I was able to fine-tune using the built-in alpaca dataset; now I am trying to fine-tune on a custom dataset. The data is a CSV file containing abusive and non-abusive tweets, and I am trying to fine-tune the model for Urdu-language abuse detection. One column contains the tweets ("tweets") and the other contains the label ("target", 0 or 1). I thought the instruct_dataset() format would be the best fit for this problem, so I created a custom template with the following code:

```python
from typing import Mapping, Any, Optional, Dict

from torchtune.data import InstructTemplate


class AbusiveLanguageDetectionTemplate(InstructTemplate):
    template = (
        "You are an abusive language detection model for Urdu. Your job is to detect the abusive language in the Urdu sentences. "
        "Output '1' if the sentence is abusive and output '0' if the sentence is non-abusive. No explanation is required.\n\n"
        "### Input:\n{tweet}\n\n### Response:\n{target}\n"
    )
```
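As a quick sanity check that the template fills in correctly, the raw template string can be formatted with plain Python (no torchtune needed; the sample tweet here is made up):

```python
# The same template string as in the class above:
template = (
    "You are an abusive language detection model for Urdu. Your job is to detect the abusive language in the Urdu sentences. "
    "Output '1' if the sentence is abusive and output '0' if the sentence is non-abusive. No explanation is required.\n\n"
    "### Input:\n{tweet}\n\n### Response:\n{target}\n"
)

# Fill in one (made-up) row of the CSV:
prompt = template.format(tweet="یہ ایک مثال ہے", target="0")
print(prompt)
```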
I saved the code in a file named "abuse_detection.py", which holds my custom template. Now I am trying to reference this template from my custom_config.yaml file. For the dataset field, I have specified the following:
```yaml
# Dataset and Sampler
dataset:
  _component_: torchtune.datasets.instruct_dataset
  source: abusive_train.csv
  template: abuse_detection.AbusiveLanguageDetectionTemplate
  max_seq_len: 4096
  train_on_input: False
  packed: False
batch_size: 2
seed: null
shuffle: True
```

where "abusive_train.csv" is the file name of my CSV file. My custom_config.yaml, abusive_train.csv, and abuse_detection.py files are all in the same directory, and I am running the following command:

```
tune run lora_finetune_single_device --config custom_config.yaml
```

but I am getting the following error:

```
ModuleNotFoundError("No module named 'abuse_detection'") Are you sure that module 'abuse_detection' is installed?
```

Can someone point out what I am doing wrong? Where should I place the abuse_detection.py file for it to be picked up? Please help.
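For context on where that error comes from: a dotted `_component_:` string is resolved roughly like the sketch below (a simplified illustration, not torchtune's actual code). Everything before the last dot must be an importable Python module, so if `abuse_detection.py`'s directory is not on `sys.path` (e.g. via `PYTHONPATH`), the import step fails with exactly the reported `ModuleNotFoundError`:

```python
import importlib


def resolve_component(dotted_path: str):
    # Split "pkg.module.Attr" into a module path and an attribute name.
    module_name, _, attr = dotted_path.rpartition(".")
    # import_module raises ModuleNotFoundError when module_name is not
    # findable on sys.path -- the error reported above.
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Works for anything importable, e.g. a stdlib class:
print(resolve_component("json.JSONDecoder").__name__)  # JSONDecoder

# Fails the same way the recipe does when the module can't be found:
try:
    resolve_component("abuse_detection.AbusiveLanguageDetectionTemplate")
except ModuleNotFoundError as e:
    print(e)  # No module named 'abuse_detection'
```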