Open diazr04 opened 1 month ago
The tokenizer has right padding hence the !!! parts I think - was this the main concern?
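One way to check the padding explanation is to blank out every position the loss ignores before decoding. The sketch below uses NumPy only and rests on two assumptions that the thread does not confirm: that ignored positions in the labels carry the value -100, and that the predictions have already been reduced to token ids with argmax. The helper name `mask_ignored_positions` and the toy ids are made up for illustration.

```python
import numpy as np

IGNORE_INDEX = -100  # assumption: label value used for positions the loss skips


def mask_ignored_positions(pred_ids, labels, pad_token_id):
    """Overwrite predictions and labels wherever the label is IGNORE_INDEX.

    Returns (clean_preds, clean_labels); both contain only valid token ids.
    """
    pred_ids = np.asarray(pred_ids)
    labels = np.asarray(labels)
    keep = labels != IGNORE_INDEX  # True where a real target token exists
    clean_preds = np.where(keep, pred_ids, pad_token_id)
    clean_labels = np.where(keep, labels, pad_token_id)
    return clean_preds, clean_labels


# Toy batch: one sequence whose last two positions are padding.
toy_preds = np.array([[11, 12, 13, 0, 0]])
toy_labels = np.array([[11, 12, 14, -100, -100]])
clean_preds, clean_labels = mask_ignored_positions(
    toy_preds, toy_labels, pad_token_id=99
)
```

With a real tokenizer you would pass `tokenizer.pad_token_id` and then call `tokenizer.batch_decode(clean_preds, skip_special_tokens=True)`. The padded positions disappear from the decoded text only if the pad token is registered as a special token; if the pad id is one that decodes to a visible character, the filler will still show up.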
Yes, exactly. My predictions look like this: !!!!!!!!!!!!!!!!!!!!(RANDOM TEXT) !!!!!!!!!!!!!!
I fine-tuned on my dataset of just 810 examples (a kind of Q/A format). Is this behavior because my dataset is not very big?
Since I am doing scientific research and fine-tuning on a specific dataset to make predictions, I was wondering whether my dataset is not good enough.
Thanks for your reply.
Hello, I was training Llama 3.1 (instruct) on my dataset. I am using this code I created:
Unfortunately, when I print the decoded labels from the model to compute my metrics, I get just nonsense:
Predictions type: <class 'numpy.ndarray'>, shape: (179, 1064, 128256)
Labels type: <class 'numpy.ndarray'>, shape: (179, 1064)
Predictions after argmax (if applied), shape: (179, 1064)
Predictions after tolist: <class 'numpy.ndarray'>, length: 179
First prediction sample: [ 14924 128006 128007 271 40 315 220 11 320 27560]
First few decoded predictions: ['Question\n\nI of, ( Cu Type: Alloy, Crystal Motif: Face, Crystal Structure: Face, Morphology: NAocrar, Size:, Shape: NA-shaped,assistant\n\nA 1: Synolve 0 mg of Auachecyl trim (HDA) in 10 mL of toionized water. a 10 ml glass-bottom flaskial. Step 2: Addicate the H for 30 minutes to theDA is fully dissolved. Step 3: Add 1.5 ml of uCl4 (50.1 M) to 0.3 ml of CuCl2 (0.1 M) toous solutions to the vial. son stirring. Step 4: Continue the mixture at to 100°CC for a oil bath under Step 5: Maintain few minutes, the mixture turns dark color to blue yellow, indicating the to turnss0.3 ml of prepared DT+)-glucose (0 M) intoous solution into the v mixture. Step 6: Continue the vial and let it mixture for 30 minutes. Step 7: Remove the temperature to 120 °C and maintain heat the 30 minutes minutes. Step 8: Remove the mixture to cool down to room temperature. Step 9: Centify the product nan solutionrown precipCu nan by times by deone and ethanol to centrifugation. 100 rpm for 10 minutes. Step 10: Collect-disperse the Auitate Au in. Step TEM preparation of Au Auostars, Step 1: Diss 1.5 ml of HAuCl4 (0.1 M) and 0.3 ml of CuCl2 (0.05 M) aque. a 10 ml round bottom vial. Step 2: Add 0 ml of ethanolleylamine (OLA) to the vial under Step 3: Heat the mixture up an oil bath at 150 °C for 30 hours. Step 4: After the mixtureidal solution to room temperature. Step 5: Pur few of oform and ethanol ( the v and centrif centrifuge at 6000 rpm for remove the imp OLA. Step 6: Re-disperse the precipitated particles in ethanol.
Step!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!', 'Question\n\nIassistant\n\nIt is no words instructions reactions for. the prompt text. I be used into a instructions concise-by-step instructions. the catals.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'] First few decoded labels: ['user\n\nComposition: AuCu, Material Type: Metal, Structural Motif: NA, Crystal Structure: NA, Morphology: nanostars, Size: NA, Shape: star-shapedassistant\n\nStep 1: Dissolve 45 mg of 
hexadecylamine (HDA) into 4 ml of deionized water in a 40 ml round bottom vial. Step 2: Sonicate the mixture for 30 minutes until HDA is completely dissolved. Step 3: Add 0.3 ml of HAuCl4 (0.1 M) and 0.3 ml of CuCl2 (0.1 M) aqueous solutions to the vial under magnetic stirring. Step 4: Heat the mixture up to 100 °C in an oil bath. Step 5: After a few minutes, the solution turns from green to light-blue, and then quickly inject 0.3 ml of freshly prepared D-(+)-glucose (1 M) aqueous solution into the hot mixture. Step 6: Cap the vial and stir the mixture for 30 minutes. Step 7: Increase the temperature to 150 °C and further stir for 10 more minutes. Step 8: Allow the mixture to cool down to room temperature. Step 9: Purify the resulting purple-brown AuCu solution several times with acetone and ethanol by centrifugation at 6000 rpm for 5 minutes. Step 10: Re-disperse the precipitated particles in ethanol. For the synthesis of rounded nanostars: Step 1: Mix 0.3 ml of HAuCl4 (0.05 M) and 0.3 ml of CuCl2 (0.1 M) in ethanol in a 40 ml round bottom vial. Step 2: Add 4 ml of oleylamine (OLA) to the vial. Step 3: Heat the mixture in an oil bath at 130 °C for 2 hours. Step 4: Cool the colloidal solution to room temperature. Step 5: Add a mixture of chloroform and ethanol to the solution and then centrifuge at 3000 rpm to remove any residual OLA. 
Step 6: Re-disperse the precipitated nanoparticles in ethanol.', 'user\n\nassistant\n\nThere are no specific synthesis procedures mentioned in the provided text that can be converted into clear, step-by-step instructions for heterogeneous catalyst synthesis.']
Calculated metrics: {'rouge1': 68.79999377501494, 'rouge2': 38.45706763890132, 'rougeL': 58.38013880947452, 'rougeLsum': 59.16003761551416}
{'eval_loss': 1.4974863529205322, 'eval_rouge1': 68.79999377501494, 'eval_rouge2': 38.45706763890132, 'eval_rougeL': 58.38013880947452, 'eval_rougeLsum': 59.16003761551416, 'eval_runtime': 266.1983, 'eval_samples_per_second': 0.672, 'eval_steps_per_second': 0.672, 'epoch': 0.56}
Validation loss at step 50: 1.4974863529205322.
Can any of you help me? Do you know how I can fine-tune the base model?
Thank you.
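The decoded predictions in the log read like a slightly corrupted copy of the labels, which suggests they are teacher-forced next-token guesses (argmax over the evaluation logits) rather than freely generated text. In that setting the prediction at position t corresponds to the label at position t + 1, so the two arrays need to be shifted by one before they are compared. The following is a minimal sketch under that assumption, using NumPy only; `shifted_token_accuracy` is a hypothetical helper and -100 is assumed to mark ignored label positions.

```python
import numpy as np

IGNORE_INDEX = -100  # assumption: label value for positions excluded from the loss


def shifted_token_accuracy(pred_ids, labels):
    """Token accuracy of teacher-forced predictions against next-token labels."""
    pred_ids = np.asarray(pred_ids)
    labels = np.asarray(labels)
    # The output at position t is a guess for the token at position t + 1.
    shifted_preds = pred_ids[:, :-1]
    shifted_labels = labels[:, 1:]
    keep = shifted_labels != IGNORE_INDEX  # drop padded or masked targets
    if not keep.any():
        return 0.0
    return float((shifted_preds[keep] == shifted_labels[keep]).mean())


# Toy example: two of the three scored positions are predicted correctly.
toy_labels = np.array([[1, 2, 3, 4, -100]])
toy_preds = np.array([[2, 3, 9, 0, 0]])
accuracy = shifted_token_accuracy(toy_preds, toy_labels)
```

A metric computed this way describes next-token quality given the reference text as context. To score what the model writes on its own, for example with ROUGE, the usual approach is to produce outputs with `model.generate` from the prompt alone and decode only the newly generated tokens.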