xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0

Trouble running training jobs #19

Closed · alexmoini closed this issue 1 year ago

alexmoini commented 1 year ago

Hi HKUNLP,

First off, really awesome paper on leveraging instructions to improve embedding quality across domains and tasks.

I am trying to train a model by following the training directions. I downloaded the MTEB dataset, installed the requirements, and ran the training job, but I keep running into this error: ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.

Any idea why this is happening?

Thanks!
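For context, this ValueError comes from the Hugging Face tokenizer when it is asked to return tensors for a batch whose token sequences have different lengths: a tensor must be rectangular, so every row needs the same length. The flags named in the message (`padding=True`, `truncation=True`) ask the tokenizer to enforce that. The sketch below is a conceptual, plain-Python illustration of what those two flags do, not the library's implementation; the token ids, the pad id of 0, and the helper names `pad_and_truncate` and `is_rectangular` are all hypothetical.

```python
from typing import List, Tuple


def pad_and_truncate(
    batch: List[List[int]], max_length: int, pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Turn a ragged batch of token-id lists into a rectangular one.

    Hypothetical helper: mimics the idea behind padding=True + truncation=True.
    pad_id=0 is an assumption; real tokenizers use their own pad token id.
    """
    if not batch:
        return [], []
    # Truncation: cut every sequence down to at most max_length tokens.
    truncated = [ids[:max_length] for ids in batch]
    # Dynamic padding: pad to the longest remaining sequence in this batch.
    target = max(len(ids) for ids in truncated)
    padded = [ids + [pad_id] * (target - len(ids)) for ids in truncated]
    # Attention mask: 1 for real tokens, 0 for padding positions.
    mask = [[1] * len(ids) + [0] * (target - len(ids)) for ids in truncated]
    return padded, mask


def is_rectangular(batch: List[List[int]]) -> bool:
    """True if every row has the same length (i.e. it could become a tensor)."""
    return len({len(ids) for ids in batch}) <= 1


# Made-up token ids of different lengths, like an unpadded batch.
ragged = [[101, 7592, 102], [101, 7592, 2088, 999, 102], [101, 102]]
print(is_rectangular(ragged))  # False: this is the case that triggers the error
ids, mask = pad_and_truncate(ragged, max_length=4)
print(ids)   # [[101, 7592, 102, 0], [101, 7592, 2088, 999], [101, 102, 0, 0]]
print(mask)  # [[1, 1, 1, 0], [1, 1, 1, 1], [1, 1, 0, 0]]
```

With a real Hugging Face tokenizer, the equivalent is passing `padding=True, truncation=True` (and usually `max_length`) alongside `return_tensors` in the tokenizer call. Whether that is the right fix for this training script, or whether the input data itself is the problem, is not settled by this sketch.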

Harry-hash commented 1 year ago

Hi, thanks a lot for your interest in the INSTRUCTOR model!

You may want to download the MEDI data (https://drive.google.com/file/d/1vZ5c2oJNonGOvXzppNg5mHz24O6jcc52/view?usp=sharing) instead of the MTEB data for training a model.

Feel free to ask any further questions or leave any comments!
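Since the suggested fix is to train on MEDI rather than MTEB data, a quick sanity check on the training file can rule out a format mismatch. The sketch below assumes each MEDI record is a JSON object whose `query`, `pos`, and `neg` fields are `[instruction, text]` pairs (with a `task_name` alongside). That layout is our reading of the released file, not something stated in this thread, so verify it against the data you downloaded. The function names and example strings are hypothetical.

```python
import json
from typing import Any, Dict, List

# Assumed MEDI field names; check against the actual downloaded file.
REQUIRED_KEYS = ("query", "pos", "neg")


def check_medi_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems with one record (empty list means it looks OK)."""
    problems = []
    for key in REQUIRED_KEYS:
        value = record.get(key)
        # Each field is assumed to be an [instruction, text] pair of strings.
        ok = isinstance(value, list) and len(value) == 2 and all(
            isinstance(part, str) for part in value
        )
        if not ok:
            problems.append(f"{key!r} should be an [instruction, text] pair")
    return problems


def check_medi_file(path: str) -> Dict[int, List[str]]:
    """Hypothetical helper: check every record in a JSON list file, keyed by index."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return {i: p for i, rec in enumerate(data) if (p := check_medi_record(rec))}


# Made-up example in the assumed MEDI layout.
example = {
    "query": ["Represent the question for retrieving supporting documents: ", "what is an embedding?"],
    "pos": ["Represent the document for retrieval: ", "An embedding maps text to a vector."],
    "neg": ["Represent the document for retrieval: ", "The weather is sunny today."],
    "task_name": "example_task",
}
print(check_medi_record(example))  # []
print(check_medi_record({"text": "a row in some other layout"}))  # three problems
```

Running `check_medi_file` on the training JSON before launching a job would flag records that do not match the assumed layout.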

alexmoini commented 1 year ago

Hi Harry,

Yeah, that was a slip in my wording: I was actually using the MEDI data.

Thanks, Alex


Harry-hash commented 1 year ago

Feel free to reopen the issue if you have any further questions or comments!