patrickjohncyh / fashion-clip

FashionCLIP is a CLIP-like model fine-tuned for the fashion domain.
MIT License

Preprocess step too slow #30

Closed: sarfarazm closed this issue 7 months ago

sarfarazm commented 7 months ago

I am trying to encode a single short string into an embedding, but it takes 3.8 seconds to execute!

    fclip.encode_text(['this is a photo of a red shoe'], batch_size=32)
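
For context, the timing can be reproduced with something like this (a minimal sketch: the FashionCLIP('fashion-clip') constructor follows this repo's README, and exact numbers will of course vary by machine):

    import time
    from fashion_clip.fashion_clip import FashionCLIP

    fclip = FashionCLIP('fashion-clip')

    start = time.perf_counter()
    fclip.encode_text(['this is a photo of a red shoe'], batch_size=32)
    print(f"encode_text took {time.perf_counter() - start:.2f}s")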

While trying to debug, I found that one line in the fclip.encode_text function takes up all the time:

    def encode_text(self, text: List[str], batch_size: int):
        dataset = Dataset.from_dict({'text': text})

        # this line takes essentially all of the time
        dataset = dataset.map(lambda el: self.preprocess(text=el['text'], return_tensors="pt",
                                                         max_length=77, padding="max_length", truncation=True),
                              batched=True,
                              remove_columns=['text'])

        ...

Rather than using the dataset.map function, if I just use a for loop around self.preprocess, it completes within 20 milliseconds!
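
Roughly, what I mean is something like this (a minimal sketch, not the library's code: processor stands in for the CLIPProcessor behind self.preprocess, and preprocess_in_loop is just an illustrative name):

    from typing import List
    from transformers import CLIPProcessor

    # e.g. processor = CLIPProcessor.from_pretrained("patrickjohncyh/fashion-clip")
    def preprocess_in_loop(processor: CLIPProcessor, text: List[str]):
        # Tokenize each string directly instead of going through
        # Dataset.from_dict + dataset.map; same arguments as above.
        return [processor(text=[t], return_tensors="pt",
                          max_length=77, padding="max_length",
                          truncation=True)
                for t in text]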

I understand this is probably an issue with the Datasets library (I am on version 2.0.1). I just wanted to know whether anyone else has faced this and whether there is a simple solution that I am missing.

vinid commented 7 months ago

Thanks!

This is very slow! Does downgrading or upgrading the Datasets lib change anything?