unitaryai / detoxify

Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ PyTorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.
https://www.unitary.ai/
Apache License 2.0

Progress Bar #74

Open vahidthegreat opened 1 year ago

vahidthegreat commented 1 year ago

Can you add a progress bar feature too?

laurahanu commented 1 year ago

Hello, what would be the use case of a progress bar? If running it on a batch of text, you can get a quick progress bar by using tqdm in your for loop.

vahidthegreat commented 1 year ago

That works if I query sentences one by one, but that is slow because the model does its initialisation for each sentence. The faster way would be to query an array of sentences, and that is where a progress bar would be useful.

laurahanu commented 1 year ago

It shouldn't initialise the model for each sentence. Are you defining the model first and then running the prediction for each sentence? E.g.:

from detoxify import Detoxify
from tqdm import tqdm

# Load the model once, outside the loop, then reuse it for every batch.
model = Detoxify("unbiased")
for batch in tqdm(data_batches):
    results = model.predict(batch)

vahidthegreat commented 1 year ago

I don't know. I have seen with other Hugging Face models that each batch starts slowly and the progress bar then speeds up towards the end of that batch. So I think that with a single combined batch, the overall speed would be higher (?)

laurahanu commented 1 year ago

It depends on how big your batch is and how much you can fit into memory. If it's large enough, it might be more efficient to process it in smaller batches. Are you running this on CPU or GPU?
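
The loop suggested earlier in the thread iterates over `data_batches` without showing how that list is built. A minimal sketch of one way to do it follows: split a flat list of sentences into fixed-size batches, wrap the loop in tqdm, and merge the per-batch results. The helper names `chunked` and `predict_in_batches` and the default `batch_size=32` are hypothetical and not part of Detoxify. The sketch also assumes that the prediction function returns a dict mapping each label to a list of scores, one per input text.

```python
from collections import defaultdict

try:
    from tqdm import tqdm  # optional, only used to draw the progress bar
except ImportError:
    def tqdm(iterable, **kwargs):
        # Fallback when tqdm is not installed: iterate without a bar.
        return iterable


def chunked(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_in_batches(predict_fn, texts, batch_size=32):
    """Call `predict_fn` once per batch and merge the per-label score lists.

    Assumption: `predict_fn(batch)` returns {label: [score, ...]} with one
    score per text in `batch`.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    merged = defaultdict(list)
    n_batches = (len(texts) + batch_size - 1) // batch_size  # ceiling division
    for batch in tqdm(chunked(texts, batch_size), total=n_batches):
        for label, scores in predict_fn(batch).items():
            merged[label].extend(scores)
    return dict(merged)
```

With the model from the thread this would be called as `predict_in_batches(model.predict, sentences, batch_size=32)`, where `model = Detoxify("unbiased")` is created once beforehand. The batch size is the knob to tune against available memory: smaller values use less memory per step and update the bar more often, larger values make fewer calls.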