huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Accelerate x Trainer issue tracker: #33345

Open ArthurZucker opened 2 months ago

ArthurZucker commented 2 months ago

A bunch of issues are a bit stale, and @SunMarc + @muellerzr are a bit short on bandwidth! Thus we would love to have community support to solve the following:

Help needed

Feature request

Replied with potential fix and following

WizKnight commented 2 months ago

@ArthurZucker Hey there!👋 I'm new to this repository and excited to learn and contribute. Please let me know if there are any good starting points or tasks where I can be of assistance.

ArthurZucker commented 2 months ago

Any of these issues that have the Good First Issue label should be fairly easy! 🤗

irislin1006 commented 2 months ago

Hi @ArthurZucker, I'm a first-time contributor, but I would love to take issue https://github.com/huggingface/transformers/issues/31734 as a start 👍

[Update on 2024/09/07] Handled and replied in the issue

nnilayy commented 2 months ago

Hi there👋 @ArthurZucker, Handled issue #31439, hope that helps🤗.

WizKnight commented 2 months ago

Hi there👋 @ArthurZucker, I'll handle the issue #28124

irislin1006 commented 2 months ago

Hi there👋 @ArthurZucker, I would like to take https://github.com/huggingface/transformers/issues/32312 😀

[Update on 2024/09/09] Handled and replied in the issue

SunMarc commented 2 months ago

cc @matthewdouglas

godspeed5 commented 2 months ago

I opened PR #31268 as a fix for issue #30819. I think some discussion is needed there, @amyeroberts

WizKnight commented 2 months ago

Hey @amyeroberts, just wanted to check in on issue #28124. It seems like @muellerzr already tackled it with his fix in #30169. Should I still work on this further, or is it good to go as is?

Thanks!

amyeroberts commented 2 months ago

Hi @WizKnight - best to ask @muellerzr (ideally on the relevant PR / issue to avoid pinging everyone here) about the status of those. I can see in #30169 that the PR wasn't merged due to inactivity -- pending a response to these questions.

In general, if something has just been closed by the GitHub stale bot, and not because of a clear decision not to pursue the PR or a clear rejection from the review process, you're free to pick up the work :)

SunMarc commented 1 month ago

cc @mekkcyber

P-Potdar commented 1 month ago

Hey @SunMarc and @muellerzr,

I'd love to contribute to this project and help resolve some of the issues mentioned here, especially the DeepSpeed ZeRO-3-related bugs. I’ve already gone through some of the issues and identified potential starting points for solutions. I'll be focusing on these:

- Training hangs at the first gradient syncing of an MoE model while using DeepSpeed (#30911)
- Trainer doesn't save evaluation metrics (#33733)
- CUDA RuntimeError: Unspecified Launch Failure during Training (#30913)

I'll submit PRs with proposed fixes and updates soon. Thank you for the opportunity to contribute!

Also, if there are any specific guidelines or areas where help is most needed, feel free to point me in the right direction!

Looking forward to collaborating on this during Hacktoberfest 🎉

b423016 commented 1 month ago

Hey @SunMarc and @muellerzr, I would be happy to contribute to the issue "Trainer doesn't save evaluation metrics" (#33733).

ArthurZucker commented 1 month ago

Awesome! Just added the tag to make sure it works for everyone! 🥳

Thejaggeddevil commented 1 month ago

I want to work on #29348. Please assign this to me.

eeshan15 commented 1 month ago

Hey @ArthurZucker, will this be counted in Hacktoberfest?

ArthurZucker commented 3 weeks ago

Yes, given that there is the tag! We don't assign issues: the first PR that is up will be reviewed. If a PR is stale, anyone can take it over, and if no PR is linked, you can create one 🤗