kyegomez / BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
https://discord.gg/qUtxnK2NMf
MIT License
1.69k stars, 155 forks

What GPU hardware does the code assume, and why does the loss stay above 5.2 after running train.py, with validation generating unreadable output? #68

Open Dayun0925 opened 2 weeks ago

Dayun0925 commented 2 weeks ago

Describe the bug
A clear and concise description of what the bug is and what the main root cause error is. Test very thoroughly before submitting.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.
