likejazz / llama3.np

llama3.np is a pure NumPy implementation of the Llama 3 model.
MIT License

A few suggestions #1

Closed · 99991 closed this 4 months ago

99991 commented 4 months ago

You can improve the performance of the tokenizer by using Python's heapq.heappush and heapq.heappop functions. For example, see the tokenizer from my simplified TinyLlama implementation: https://github.com/99991/SimpleTinyLlama/blob/9af6f7df6e12d8478a90d3cd5c8e8c1a95fce0fe/tokenizer.py#L96
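For reference, a heap-based greedy BPE merge could look like the minimal sketch below. The names `bpe_merge` and `ranks` (a dict mapping a merged pair to its merge priority, lower merges first) are illustrative, not taken from either repository:

```python
import heapq

def bpe_merge(tokens, ranks):
    # Greedy BPE: always merge the adjacent pair with the lowest rank first.
    # A heap makes each merge O(log n) instead of rescanning every pair.
    tokens = list(tokens)
    heap = []

    def push(i, j):
        pair = tokens[i] + tokens[j]
        if pair in ranks:
            heapq.heappush(heap, (ranks[pair], i, j, tokens[i], tokens[j]))

    for i in range(len(tokens) - 1):
        push(i, i + 1)

    while heap:
        rank, i, j, a, b = heapq.heappop(heap)
        if tokens[i] != a or tokens[j] != b:
            continue  # stale entry: one side was already merged away
        tokens[i] = a + b
        tokens[j] = None  # tombstone the absorbed token
        # Pair the merged token with its nearest live neighbors.
        left = i - 1
        while left >= 0 and tokens[left] is None:
            left -= 1
        if left >= 0:
            push(left, i)
        right = j + 1
        while right < len(tokens) and tokens[right] is None:
            right += 1
        if right < len(tokens):
            push(i, right)

    return [t for t in tokens if t is not None]

ranks = {"he": 0, "ll": 1, "hell": 2}
print(bpe_merge(list("hello"), ranks))  # ['hell', 'o']
```

Stale heap entries are skipped lazily when popped rather than removed eagerly, which keeps the bookkeeping simple while avoiding a full rescan of all pairs after every merge.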

Instead of silencing warnings, you could clip the magnitude of the inputs to the sigmoid function as in my NumPy CLIP implementation: https://github.com/99991/NumPyCLIP/blob/main/numpyclip.py#L113-L116
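As a rough illustration of that idea (the clipping bound of 30 is an arbitrary safe choice here, not necessarily the value NumPyCLIP uses):

```python
import numpy as np

def sigmoid(x):
    # Clip so np.exp cannot overflow; sigmoid already saturates to
    # 0 or 1 (to within float precision) well before |x| reaches 30.
    x = np.clip(x, -30.0, 30.0)
    return 1.0 / (1.0 + np.exp(-x))
```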

I'd also recommend storing the weights somewhere else (for example, as a release or on HuggingFace) and downloading them on demand, because GitHub has a really low bandwidth quota.
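A download-on-demand hook could be as simple as the sketch below; the URL and filename are placeholders, not real release assets:

```python
import os
import urllib.request

# Placeholder values: point these at an actual release asset or
# HuggingFace file once the weights are hosted there.
MODEL_URL = "https://example.com/path/to/model.npz"
MODEL_PATH = "model.npz"

def ensure_weights():
    # Fetch the weights on first run instead of committing them to git,
    # so the repository stays small and bandwidth quota is not an issue.
    if not os.path.exists(MODEL_PATH):
        urllib.request.urlretrieve(MODEL_URL, MODEL_PATH)
```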

I very much appreciate your shape annotations. They make understanding much easier. Great work!

likejazz commented 4 months ago

Thank you for your suggestions!

  1. This implementation is not focused on performance, but your suggestion of a heap structure is appreciated. I'll be sure to incorporate it in the future.
  2. I used set_printoptions() to disable scientific notation and make the output more readable. I only used it for debugging values, and since I no longer need it, I removed it (see the snippet after this list).
  3. I uploaded the model directly to GitHub for convenience. The file size was 85MB, so it wasn't too big. But in general, your suggestion is correct.
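For context, suppressing scientific notation in NumPy output is a one-liner; this is a generic illustration, not the exact line that was removed:

```python
import numpy as np

# Print plain decimals instead of scientific notation while debugging.
np.set_printoptions(suppress=True)
print(np.array([1e-8, 2.5]))  # [0.00000001 2.5       ]
```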

99991 commented 4 months ago
  1. Oh, you are right. I was mistaken and thought the code said something different. Never mind!
  2. I had a similarly large project in the past and GitHub would frequently block downloads whenever the quota of just 5 GB was exceeded. But I agree, it is more convenient this way.