google / gemma_pytorch

The official PyTorch implementation of Google's Gemma models
https://ai.google.dev/gemma
Apache License 2.0

Inconsistencies in Reported Dimensions and Configuration Files #2

Closed: fvarno closed this issue 6 months ago

fvarno commented 6 months ago

In Table 1 of the Gemma Technical Report, the feedforward hidden dims are listed as 32768 and 49152 for the 2B and 7B models, respectively. However, these figures do not align with the numbers in the configuration files for the 7B model and the 2B model. This discrepancy leads me to wonder whether I am comparing the wrong figures, whether there is an error in the report, or whether the experiments were conducted with different configuration files. Should the numbers in the technical report require revision, the reported total number of parameters would also need to be updated accordingly.

[image attached]

pengchongjin commented 6 months ago

The feedforward hidden dims in the table of the tech report are the sum of the hidden dims of the gate projection and the up projection, which is 2x the intermediate_size in the code.
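For concreteness, here is a minimal sketch of that gated feedforward layout (the class and attribute names below are illustrative, not the repo's exact code; see the model implementation in this repo for the real thing):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMLP(nn.Module):
    """Sketch of a gated feedforward block (gate/up/down projections)."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        # Two parallel projections, each hidden_size -> intermediate_size.
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        # Projection back down to the model width.
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The gate and up projections together hold 2 * intermediate_size
        # hidden units, which is the figure the tech report tabulates.
        return self.down_proj(F.gelu(self.gate_proj(x)) * self.up_proj(x))

# Arithmetic check against Table 1, using the intermediate_size values
# from the config files in this repo:
#   2B: 2 * 16384 = 32768
#   7B: 2 * 24576 = 49152
```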

Hope that explains.

fvarno commented 6 months ago

Your explanation, combined with a review of the code, cleared up what I was missing. Thank you for the clarification.