google / gemma_pytorch

The official PyTorch implementation of Google's Gemma models
https://ai.google.dev/gemma
Apache License 2.0

Inconsistencies in Reported Dimensions and Configuration Files #2

Closed: fvarno closed this issue 6 months ago

fvarno commented 6 months ago

In Table 1 of the Gemma Technical Report, the feedforward hidden dims are listed as 32768 and 49152 for the 2B and 7B models, respectively. However, these figures do not align with the numbers in the configuration files for the 7B model and the 2B model. This discrepancy leads me to wonder whether I am comparing the wrong figures, whether there is an error in the report, or whether the experiments were conducted with different configuration files. Should the numbers in the technical report require revision, the reported total number of parameters would also need to be updated accordingly.

[image attached]

pengchongjin commented 6 months ago

The feedforward hidden dims in the table of the tech report are the sum of the hidden dims of the gate projection and the up projection, which is 2x the intermediate_size in the code.
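For concreteness, here is a minimal sketch of that gated feedforward layout (the class and attribute names below are illustrative, not the repo's exact code; see the model implementation in this repo for the real thing):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMLP(nn.Module):
    """Sketch of a gated feedforward block (gate/up/down projections)."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        # Two parallel projections, each hidden_size -> intermediate_size.
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        # Projection back down to the model width.
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The gate and up projections together hold 2 * intermediate_size
        # hidden units, which is the figure the tech report tabulates.
        return self.down_proj(F.gelu(self.gate_proj(x)) * self.up_proj(x))

# Arithmetic check against Table 1, using the intermediate_size values
# from the config files in this repo:
#   2B: 2 * 16384 = 32768
#   7B: 2 * 24576 = 49152
```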

Hope that explains.

fvarno commented 6 months ago

Your explanation, combined with a review of the code, cleared up what I was missing. Thank you for the clarification.