sihaoevery / lambda_vit


About calculating transfer entropy details #3

Open HongyuZhu-s opened 1 month ago

HongyuZhu-s commented 1 month ago

Hello! Why doesn't the code include the log(2π) part in the formula for calculating entropy in 'model_shrink.py'? This is a bit different from equation (5) in the paper.

yuwentao88 commented 1 month ago

The author already answered this question in the paper: '𝐻(𝐹) is proportional to log(𝜎) plus two additional constants. Without loss of generality, the two constants are neglected in the following analysis.'
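For context (assuming Eq. (5) in the paper is the standard differential entropy of a Gaussian variable with standard deviation 𝜎), the log(2𝜋) term is exactly one of the two constants being dropped:

```latex
H(F) = \frac{1}{2}\log\left(2\pi e \sigma^2\right)
     = \log(\sigma) + \frac{1}{2}\log(2\pi) + \frac{1}{2}
```

Only the log(𝜎) term depends on the data, so the remaining two constants can be neglected when comparing entropies.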

yuwentao88 commented 1 month ago

Did you find the calculation code for the transfer entropy?

sihaoevery commented 1 month ago

Hi,

Sorry if there was a misunderstanding about the calculation of transfer entropy. The code in 'model_shrink.py' computes the entropy; the transfer entropy is then calculated via Eq. (7). The constant terms cancel in the subtraction. You may refer to the discussion in #1.

Best,

Sihao
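To illustrate the point above, here is a minimal sketch (not the repository's actual code) of computing transfer entropy as an entropy difference. It assumes the entropy of a feature map is estimated as log(sigma), as described for `model_shrink.py`; the constant terms, including the log(2π) part, cancel in the subtraction.

```python
import numpy as np

def entropy_up_to_constants(features: np.ndarray) -> float:
    # H(F) = log(sigma) + constants; the constants are omitted here,
    # since they cancel when two entropies are subtracted
    return float(np.log(features.std()))

rng = np.random.default_rng(0)
feats_full = rng.normal(0.0, 2.0, size=10_000)  # features of the complete model (hypothetical)
feats_skip = rng.normal(0.0, 1.0, size=10_000)  # features with a layer skipped (hypothetical)

# Constants cancel: TE = log(sigma_full) - log(sigma_skip)
transfer_entropy = entropy_up_to_constants(feats_full) - entropy_up_to_constants(feats_skip)
print(transfer_entropy)  # close to log(2) ≈ 0.69
```

Because both entropy estimates carry the same additive constants, their difference depends only on the log of the ratio of the standard deviations.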

HongyuZhu-s commented 1 month ago

> The author already answered this question in the paper: '𝐻(𝐹) is proportional to log(𝜎) plus two additional constants. Without loss of generality, the two constants are neglected in the following analysis.'

Thank you for your reply! I overlooked that question.

sihaoevery commented 1 month ago

Further to my previous answer, here are a few additional details.

  1. You may obtain the transfer entropy by subtraction: (entropy of the complete model) − (entropy when some attention layers are skipped).

  2. The entropy results are appended to an array (e.g. self.trans_entropy_head). Once the array holds enough samples, the result is returned. In our implementation, we randomly sample 50,000 images from the training set. Since the batch size is 384 in inference mode, the result is printed at iteration 50,000/384 ≈ 130.
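The accumulation logic in point 2 can be sketched as follows (placeholder names, not the repository's exact code): per-batch entropy estimates are appended to a list, and the averaged result is reported once 50,000 samples have been seen.

```python
import math

NUM_SAMPLES = 50_000
BATCH_SIZE = 384  # inference batch size

trans_entropy = []       # plays the role of e.g. self.trans_entropy_head
report_iteration = None  # iteration at which the result is printed

for it in range(math.ceil(NUM_SAMPLES / BATCH_SIZE)):
    batch_entropy = 0.0  # placeholder for the per-batch log(sigma) estimate
    trans_entropy.append(batch_entropy)
    if (it + 1) * BATCH_SIZE >= NUM_SAMPLES:  # enough samples accumulated
        report_iteration = it
        print(f"iteration {it}: mean entropy = {sum(trans_entropy) / len(trans_entropy)}")
        break
```

With these numbers the report fires at iteration 130 (0-indexed), matching the 50,000/384 ≈ 130 figure above.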

HongyuZhu-s commented 1 month ago

> Further to my previous answer, here are a few additional details.
>
> 1. You may obtain the transfer entropy by subtraction: (entropy of the complete model) − (entropy when some attention layers are skipped).
> 2. The entropy results are appended to an array (e.g. self.trans_entropy_head). Once the array holds enough samples, the result is returned. In our implementation, we randomly sample 50,000 images from the training set. Since the batch size is 384 in inference mode, the result is printed at iteration 50,000/384 ≈ 130.

Thank you for your careful answer. I also have a question: does the magnitude or ranking of a given layer's transfer entropy change as the number of training epochs increases?

HongyuZhu-s commented 1 month ago

> Did you find the calculation code for the transfer entropy?

Hello, may I ask: since calculating the transfer entropy requires comparing entropy values, does this require training the model twice?