I trained my LightFM model on user and item features along with interactions. I noticed some things that didn't make sense to me, so I'm hoping someone can help me understand. I am using the following code for splitting, training, and evaluation:
train, test = random_train_test_split(interactions, test_percentage=0.2, random_state=np.random.RandomState(5))
train_weights, test_weights = random_train_test_split(weights, test_percentage=0.2, random_state=np.random.RandomState(5))
#model training
# model training
model.fit(train,
          user_features=user_features,
          item_features=item_features,
          sample_weight=train_weights,
          epochs=10)
# Evaluate the model on the test set using AUC
auc = auc_score(model,
                test,
                user_features=user_features,
                item_features=item_features).mean()
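For intuition about the metric itself: `auc_score` reports, per user, the probability that a known positive item is ranked above a negative one, averaged over users. A toy sketch of that per-user computation (the scores below are made up, not from the model):

```python
import numpy as np

def user_auc(scores, positives):
    """Toy per-user AUC: fraction of (positive, negative) item pairs
    where the positive item receives the higher score."""
    pos = scores[positives]                 # scores of the user's positive items
    neg = np.delete(scores, positives)      # scores of the remaining (negative) items
    return (pos[:, None] > neg[None, :]).mean()

scores = np.array([0.9, 0.1, 0.4, 0.8])    # hypothetical model scores for 4 items
print(user_auc(scores, positives=[0, 3]))  # both positives outrank both negatives -> 1.0
```

An AUC of 0.5 corresponds to random ranking, which is why 99% looks suspiciously strong.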
What I need to understand is the effect of the `normalize` argument, used when building the user and item feature matrices, on the evaluation. Here is what I observed:
- With normalize=False for both, I get an AUC of approximately 84%.
- With normalize=True for both, I get an AUC of approximately 99%.
- When I normalize only the user features, the AUC is still 99%.
- When I normalize only the item features, the AUC is still 99%.
- With normalize=False for users and item features excluded from training, the AUC is still 99%.
- With normalize=False for users and user features excluded from training, the AUC is 84%.
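For reference, my understanding (an assumption based on LightFM's `Dataset` documentation, not verified against its source) is that `normalize=True` rescales each row of the feature matrix so a user's or item's feature weights sum to 1. A minimal sketch of that operation on a made-up dense matrix:

```python
import numpy as np

# Hypothetical feature matrix: one row per user, one column per feature.
features = np.array([
    [1.0, 2.0, 1.0],   # user with unevenly weighted features
    [5.0, 0.0, 0.0],   # user with a single dominant feature
])

# Row-wise normalization: divide each row by its sum so rows sum to 1.
row_sums = features.sum(axis=1, keepdims=True)
normalized = features / row_sums

print(normalized.sum(axis=1))  # -> [1. 1.]
```

Without this rescaling, users or items with many (or heavily weighted) features contribute much larger latent-vector magnitudes, which could plausibly shift scores and the resulting AUC.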
I'd like to know whether normalization can have this much effect and whether the above scenarios make sense. Additionally, an AUC of 99% seems too good to be true.