logix-project / logix

AI Logging for Interpretability and Explainability🔬
Apache License 2.0
86 stars 6 forks source link

Add pandas dataframe format #57

Closed hage1005 closed 10 months ago

hage1005 commented 10 months ago

fix #41

Add influence_scores dataframe to Influence_function class variable. Updated whenever compute_influence is called. Add a getter function for influence_scores

sangkeun00 commented 10 months ago

Overall, it looks good to me. I just have two suggestions/opinions.

  1. Given that log_dataloader returns the log in a format of tuple (i.e. (data_ids, log)), what do you think about enforcing the same thing for analog.get_log()? This way, we can almost always guarantee that src/tgt ids are provided. In addition, we don't necessarily need src_id and tgt_id in InfluenceFunction as you can directly unpack them from src_log and tgt_log.
  2. What do you think about adding functionalities of saving this DataFrame inside InfluenceFunction?

Let me know what you think!

hage1005 commented 10 months ago

Overall, it looks good to me. I just have two suggestions/opinions.

  1. Given that log_dataloader returns the log in a format of tuple (i.e. (data_ids, log)), what do you think about enforcing the same thing for analog.get_log()? This way, we can almost always guarantee that src/tgt ids are provided. In addition, we don't necessarily need src_id and tgt_id in InfluenceFunction as you can directly unpack them from src_log and tgt_log.
  2. What do you think about adding functionalities of saving this DataFrame inside InfluenceFunction?

Let me know what you think!

  1. Yeah I was also thinking about enforcing the same (data_id, log) format, I think It would be great to add
  2. I think the saving functionality is nice to have