Closed hage1005 closed 10 months ago
Overall, it looks good to me. I just have two suggestions/opinions.

- Given that `log_dataloader` returns the log as a tuple (i.e. `(data_ids, log)`), what do you think about enforcing the same format for `analog.get_log()`? This way, we can almost always guarantee that src/tgt ids are provided. In addition, we don't necessarily need `src_id` and `tgt_id` in `InfluenceFunction`, as you can directly unpack them from `src_log` and `tgt_log`.
- What do you think about adding functionality for saving this DataFrame inside `InfluenceFunction`?

Let me know what you think!
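
To make the first suggestion concrete, here is a minimal sketch of the tuple convention, assuming every log is passed around as `(data_ids, log)`. The `Log` alias, the free function `compute_influence`, and the toy dot-product score are hypothetical stand-ins for illustration and are not the library's actual API.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical log type: (data_ids, per-example vectors). Not the library's real type.
Log = Tuple[List[str], List[Sequence[float]]]


def compute_influence(src_log: Log, tgt_log: Log) -> Dict[Tuple[str, str], float]:
    """Toy influence score: dot product between every src/tgt vector pair."""
    # The ids travel with the logs, so separate src_id/tgt_id arguments are unnecessary.
    src_ids, src_vecs = src_log
    tgt_ids, tgt_vecs = tgt_log
    scores = {}
    for s_id, s_vec in zip(src_ids, src_vecs):
        for t_id, t_vec in zip(tgt_ids, tgt_vecs):
            scores[(s_id, t_id)] = sum(a * b for a, b in zip(s_vec, t_vec))
    return scores


src_log = (["test_0"], [[1.0, 2.0]])
tgt_log = (["train_0", "train_1"], [[0.5, 0.5], [1.0, 0.0]])
scores = compute_influence(src_log, tgt_log)
```

With this shape, a caller can never hand over a log without its ids, which is the guarantee the suggestion is after.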
fix #41
Add an `influence_scores` DataFrame to `Influence_function` as a class variable. It is updated whenever `compute_influence` is called. Add a getter function for `influence_scores`.
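
The bookkeeping described here could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: the class name `InfluenceFunctionSketch`, the getter name `get_influence_scores`, the `(data_ids, log)` tuple inputs, the toy dot-product score, and the choice to replace (rather than accumulate) the stored scores on each call are all hypothetical. It also uses an instance attribute where the description says class variable.

```python
import pandas as pd


class InfluenceFunctionSketch:
    """Hypothetical stand-in that only illustrates storing the scores."""

    def __init__(self):
        # Rows are src ids, columns are tgt ids; empty until the first call.
        self.influence_scores = pd.DataFrame()

    def compute_influence(self, src_log, tgt_log):
        # Assumption: each log is a (data_ids, vectors) tuple.
        src_ids, src_vecs = src_log
        tgt_ids, tgt_vecs = tgt_log
        # Toy score (dot product); the real influence computation is not shown.
        values = [
            [sum(a * b for a, b in zip(s, t)) for t in tgt_vecs]
            for s in src_vecs
        ]
        result = pd.DataFrame(values, index=src_ids, columns=tgt_ids)
        # Refresh the stored scores on every call (replace, not accumulate).
        self.influence_scores = result
        return result

    def get_influence_scores(self):
        """Return the DataFrame from the most recent compute_influence call."""
        return self.influence_scores


influence = InfluenceFunctionSketch()
influence.compute_influence(
    (["test_0"], [[1.0, 2.0]]),
    (["train_0", "train_1"], [[0.5, 0.5], [1.0, 0.0]]),
)
df = influence.get_influence_scores()
```

Because the result is a regular DataFrame, saving it needs nothing beyond the existing pandas writers such as `df.to_csv(...)`.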