wangbo9719 / StAR_KGC


StAR inference (reproducibility of test_head_full_scores.list file) #9

Open martinsvat opened 2 years ago

martinsvat commented 2 years ago

Hi, I've been working with your codebase for a while, but one issue came up along the way that I haven't been able to overcome so far. For a start, my usage of StAR (ensemble, and others) is a little different: I don't work with batches. Instead, I need a single object (say, an EnsembleModel or StarModel) which I query with a triple and which returns a probability value for that particular triple.
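For concreteness, the interface I'm after looks roughly like this (all names below are mine, purely illustrative, not from your repo):

class StarModel:
    def score(self, head: str, relation: str, tail: str) -> float:
        """Return the model's probability that the triple (head, relation, tail) holds."""
        ...

# intended usage:
star = StarModel()
p = star.score("06845599", "_member_of_domain_usage", "03754979")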

So far, I managed to dig through the codebase to do the things I need, but when I started verifying my implementation with your results, I failed to obtain the same results. So, please, how can I replicate those values that are in StAR model, e.g. WN18RR_roberta-large/test_head_full_scores.list ?

I got that I should do something like

with torch.no_grad():
    # _rep_src / _rep_tgt: encoder outputs for the source (head + relation)
    # text and the target (tail) text
    logits = model.classifier(_rep_src, _rep_tgt)
    logits = torch.softmax(logits, dim=-1)
    # column 1 is the probability of the "triple is true" class
    local_scores = logits.detach().cpu().numpy()[:, 1]
    resultForTheTriple = local_scores[0]

Right?

I've been following the method get_ensemble_data.py:get_scores, which is buggy (namely because of _id2ent_list = dataset.id2entlist) and, moreover, is not invoked anywhere in the codebase (I haven't found such an invocation :( I hope I am mistaken). Anyhow, when I use this method to get the result for the first test query (['06845599', '_member_of_domain_usage', '03754979']) in the WN18RR dataset, I get something like 0.017418645322322845, which is not the value stored in WN18RR_roberta-large/test_head_full_scores.list. (Loading this file, e.g. l = torch.load("test_head_full_scores.list"), and then reading the value, e.g. l[0][1][l[0][0]], yields something like 0.9994868.)

Secondly, when I tried to implement the same method on my own, I ran into non-determinism: running the exact same script produced different scores (for the first test query mentioned above). I checked that the embeddings are the same in both places (they are; I'm using your loading of embeddings), but the output of the model, e.g. model.classifier(_rep_src, _rep_tgt), differs every time. Have you seen anything like this? For example, is there some dropout or something similar in the (Ro)BERTa model that should be set up prior to evaluation besides model.eval()?
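For reference, this is how I read the stored score mentioned above (the meaning of the indices is my own guess from inspecting the file, so please correct me if it's wrong):

import torch

l = torch.load("WN18RR_roberta-large/test_head_full_scores.list")
first = l[0]                # entry for the first test query
target_idx = first[0]       # my guess: index of the correct head entity
head_scores = first[1]      # my guess: scores over all candidate heads
print(head_scores[target_idx])   # ~0.9994868 on my machine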

So, you see, I actually want to produce a file with StAR scores on my own (e.g. WN18RR_roberta-large/test_head_full_scores.list), but I am unable to do so, let alone deterministically (see the sketch below for what I would expect to suffice). Please, can you point me to the place in the code where this happens, or tell me where I made a mistake? I would appreciate it a lot :) Thanks.
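For completeness, here is what I would expect to be enough for a deterministic evaluation (my own sketch, assuming standard PyTorch behavior; not code from your repo):

import random
import numpy as np
import torch

random.seed(42)
np.random.seed(42)
torch.manual_seed(42)
torch.cuda.manual_seed_all(42)
torch.backends.cudnn.deterministic = True   # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False

model.eval()   # disables dropout; without this, classifier outputs vary per call

If your model needs anything beyond this, that is probably what I'm missing.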

best, Martin

martinsvat commented 2 years ago

Btw: my assumption is that there is already a trained StAR model stored from a previous run. So I basically use link_prediction.train as the place for setting things up. Now, I see that there is no "--init" argument, so I guess that

    # build the config from the pretrained checkpoint and attach the
    # StAR-specific hyperparameters
    config = config_class.from_pretrained(
        args.config_name if args.config_name else args.model_name_or_path,
    )
    config.distance_metric = args.distance_metric
    config.hinge_loss_margin = args.hinge_loss_margin
    config.pos_weight = args.pos_weight
    config.loss_weight = args.loss_weight
    config.cls_loss_weight = args.cls_loss_weight

    # load the tokenizer and the trained weights from args.model_name_or_path
    tokenizer = tokenizer_class.from_pretrained(
        args.tokenizer_name if args.tokenizer_name else args.model_name_or_path,
        do_lower_case=args.do_lower_case)
    model = model_class.from_pretrained(args.model_name_or_path, config=config)

loads a stored model. Right?

best, MS

wangbo9719 commented 2 years ago

Thanks for your attention and sorry for the confusing code.

The way you load the model is right. As for the other problems, I'm confused as well, because the original code for this work is on a server that I can't connect to at the moment. I will check the code when I'm free. And please correct me if you find the reason.

martinsvat commented 2 years ago

Hi, thanks for such a quick response. OK, but I can only go through the code in this repository anyway.

Can you please point me to the command (from the README) that generates those 'test_head_full_scores.list' files for the StAR models? (Even knowing which method is responsible would help.) I could then debug it somehow, but as I wrote earlier, I could not find which part of the code generates these files.

best, Martin