ICanFlyGFC commented 1 year ago

Hello, Pan. Thank you for your open source.

I downloaded the checkpoint model from https://acl2021-intergps.s3.us-west-1.amazonaws.com/tp_model_best.pt, but the evaluation results are empty. How can I get normal results? Thanks.

lupantech commented 1 year ago

Hi, Thank you for your interest in our work!

This evaluation result is not normal. Would you mind sharing the script you were running and the log it printed? It could help me narrow down the reasons.

Thanks!

Best, Pan

ICanFlyGFC commented 1 year ago

Thank you for your reply!

The script is the same as yours; I only changed the output file name.

#!/usr/bin/env python
# coding: utf-8

import json
import ast
from tqdm import tqdm

import torch
from transformers import BartForConditionalGeneration, BartTokenizerFast

def evaluate(diagram_logic_file, text_logic_file, tokenizer_name, model_name, check_point, seq_num):

    test_lst = range(2401, 3002)

    ## read logic form files
    with open(diagram_logic_file) as f:
        diagram_logic_forms = json.load(f)
    with open(text_logic_file) as f:
        text_logic_forms = json.load(f)

    combined_logic_forms = {}
    for pid in test_lst:
        combined_logic_forms[pid] = diagram_logic_forms[str(pid)]['diagram_logic_forms'] + \
                                    text_logic_forms[str(pid)]['text_logic_forms']

    ## build tokenizer and model
    tokenizer = BartTokenizerFast.from_pretrained(tokenizer_name) # 'facebook/bart-base'
    model = BartForConditionalGeneration.from_pretrained(model_name).to(device) # 'facebook/bart-base'
    model.load_state_dict(torch.load(check_point))

    final = dict()
    for pid in tqdm(test_lst):
        input = str(combined_logic_forms[pid])
        tmp = tokenizer.encode(input)
        if len(tmp) > 1024:
            tmp = tmp[:1024]
        input = torch.LongTensor(tmp).unsqueeze(0).to(device)

        output = model.generate(input, bos_token_id=0, eos_token_id=2,
                             max_length=20, num_beams=10, num_return_sequences=seq_num)
        # print(out.size())

        ## refine output sequence
        seq = []
        for j in range(seq_num):
            res = tokenizer.decode(output[j].tolist())
            res = res.replace("</s>", "").replace("<s>", "").replace("<pad>", "")
            # print(res)
            try:
                res = ast.literal_eval(res) # string class to list class
            except Exception as e:
                res = []
            seq.append(res)

        final[str(pid)] = {"id": str(pid), "num_seqs": seq_num, "seq": seq}

    return final

if __name__ == '__main__':

    diagram_logic_file = '../data/geometry3k/logic_forms/diagram_logic_forms_annot.json'
    text_logic_file = '../data/geometry3k/logic_forms/text_logic_forms_annot_dissolved.json'

    check_point = 'models/tp_model_best.pt'
    output_file = 'results/test/pred_seqs_test_debugging.json'

    tokenizer_name = 'facebook/bart-base'
    model_name = 'facebook/bart-base'

    SEQ_NUM = 5

    device = torch.device('cuda:0')

    result = evaluate(diagram_logic_file, text_logic_file, tokenizer_name, model_name, check_point, SEQ_NUM)

    with open(output_file, 'w') as f:
        json.dump(result, f)

The log:

D:\Anaconda\envs\intergps\python.exe D:/WorkSpace/InterGPS-main/theorem_predict/eval_transformer.py
  0%| | 0/601 [00:00<?, ?it/s]
D:\Anaconda\envs\intergps\lib\site-packages\transformers\generation_utils.py:1839: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  next_indices = next_tokens // vocab_size
 22%|██▏ | 135/601 [00:23<01:25, 5.43it/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (1569 > 1024). Running this sequence through the model will result in indexing errors
100%|██████████| 601/601 [01:42<00:00, 5.88it/s]

Process finished with exit code 0
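
A side note on the `1569 > 1024` warning in the log: it is printed by `tokenizer.encode`, before the script's own `tmp[:1024]` cut, so the model never sees the over-long input. The plain slice does drop the closing `</s>` token for such inputs, though. Whether that has anything to do with the empty results is not established in this thread. Below is a minimal sketch of a cut that keeps the end token, assuming the ids 0 and 2 that the script already passes as `bos_token_id` and `eos_token_id`. The helper name `truncate_keep_eos` and the dummy id 100 are hypothetical and not part of the repository.

```python
from typing import List


def truncate_keep_eos(token_ids: List[int], max_len: int = 1024, eos_id: int = 2) -> List[int]:
    """Cut token_ids to at most max_len entries, keeping eos_id as the last token."""
    if len(token_ids) <= max_len:
        return list(token_ids)
    # Keep the first max_len - 1 ids and re-append the end-of-sequence id.
    return list(token_ids[:max_len - 1]) + [eos_id]


# Stand-in for the output of tokenizer.encode(...): <s> = 0, </s> = 2,
# with 1567 dummy ids in between (1569 ids in total, as in the log).
ids = [0] + [100] * 1567 + [2]
short = truncate_keep_eos(ids)
print(len(ids), len(short), short[-1])  # 1569 1024 2
```

In the script this would take the place of the `if len(tmp) > 1024` branch, with `tmp = truncate_keep_eos(tmp)`.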

Thanks!

lupantech commented 1 year ago

Hi,

Below is my script:

cd symbolic_solver
python test.py --label final --strategy final

And the running log is here: https://github.com/lupantech/InterGPS/blob/main/symbolic_solver/logs/final/log-1612098244-predict_low-first_1.log.

The executed result is here: https://github.com/lupantech/InterGPS/blob/main/symbolic_solver/pred_results/final/logic_1612098244-predict_low-first_1.json.

ICanFlyGFC commented 1 year ago

Thank you Pan!

I can run your script and get the corresponding results, but I am focusing on the theorem predictor. I wonder how to generate ../theorem_predict/results/pred_seq_result_bart_epoch19_seq5.json. I also found that many geometry problems can be solved by rules based on the formal language, without theorems. Is it fair to say that theorem prediction is not so important in this paper?

Thanks!

Best, Fucheng

lupantech commented 1 year ago

Hi Fucheng,

For the theorem predictor, you can follow the instructions at https://github.com/lupantech/InterGPS#theorem-predictor.

For the second question, yes. As we discussed in the paper, one of the main functions of the theorem predictor is to improve the search efficiency and thus improve the final accuracy, which is verified in Table 7 and Figure 5.

Best, Pan

ICanFlyGFC commented 1 year ago

Thanks, Pan!

I followed the instructions at https://github.com/lupantech/InterGPS#theorem-predictor. I downloaded the pre-trained model in step 4, but the evaluation results are empty in step 5.
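
One way to see where the empty lists come from: in the evaluation script, any decoded string that `ast.literal_eval` cannot parse is silently replaced by `[]`, so the raw model output never shows up in the log. The following is a minimal sketch of a parsing step that reports the raw string instead. The helper `parse_sequence` is hypothetical, the sample strings are made up for illustration, and the extra `strip()` and list check are additions that the original script does not have.

```python
import ast
from typing import Tuple

SPECIAL_TOKENS = ("</s>", "<s>", "<pad>")


def parse_sequence(decoded: str) -> Tuple[list, str]:
    """Return (parsed_list, error_message); error_message is '' on success."""
    text = decoded
    for tok in SPECIAL_TOKENS:
        text = text.replace(tok, "")
    text = text.strip()
    try:
        value = ast.literal_eval(text)  # e.g. "[1, 5, 17]" -> [1, 5, 17]
    except (ValueError, SyntaxError, TypeError) as exc:
        return [], f"{type(exc).__name__}: could not parse {text!r}"
    if not isinstance(value, list):
        return [], f"expected a list, got {type(value).__name__}: {text!r}"
    return value, ""


# Made-up decoded strings: one well-formed, one missing brackets, one empty.
for raw in ["<s>[1, 5, 17]</s><pad>", "<s>1, 5, 17", "<s></s>"]:
    seq, err = parse_sequence(raw)
    print(seq, err)
```

Printing `err` for the first few test problems would show whether the model generates malformed text or nothing at all, which are different failure modes.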

Thanks!

Best, Fucheng

lupantech commented 1 year ago

Hi Fucheng,

I see. Would you mind if I checked your issue a few days later? I am working on some urgent deadlines and need more time to figure out your problem. For now, I think it is fine to ignore the theorem predictor if you just want to reproduce our results.

I appreciate your understanding!

Best, Pan

ICanFlyGFC commented 1 year ago

Thanks, Pan.

Sure. Thank you for your work; I look forward to your new achievements. Your paper and code have inspired me a lot.

Best, Fucheng

lupantech commented 1 year ago

Hi Fucheng,

Thanks! I am happy to help with your project as well!

Yours sincerely, Pan

lupantech / InterGPS

Poor performance of theorem predictor #10
