ltl3A87 / KB-BINDER

Few-shot In-context Learning for Knowledge Base Question Answering
MIT License

Low performance on WebQSP #7

Open YaooXu opened 7 months ago

YaooXu commented 7 months ago

I tried to run KB-BINDER on WebQSP with this command:

python3 few_shot_kbqa.py --shot_num 40 --temperature 0.3 \
 --api_key xxxxx --engine gpt-3.5-turbo \
 --train_data_path ./data/GrailQA/webqsp_0107.train.json --eva_data_path ./data/GrailQA/webqsp_0107.test.json \
 --fb_roles_path data/fb_roles --surface_map_path ./data/GrailQA/surface_map_file_freebase_complete_all_mention

However, em_score is always zero and every result is counted as no_ans.

Is there a problem with the code? Could you please share your running logs or the final predicted SPARQL queries?

YaooXu commented 7 months ago

I traced it to an error in from_fn_to_id_set, but performance is still low after fixing the bug. Also, does the code compute the F1 score reported in the paper? I can only find em_score in the log.
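For reference: the em_score values in the logs in this thread equal correct / total (6 / 452 = 0.013274..., 6 / 453 = 0.013245...). WebQSP results are commonly reported as F1 between the predicted and gold answer sets, averaged over questions. The sketch below computes both from per-question answer lists. It is a hypothetical helper, not code from the KB-BINDER repository, and it assumes answers are compared as sets of MIDs or literal strings, that "correct" means an identical answer set, and that a missing prediction (no_ans) earns 0 F1 against a non-empty gold set.

```python
from typing import Iterable, List, Optional, Sequence


def answer_f1(pred: Optional[Iterable[str]], gold: Iterable[str]) -> float:
    """Set-based F1 between predicted and gold answers for one question.

    Assumption: a missing prediction (None, i.e. a no_ans case) is treated as
    an empty set; two empty sets score 1.0, exactly one empty set scores 0.0.
    """
    pred_set = set(pred) if pred is not None else set()
    gold_set = set(gold)
    if not pred_set and not gold_set:
        return 1.0
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def evaluate(preds: Sequence[Optional[List[str]]], golds: Sequence[List[str]]) -> dict:
    """Exact-set-match accuracy (em_score) and macro-averaged F1."""
    if not golds or len(preds) != len(golds):
        raise ValueError("preds and golds must be non-empty and of equal length")
    correct = no_ans = 0
    f1_sum = 0.0
    for pred, gold in zip(preds, golds):
        if pred is None:
            no_ans += 1
        elif set(pred) == set(gold):  # assumption: EM means identical answer sets
            correct += 1
        f1_sum += answer_f1(pred, gold)
    total = len(golds)
    return {
        "em_score": correct / total,
        "f1": f1_sum / total,
        "correct": correct,
        "total": total,
        "no_ans": no_ans,
    }
```

Unlike exact match, F1 gives partial credit: predicting two of the five gold languages for Japan scores 4/7 instead of 0.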

divinerIU commented 6 months ago

@YaooXu I'm running into the same problem. Could you tell me how you fixed from_fn_to_id_set? Thanks! Here is my running log:

2024-04-23 21:48:49 INFO     data[question]: what languages are there in japan
2024-04-23 21:48:49 INFO     data[exp]: (JOIN (R location.country.languages_spoken) m.03_3d)
2024-04-23 21:48:56 INFO     gene_type: Comparison
2024-04-23 21:49:08 INFO     gene_exps: ['To find out what languages are spoken in Japan, you can use the following logical form:\n\n```\n(JOIN (R location.country.languages_spoken) japan)\n```\n\nThis should provide you with information about the languages spoken in Japan.']
2024-04-23 21:49:08 INFO     gene_exp: To find out what languages are spoken in Japan, you can use the following logical form:
(JOIN (R location.country.languages_spoken) japan)
This should provide you with information about the languages spoken in Japan.
2024-04-23 21:49:08 INFO     fn_org: what languages are in the japan
2024-04-23 21:49:32 INFO     sub fn: are languages
2024-04-23 21:49:32 INFO     fn_org: the languages in
2024-04-23 21:49:46 INFO     sub fn: the languages
2024-04-23 21:49:46 INFO     all_iters: [('m.06j14k', 'm.06pfgnz'), ('m.06f07b', 'm.06pfgnz')]
2024-04-23 21:49:46 INFO     mid_iters: ('m.06j14k', 'm.06pfgnz')
2024-04-23 21:49:46 INFO     predicted_answer: None
2024-04-23 21:49:46 INFO     label: ['m.02h40lc', 'm.02hwhyv', 'm.02jcw', 'm.03_9r', 'm.0jdg7']
2024-04-23 21:49:46 INFO     ================================================================
2024-04-23 21:49:46 INFO     consistent candidates number: 1
2024-04-23 21:49:46 INFO     em_score: 0.01327433628318584
2024-04-23 21:49:46 INFO     correct: 6
2024-04-23 21:49:46 INFO     total: 452
2024-04-23 21:49:46 INFO     no_ans: 438
2024-04-23 21:49:46 INFO      
2024-04-23 21:49:46 INFO     ================================================================
2024-04-23 21:49:46 INFO     ==========
2024-04-23 21:49:46 INFO     data[id]: WebQTest-639
2024-04-23 21:49:46 INFO     data[question]: what year did the bulls get rodman
2024-04-23 21:49:46 INFO     data[exp]: (JOIN (R sports.sports_team_roster.from) (AND (JOIN (R sports.pro_athlete.teams) m.01ztgm) (JOIN sports.sports_team_roster.team m.0jm74)))
2024-04-23 21:49:51 INFO     gene_type: Comparison
2024-04-23 21:50:02 INFO     gene_exps: ['The logical form for "what year did the Bulls get Rodman" would be:\n\n```\n(JOIN (R sports.trade.traded_to) (TC (JOIN (R sports.trade.traded_player) rodman) sports.trade.to (JOIN (R sports.team_roster.players) (JOIN (R sports.sports_team.roster) (JOIN (R sports.sports_team_roster.team) bulls)) sports.team_roster.from (DATE 1995))))\n```\n\nThis logical form captures the query by looking for the year when the Chicago Bulls acquired Dennis Rodman through a trade.']
2024-04-23 21:50:02 INFO     gene_exp: The logical form for "what year did the Bulls get Rodman" would be:
(JOIN (R sports.trade.traded_to) (TC (JOIN (R sports.trade.traded_player) rodman) sports.trade.to (JOIN (R sports.team_roster.players) (JOIN (R sports.sports_team.roster) (JOIN (R sports.sports_team_roster.team) bulls)) sports.team_roster.from (DATE 1995))))
This logical form captures the query by looking for the year when the Chicago Bulls acquired Dennis Rodman through a trade.
2024-04-23 21:50:02 INFO     fn_org: the year did the bulls get rodman
2024-04-23 21:50:29 INFO     sub fn: rodman rodman
2024-04-23 21:50:29 INFO     fn_org: the the year the bulls rodman a
2024-04-23 21:50:56 INFO     sub fn: the bulls
2024-04-23 21:50:56 INFO     all_iters: [('m.0gchd0w', 'm.0jm74', 'm.0rl8mmf'), ('m.0gchd0w', 'm.0jm74', 'm.0lcp5z9'), ('m.0gchd0w', 'm.0jm74', 'm.0slvk1r'), ('m.0gchd0w', 'm.0jm74', 'm.0lqlkph'), ('m.0gchd0w', 'm.0jm74', 'm.04j4r1x'), ('m.0gchd0w', 'm.0jm3v', 'm.0rl8mmf'), ('m.0gchd0w', 'm.0jm3v', 'm.0lcp5z9'), ('m.0gchd0w', 'm.0jm3v', 'm.0slvk1r'), ('m.0gchd0w', 'm.0jm3v', 'm.0lqlkph'), ('m.0gchd0w', 'm.0jm3v', 'm.04j4r1x'), ('m.0gchd0w', 'm.03s8ty', 'm.0rl8mmf'), ('m.0gchd0w', 'm.03s8ty', 'm.0lcp5z9'), ('m.0gchd0w', 'm.03s8ty', 'm.0slvk1r'), ('m.0gchd0w', 'm.03s8ty', 'm.0lqlkph'), ('m.0gchd0w', 'm.03s8ty', 'm.04j4r1x'), ('m.0gchd0w', 'm.0jmbv', 'm.0rl8mmf'), ('m.0gchd0w', 'm.0jmbv', 'm.0lcp5z9'), ('m.0gchd0w', 'm.0jmbv', 'm.0slvk1r'), ('m.0gchd0w', 'm.0jmbv', 'm.0lqlkph'), ('m.0gchd0w', 'm.0jmbv', 'm.04j4r1x'), ('m.0gchd0w', 'm.0jm2v', 'm.0rl8mmf'), ('m.0gchd0w', 'm.0jm2v', 'm.0lcp5z9'), ('m.0gchd0w', 'm.0jm2v', 'm.0slvk1r'), ('m.0gchd0w', 'm.0jm2v', 'm.0lqlkph'), ('m.0gchd0w', 'm.0jm2v', 'm.04j4r1x')]
2024-04-23 21:50:56 INFO     mid_iters: ('m.0gchd0w', 'm.0jm74', 'm.0rl8mmf')
2024-04-23 21:50:56 INFO     predicted_answer: None
2024-04-23 21:50:56 INFO     label: ['1995']
2024-04-23 21:50:56 INFO     ================================================================
2024-04-23 21:50:56 INFO     consistent candidates number: 1
2024-04-23 21:50:56 INFO     em_score: 0.013245033112582781
2024-04-23 21:50:56 INFO     correct: 6
2024-04-23 21:50:56 INFO     total: 453
2024-04-23 21:50:56 INFO     no_ans: 439
2024-04-23 21:50:56 INFO      
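In this log the model's reply wraps the logical form in explanatory prose (each gene_exps entry is a full sentence plus a triple-backtick block), whereas the gene_exps in YaooXu's later log are bare S-expressions. The fn_org values such as "what languages are in the japan" suggest that this prose reaches entity linking. One possible workaround is to reduce each reply to its first balanced S-expression before further processing. The helper below is hypothetical, not part of KB-BINDER, and assumes the logical form is either inside a triple-backtick fence or is the first parenthesized expression in the reply.

```python
import re
from typing import Optional

# Contents of the first ``` fenced block (optionally tagged, e.g. ```lisp).
_FENCE = re.compile(r"```[a-zA-Z]*\n(.*?)```", re.DOTALL)


def extract_logical_form(reply: str) -> Optional[str]:
    """Return the first balanced S-expression in a chat-model reply.

    Hypothetical helper: prefers the contents of a fenced block if present,
    then scans for the first '(' and returns the span up to its matching ')'.
    Returns None if no balanced expression is found.
    """
    match = _FENCE.search(reply)
    text = match.group(1) if match else reply
    start = text.find("(")
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
                if depth == 0:
                    return text[start:i + 1]
        # Unbalanced from this '(' onward; retry from the next opening parenthesis.
        start = text.find("(", start + 1)
    return None
```

Applied to the first gene_exps entry above, this returns `(JOIN (R location.country.languages_spoken) japan)`, the same shape as the bare expressions in the later log.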
YaooXu commented 6 months ago

The bug I hit before was that BM25 was not loaded correctly. It was a configuration error on my side; I didn't change the source code. Here is my log for the same question. It seems that in my case there is no substitution from fn_org to sub fn at all.

2024-04-24 11:01:29 INFO     ==========
2024-04-24 11:01:29 INFO     data[id]: WebQTest-638
2024-04-24 11:01:29 INFO     data[question]: what languages are there in japan
2024-04-24 11:01:29 INFO     data[exp]: (JOIN (R location.country.languages_spoken) m.03_3d)
2024-04-24 11:01:29 INFO     top_score: 12.395286103714133
2024-04-24 11:01:29 INFO     top related questions: ['what languages are there in switzerland', 'how many languages are there in the philippines', 'what is there to see in barcelona', 'what government did japan have', 'what time zones are there in the us', 'what is there to do in palm springs', 'what is there to do in montpelier vt', 'what is there fun to do in san diego', 'what time is it in japan 24 hour clock', 'what countries does japan export to', 'what country does japan export to', 'what is the current leader of japan', 'what is language in argentina', 'what is there to do around austin texas', 'what language is spoken in singapore', 'what type of government does japan currently have', 'what kind of voting system does japan have', 'what is there to see near the grand canyon', 'what language they speak in taiwan', 'what language people speak in afghanistan', 'what language people speak in belgium', 'what language do they in belgium', 'what is tibetan language', 'what is the language spoken in brazil', 'what is the language used in indonesia', 'what is the language called in turkey', 'what is the dominant language in israel', 'what is the language called in russia', 'what language is mainly spoken in england', 'what is the official language in china', 'what are the major languages in italy', 'what are the official languages in spain', 'how many teams are there in the ncaa football', 'how much mlb teams are there', 'who is the leader of japan', 'what language does people in thailand speak', 'what language they speak in the philippines', 'what language does people in netherlands speak', 'what language do the speak in pakistan', 'what language do they speak in indonesia', 'what language did ancient romans write in', 'what language do they speak in egyptian', 'what language do you speak in austria', 'what language do people speak in iceland', 'what language do they speak in thai', 'what language does people in france speak', 'what language do people speak in iran', 'what language does people speak in australia', 'what language do you speak in iran', 'what language do you speak in finland', 'what languages do they speak in russia', 'what language do u speak in egypt', 'what language do the speak in switzerland', 'what language do people speak in turkey', 'what languages do people speak in egypt', 'what languages do people speak in switzerland', 'what language do chinese', 'what jamaican language called', 'what language brazil use', 'what language chile speak', 'what language do denmark', 'what language does fiji', 'what are the main languages spoken in spain', 'what are the major languages spoken in italy', 'what is the major language spoken in greece', 'what is the language they speak in jamaica', 'what is the major language spoken in canada', 'what is the main language spoken in switzerland', 'what is the main language spoken in italy', 'where do most of the people live in japan', 'what is the bosnian language', 'what language is cyprus using', 'what is modern egyptian language', 'how much indiana jones movies are there', 'who is prime minister of japan 2011', 'who is the political leader of japan', 'who is prime minister of japan 2012', 'what language do they speak in northern ireland', 'what language do people speak in brazil wikipedia', 'what language do people in czech republic speak', 'what language do the people in ghana speak', 'what language do people speak in costa rica', 'what language do the people speak in australia', 'what language do they speak in spain wikipedia', 'what main language do they speak in brazil', 'what language do they speak in argentina yahoo', 'what language do they speak in iceland wikipedia', 'what language did the egyptians', 'what language do haitian speak', 'what language does greece use', 'what language do brazil speak', 'what language do they denmark', 'what language do argentina use', 'what language does egyptians use', 'what language do jewish speak', 'what language do cyprus speak', 'what language do egyptians use', 'what language do they speak in guyana south america', 'what kind of language do they speak in china', 'what kind of language do they speak in greece']
2024-04-24 11:01:32 INFO     gene_exps: ['(JOIN (R location.country.languages_spoken) japan)', '(JOIN (R location.country.languages_spoken) japan)', '(JOIN (R location.country.languages_spoken) japan)', '(JOIN (R location.country.languages_spoken) japan)', '(JOIN (R location.country.languages_spoken) japan)', '(JOIN (R location.country.languages_spoken) japan)', '(JOIN (R location.country.languages_spoken) japan)']
2024-04-24 11:01:32 INFO     gene_exp: (JOIN (R location.country.languages_spoken) japan)
2024-04-24 11:01:34 INFO     all_iters: [('m.016f2t',), ('m.01shgy',), ('m.016f2t',), ('m.01shgy',), ('m.0zjdj1p',), ('m.0m1ls96',), ('m.012qn9tf',), ('m.0128txlv',), ('m.012r_3jl',), ('m.0z53q5w',)]
2024-04-24 11:01:34 INFO     mid_iters: ('m.016f2t',)
2024-04-24 11:01:39 INFO     top3_ques rela: ['location.country.languages_spoken', 'language.human_language.countries_spoken_in', 'location.country.official_language', 'language.human_language.main_country', 'people.ethnicity.languages_spoken', 'location.country.currency_used']
2024-04-24 11:01:39 INFO     mid_iters: ('m.01shgy',)
2024-04-24 11:01:43 INFO     top3_ques rela: []
2024-04-24 11:01:43 INFO     mid_iters: ('m.016f2t',)
2024-04-24 11:01:43 INFO     top3_ques rela: ['location.country.languages_spoken', 'language.human_language.countries_spoken_in', 'location.country.official_language', 'language.human_language.main_country', 'people.ethnicity.languages_spoken', 'location.country.currency_used']
2024-04-24 11:01:43 INFO     mid_iters: ('m.01shgy',)
2024-04-24 11:01:43 INFO     top3_ques rela: []
2024-04-24 11:01:43 INFO     mid_iters: ('m.0zjdj1p',)
2024-04-24 11:01:47 INFO     top3_ques rela: []
2024-04-24 11:01:47 INFO     mid_iters: ('m.0m1ls96',)
2024-04-24 11:01:52 INFO     top3_ques rela: []
2024-04-24 11:01:52 INFO     mid_iters: ('m.012qn9tf',)
2024-04-24 11:01:56 INFO     top3_ques rela: []
2024-04-24 11:01:56 INFO     mid_iters: ('m.0128txlv',)
2024-04-24 11:02:00 INFO     top3_ques rela: []
2024-04-24 11:02:00 INFO     mid_iters: ('m.012r_3jl',)
2024-04-24 11:02:05 INFO     top3_ques rela: []
2024-04-24 11:02:05 INFO     mid_iters: ('m.0z53q5w',)
2024-04-24 11:02:09 INFO     top3_ques rela: []
2024-04-24 11:02:09 INFO     predicted_answer: None
2024-04-24 11:02:09 INFO     label: ['m.02h40lc', 'm.02hwhyv', 'm.02jcw', 'm.03_9r', 'm.0jdg7']
2024-04-24 11:02:09 INFO     ================================================================
2024-04-24 11:02:09 INFO     consistent candidates number: 1
2024-04-24 11:02:09 INFO     em_score: 0.0
2024-04-24 11:02:09 INFO     correct: 0
2024-04-24 11:02:09 INFO     total: 1
2024-04-24 11:02:09 INFO     no_ans: 1
2024-04-24 11:02:09 INFO      
2024-04-24 11:02:09 INFO     ================================================================
divinerIU commented 6 months ago

Here are my fn_org and sub_fn values; I don't understand what these two do. [Screenshot 2024-04-24 133352] I found that em_score, correct, total and no_ans are all tied to this problem.
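Judging from from_fn_to_id_set in the maintainer's reply, fn_org is a surface name taken from the generated logical form. When its lowercase form is not a key of name_to_id_dict, the script queries a BM25 index over all_fns, and the best-matching known name is logged as sub fn and then mapped to MIDs. The call bm25_all_fns.get_top_n(tokenized_query, all_fns, n=1) appears to be the rank_bm25 API. The sketch below mimics that fallback with a small self-contained BM25 (Okapi) scorer so the behavior can be inspected without the repository. The class, the helper link_surface_name, the k1/b defaults and the idf variant are assumptions, not KB-BINDER code.

```python
import math
from collections import Counter
from typing import Dict, List


class SimpleBM25:
    """Minimal BM25 (Okapi) over pre-tokenized documents.

    Uses the log(1 + (N - n + 0.5) / (n + 0.5)) idf variant, which stays
    positive; rank_bm25's BM25Okapi uses a slightly different idf.
    """

    def __init__(self, corpus: List[List[str]], k1: float = 1.5, b: float = 0.75):
        self.corpus = corpus
        self.k1 = k1
        self.b = b
        self.doc_freqs = [Counter(doc) for doc in corpus]
        self.avgdl = sum(len(doc) for doc in corpus) / len(corpus)
        df: Counter = Counter()
        for doc in corpus:
            df.update(set(doc))  # document frequency of each term
        n_docs = len(corpus)
        self.idf: Dict[str, float] = {
            term: math.log(1 + (n_docs - n + 0.5) / (n + 0.5)) for term, n in df.items()
        }

    def score(self, query: List[str], index: int) -> float:
        freqs = self.doc_freqs[index]
        doc_len = len(self.corpus[index])
        total = 0.0
        for term in query:
            f = freqs.get(term, 0)
            if f == 0:
                continue
            denom = f + self.k1 * (1 - self.b + self.b * doc_len / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / denom
        return total

    def get_top_n(self, query: List[str], documents: List[str], n: int = 5) -> List[str]:
        # Highest score first; ties keep corpus order because sorted() is stable.
        order = sorted(range(len(self.corpus)), key=lambda i: self.score(query, i), reverse=True)
        return [documents[i] for i in order[:n]]


def link_surface_name(fn_org: str, name_to_id_dict: Dict[str, Dict[str, int]],
                      bm25: SimpleBM25, all_fns: List[str]) -> str:
    """Hypothetical helper mirroring the fn_org -> sub fn fallback."""
    if fn_org.lower() in name_to_id_dict:
        return fn_org
    return bm25.get_top_n(fn_org.lower().split(), all_fns, n=1)[0]
```

Note that get_top_n always returns some name, even when no token matches. That is consistent with the log above, where a prose fragment such as "what languages are in the japan" is still replaced by an unrelated sub fn ("are languages").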

ltl3A87 commented 6 months ago

Hi all, sorry for the late reply. Although WebQSP and GrailQA are derived from the same KB (Freebase), WebQSP has some subtle differences from GrailQA. Could you please try replacing the two functions in the original script with these:

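def from_fn_to_id_set(fn_list, question, name_to_id_dict, bm25_all_fns, all_fns):
    # Map each surface name (fn) found in the generated logical form to a
    # de-duplicated list of candidate MIDs, highest-scored first.
    # Relies on the script's module-level `logger`.
    logger.info("fn_list: {}".format(fn_list))
    return_mid_list = []
    for fn_org in fn_list:
        # If the name only appears in the question once dots are stripped from
        # each token (e.g. "st. louis" vs "st louis"), use the dot-free form.
        # `question` is compared as-is, so it is expected to be lowercase.
        drop_dot = fn_org.split()
        drop_dot = [seg.strip('.') for seg in drop_dot]
        drop_dot = " ".join(drop_dot)
        if fn_org.lower() not in question and drop_dot.lower() in question:
            fn_org = drop_dot
        if fn_org.lower() not in name_to_id_dict:
            # Unknown surface name: fall back to the closest known name by BM25.
            logger.info("fn_org: {}".format(fn_org.lower()))
            tokenized_query = fn_org.lower().split()
            fn = bm25_all_fns.get_top_n(tokenized_query, all_fns, n=1)[0]
            logger.info("sub fn: {}".format(fn))
        else:
            fn = fn_org
        # Hard-coded MIDs for a few generic names.
        if fn.lower() == "president":
            mids = ["m.060c4"]
        elif fn.lower() == "voice":
            mids = ["m.02nsjvf"]
        elif fn.lower() == "marriage":
            mids = ["m.04ztj"]
        else:
            # Candidate MIDs for this name, sorted by their stored score (descending).
            id_dict = name_to_id_dict[fn.lower()]
            mids = sorted(id_dict.items(), key=lambda x: x[1], reverse=True)
            mids = [mid[0] for mid in mids]
        # Remove duplicates while preserving order.
        mid_set = []
        for mi in mids:
            if mi not in mid_set:
                mid_set.append(mi)
        return_mid_list.append(mid_set)
    return return_mid_list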

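def convz_fn_to_mids(gene_exp, found_names, found_mids):
    # Replace each linked surface name in the logical form with its MID.
    # found_names and found_mids are parallel lists, in order of appearance.
    if len(found_names) == 0:
        return gene_exp
    start_index = 0
    new_string = ''
    for name, mid in zip(found_names, found_mids):
        # Search only after the previous replacement, so a name that occurs
        # twice is not matched at its first occurrence again.
        if name == "marriage":
            b_idx = gene_exp.index("Marriage", start_index)
        else:
            b_idx = gene_exp.lower().index(name, start_index)
        e_idx = b_idx + len(name)
        # Accept only leaf occurrences: preceded by a space and followed by ')'.
        while gene_exp[b_idx-1] != " " or gene_exp[e_idx] != ")":
            if name == "marriage":
                b_idx = gene_exp.index("Marriage", e_idx)
            else:
                b_idx = gene_exp.lower().index(name, e_idx)
            e_idx = b_idx + len(name)
        new_string = new_string + gene_exp[start_index:b_idx] + mid
        start_index = e_idx
    new_string = new_string + gene_exp[start_index:]
    return new_string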
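convz_fn_to_mids above replaces each linked surface name in the generated logical form with its MID, accepting an occurrence only when it is preceded by a space and immediately followed by ')', that is, when it is a leaf argument of the S-expression. The same rule can be written with a regular expression. This is an alternative sketch, not the maintainer's code; the function name is hypothetical, and it assumes found_names and found_mids are parallel lists in order of appearance and matches case-insensitively (which also covers the special "Marriage" spelling).

```python
import re
from typing import List


def replace_names_with_mids(gene_exp: str, found_names: List[str], found_mids: List[str]) -> str:
    """Replace each entity name that sits as a leaf argument (preceded by a
    space, followed by ')') with its MID. Names are matched case-insensitively
    and consumed left to right, one occurrence per (name, mid) pair."""
    pos = 0
    out = []
    for name, mid in zip(found_names, found_mids):
        pattern = re.compile(r"(?<= )" + re.escape(name) + r"(?=\))", re.IGNORECASE)
        match = pattern.search(gene_exp, pos)
        if match is None:
            raise ValueError(f"{name!r} not found as a leaf after position {pos}")
        out.append(gene_exp[pos:match.start()])  # text between replacements
        out.append(mid)
        pos = match.end()
    out.append(gene_exp[pos:])
    return "".join(out)
```

For example, replacing "japan" with m.03_3d in `(JOIN (R location.country.languages_spoken) japan)` yields the gold expression from the logs, `(JOIN (R location.country.languages_spoken) m.03_3d)`. Raising ValueError instead of continuing silently makes a mis-ordered or missing name visible.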