spcl / ncc

Neural Code Comprehension: A Learnable Representation of Code Semantics
BSD 3-Clause "New" or "Revised" License

Confusion in inst2vec_preprocess.py when reading code #28

Closed island255 closed 4 years ago

island255 commented 4 years ago

While reading inst2vec_preprocess.py, I noticed that on line 865 the statement assert check is not None, "Could not match argument list in:\n" + line + "\nFunction:\n" + func_name might need to be changed to assert check is None. I'm confused, though, and don't know whether I should change it.


def get_num_args_func(line, func_name=None):
    """
    Get the number of arguments in a line containing a function
    :param line: LLVM IR line
    :param func_name: function name
    :return num_args: number of arguments
            arg_list: list of arguments
    """
    modif_line = re.sub(r'<[^<>]+>', '', line)  # commas in vectors/arrays should not be counted as argument-separators
    arg_list_ = find_outer_most_last_parenthesis(modif_line)  # get last parenthesis
    if arg_list_ is None:
        # Make sure that this is the case because the function has no arguments
        # and not because there was in error in regex matching
        check = re.match(rgx.func_call_pattern + r'\(\)', modif_line)
        assert check is not None, "Could not match argument list in:\n" + line + "\nFunction:\n" + func_name
        num_args = 0
        arg_list = ''
    elif arg_list_ == '()':
        # Make sure that this is the case because the function has no arguments
        # and not because there was in error in regex matching
        check = re.match(rgx.func_call_pattern + r'\(\)', modif_line)
        if check is None:
            check = re.search(r' asm (?:sideeffect )?(\".*\")\(\)', modif_line)
        if check is None:
            check = re.search(rgx.local_id + r'\(\)', modif_line)
        if check is None:
            okay = line[-2:] == '()'
            if not okay:
                check = None
            else:
                check = True
        assert check is not None, "Could not match argument list in:\n" + line + "\nFunction:\n" + func_name
        num_args = 0
        arg_list = ''
    else:
        arg_list = arg_list_[1:-1]
        arg_list = re.sub(r'<[^<>]+>', '', arg_list)
        arg_list_modif = re.sub(r'\([^\(\)]+\)', '', arg_list)
        arg_list_modif = re.sub(r'\([^\(\)]+\)', '', arg_list_modif)
        arg_list_modif = re.sub(r'\([^\(\)]+\)', '', arg_list_modif)
        arg_list_modif = re.sub(r'\([^\(\)]+\)', '', arg_list_modif)
        arg_list_modif = re.sub(r'\"[^\"]*\"', '', arg_list_modif)
        arg_list_modif = re.sub(r'{.*}', '', arg_list_modif)
        num_args = len(re.findall(',', arg_list_modif)) + 1

    return num_args, arg_list

tbennun commented 4 years ago

The assertion is correct. It checks for different regular expressions, and if there is no match continues to the next one. If the last check fails, then check is None and the assertion should fail. In any other case, the check is successful and execution should continue.

Why do you think it should be the opposite?
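The fall-through described above, for the branch where the argument list is '()', can be sketched as a chain of checks that stops at the first success. This is only an illustration: FUNC_CALL_PATTERN and LOCAL_ID are hypothetical, simplified stand-ins for rgx.func_call_pattern and rgx.local_id (whose real definitions are not shown in this thread), and confirm_empty_arg_list is a made-up name.

```python
import re

# Hypothetical, simplified stand-ins for rgx.func_call_pattern and rgx.local_id.
FUNC_CALL_PATTERN = r'.*call .*@\w+'
LOCAL_ID = r'%[\w.]+'


def confirm_empty_arg_list(line):
    """Return True if at least one check confirms that '()' is an empty argument list."""
    checks = (
        # 1. A direct call to a named function: call void @foo()
        lambda s: re.match(FUNC_CALL_PATTERN + r'\(\)', s),
        # 2. An inline assembly call: call void asm sideeffect "nop"()
        lambda s: re.search(r' asm (?:sideeffect )?(\".*\")\(\)', s),
        # 3. A call through a local identifier: call void %fptr()
        lambda s: re.search(LOCAL_ID + r'\(\)', s),
        # 4. Last resort: the line simply ends with '()'
        lambda s: s.endswith('()'),
    )
    # any() stops at the first check that succeeds, like the chained "if check is None" tests.
    return any(check(line) for check in checks)


print(confirm_empty_arg_list('call void @foo()'))
print(confirm_empty_arg_list('ret void'))
```

Only when every check fails does the result become False, which corresponds to check staying None and the assertion firing in the quoted function.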

island255 commented 4 years ago

I think that if execution reaches the branch if arg_list_ is None:, then check must also be None (since no "()" was found). The assert condition would then be False, and the assertion would fail.

But in that case, the assertion in this branch will always fail. Does that mean this branch should simply never be executed?

island255 commented 4 years ago

Or is it debug code left over from testing the function find_outer_most_last_parenthesis? If that is the case, I understand it and the comments there. When I first saw it, I was just confused and didn't realize what this code was for.

tbennun commented 4 years ago

Yes, the assertion and regexp in that branch are a sanity check meant to make sure that find_outer_most_last_parenthesis didn't miss an empty argument list somewhere.
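A minimal sketch of how that sanity check behaves in isolation, assuming the parenthesis finder has already returned None. FUNC_CALL_PATTERN is a hypothetical, simplified stand-in for rgx.func_call_pattern, and handle_missing_arg_list is a made-up name that mirrors only the first branch of get_num_args_func.

```python
import re

# Hypothetical, simplified stand-in for rgx.func_call_pattern.
FUNC_CALL_PATTERN = r'.*call .*@\w+'


def handle_missing_arg_list(line, func_name='<unknown>'):
    """Mirror of the 'arg_list_ is None' branch: accept only an empty-argument call."""
    # Independent check: does the line look like a call with an empty '()' list?
    check = re.match(FUNC_CALL_PATTERN + r'\(\)', line)
    assert check is not None, \
        "Could not match argument list in:\n" + line + "\nFunction:\n" + func_name
    return 0, ''


# The regex recognizes an empty argument list, so zero arguments are reported.
print(handle_missing_arg_list('call void @foo()'))

# No argument list can be recognized at all, so the assertion fires.
try:
    handle_missing_arg_list('ret void')
except AssertionError as err:
    print('AssertionError:', err)
```

The sketch gives func_name a string default on purpose: with the None default of the quoted function, building the failure message would raise a TypeError instead of an AssertionError. Note also that assert statements are skipped entirely when Python runs with the -O flag.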

island255 commented 4 years ago

Thanks for your reply, and thanks for your outstanding work! I really learned a lot from it.

tbennun commented 4 years ago

Happy to help :)