spcl / ncc

Neural Code Comprehension: A Learnable Representation of Code Semantics
BSD 3-Clause "New" or "Revised" License

Asm inline call handling #18

Closed Baumanar closed 4 years ago

Baumanar commented 5 years ago

I have a question about how you handle inline assembly calls. When preprocessing .ll files, you discard asm calls that return void in the keep() function:

    if re.search('call void asm', line):
        return False

However, you don't handle inline asm calls that return something else during preprocessing (maybe this is very specific to your case), and you seem to handle other cases specifically while parsing the preprocessed code:

            # function call
            elif re.match(r'(' + rgx.local_id + r' = )?(tail )?(call|invoke) ', line):

                # Get function name
                if ' asm ' in line:
                    if line == '%13 = tail call { %struct.rw_semaphore*, i64 } asm sideeffect "':
                        line = '%13 = tail call { %struct.rw_semaphore*, i64 } asm sideeffect "# beginning down_read\0A\09.pushsection .smp_locks,\22a\22\0A.balign 4\0A.long 671f - .\0A.popsection\0A671:\0A\09lock;  incq ($3)\0A\09  jns        1f\0A  call call_rwsem_down_read_failed\0A1:\0A\09# ending down_read\0A\09", "=*m,={ax},={rsp},{ax},*m,2,~{memory},~{cc},~{dirflag},~{fpsr},~{flags}"(%struct.atomic64_t* %11, %struct.rw_semaphore* %10, %struct.atomic64_t* %11, i64 %12) #4, !srcloc !9'
                    if line == '%16 = tail call i64 asm sideeffect "':
                        line = '%16 = tail call i64 asm sideeffect "# beginning __up_read\0A\09.pushsection .smp_locks,\22a\22\0A.balign 4\0A.long 671f - .\0A.popsection\0A671:\0A\09lock;   xadd      $1,($2)\0A\09  jns        1f\0A\09  call call_rwsem_wake\0A1:\0A# ending __up_read\0A", "=*m,={dx},{ax},1,*m,~{memory},~{cc},~{dirflag},~{fpsr},~{flags}"(%struct.atomic64_t* %11, %struct.rw_semaphore* %10, i64 -1, %struct.atomic64_t* %11) #4, !srcloc !11'
                    func_name_ = re.search(r' asm (?:sideeffect )?(\".*\")\(', line)

My question is: what is the difference between those two cases? Does it really matter, or could we ignore inline asm calls regardless of the returned type?
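To make the two cases concrete, here is a minimal sketch of the behaviour in question. `keep_void_only` mirrors the check quoted from keep(); `keep_no_asm`, the `ASM_CALL` pattern, and the sample IR lines are hypothetical and not part of ncc. The pattern is a rough heuristic, not an LLVM IR parser: it assumes no double quote appears before the `asm` keyword and would also match a callee literally named `@asm`.

```python
import re


def keep_void_only(line: str) -> bool:
    """Mirror of the quoted check: drop only void-returning inline asm calls."""
    if re.search('call void asm', line):
        return False
    return True


# Hypothetical generalization: a call/invoke followed by the `asm` keyword
# before the first double quote, whatever the return type is.
ASM_CALL = re.compile(r'\b(?:call|invoke)\b[^"]*\basm\b')


def keep_no_asm(line: str) -> bool:
    """Drop every inline asm call, regardless of the returned type."""
    return ASM_CALL.search(line) is None


# Made-up sample statements, shortened for readability.
void_asm = 'tail call void asm sideeffect "mfence", "~{memory}"() #4'
i64_asm = '%16 = tail call i64 asm sideeffect "xadd $1,($2)", "=r,r"(i64 %12) #4'
plain = '%5 = call i32 @foo(i32 %4)'

for stmt in (void_asm, i64_asm, plain):
    print(keep_void_only(stmt), keep_no_asm(stmt))
```

With the current check, the `i64` asm call survives preprocessing and reaches the parser, which is where the special-cased lines above come into play; the generalized filter would drop it earlier.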

tbennun commented 4 years ago

We don't take into account "call void asm" calls. The second piece of code is likely an artifact of fixing an edge case in the preprocessing of certain statements. Specifically, these two statements contain inline asm calls. This will be removed with a future LLVM parser. Thanks for finding this code!