ymirsky / VulChecker

A deep learning model for localizing bugs in C/C++ source code (USENIX'23)
GNU General Public License v3.0
115 stars, 12 forks

What is the argument "TARGET" in "hector lint [OPTIONS] [SOURCE_DIR] TARGET MODEL_DIR"? #2

Closed: hellogirl007 closed this issue 11 months ago

hellogirl007 commented 11 months ago

Hi, ymirsky. When I try to run this great work on a new project, I'm confused about the argument "TARGET". I have tried the following values:

  1. "" fails with:
       File "hector_ml/configure/ninja.py", line 89, in get_sources_from_dependency_graph
         raise ValueError(f"Don't know about target {target!r}.")
       ValueError: Don't know about target 'ninja'.
  2. "all" fails while linking:
       llvm-link -S -o all-combine_ll...src/codegen/target/arm/operations64.ll
       FAILED: all-combine_ll.ll
       llvm-link -S -o all-combine_ll.ll src/finder.ll unittest/test-harness.ll unittest/codegen/assembler-test.ll unittest/codegen/registers-test.ll unittest/util/arg-parser-test.ll src/codegen/compiler.ll src/codegen/runtime.ll src/codegen/targets.ll src/codegen/compiler/context.ll src/codegen/compiler/event.ll src/codegen/compiler/frame.ll src/codegen/compiler/ir.ll src/codegen/compiler/promise.ll src/codegen/compiler/read.ll src/codegen/compiler/regalloc.ll src/codegen/compiler/resource.ll src/codegen/compiler/site.ll src/codegen/compiler/value.ll src/system/posix.ll src/system/posix/crash.ll src/heap/heap.ll src/util/arg-parser.ll src/util/fixed-allocator.ll src/codegen/target/x86/assembler.ll src/codegen/target/x86/block.ll src/codegen/target/x86/context.ll src/codegen/target/x86/detect.ll src/codegen/target/x86/encode.ll src/codegen/target/x86/fixup.ll src/codegen/target/x86/multimethod.ll src/codegen/target/x86/operations.ll src/codegen/target/x86/padding.ll src/tools/binary-to-object/main.ll src/tools/object-writer/elf.ll src/tools/object-writer/mach-o.ll src/tools/object-writer/pe.ll src/tools/object-writer/tools.ll src/tools/type-generator/main.ll src/codegen/target/arm/assembler.ll src/codegen/target/arm/block.ll src/codegen/target/arm/context.ll src/codegen/target/arm/fixup.ll src/codegen/target/arm/multimethod.ll src/codegen/target/arm/operations32.ll src/codegen/target/arm/operations64.ll
       error: Linking globals named '__cxa_pure_virtual': symbol multiply defined!
       ninja: build stopped: subcommand failed.
  3. The target reported by the "hector configure" command does not work either.

What should I do to run hector on a new project?

ymirsky commented 11 months ago

What is the full command you are trying to run? I suggest also looking at the demo scripts as a reference.

hellogirl007 commented 11 months ago

hector lint --llap-lib-dir ~/llvm-project/llvm-build/lib --output test.csv --device cpu ./bdwgc-gc7_6_0/ ./bdwgc-gc7_6_0/tests/gctest ../models/model/
ninja: no work to do.
ninja: no work to do.
ninja: fatal: unknown tool 'rules', did you mean 'urtle'?
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/bin/hector", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/hector_ml/lint.py", line 168, in main
    hector_config.make()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/hector_ml/configure/__init__.py", line 293, in make
    self.configure_if_needed()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/hector_ml/configure/__init__.py", line 284, in configure_if_needed
    self.build_ninja_file()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/hector_ml/configure/__init__.py", line 359, in build_ninja_file
    for source_file in source_files:
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/hector_ml/configure/ninja.py", line 138, in get_sources
    extra_flags = get_extra_flags(build_dir)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/hector_ml/configure/ninja.py", line 109, in get_extra_flags
    rules = ninja("-t", "rules", build_dir=build_dir).splitlines()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/hector_ml/configure/ninja.py", line 33, in ninja
    encoding="utf-8",
  File "/home/ubuntu/anaconda3/lib/python3.7/subprocess.py", line 487, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-t', 'rules']' returned non-zero exit status 1.

Above are the full command and error message.

ymirsky commented 11 months ago

The target is the path to the main source file you are compiling. If you run hector configure with the target "", it should list the available targets in the current directory. Choose one of those and use it as the TARGET argument to lint.
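As a rough cross-check that does not go through hector (and is not how hector configure discovers targets), ninja's own "targets" subtool, which appears in the ninja -t list output quoted later in this thread, prints every output of a build directory as "<output>: <rule>" lines. The sketch below assumes a CMake-generated build.ninja, where executable link rules are named like C_EXECUTABLE_LINKER__<name>; that naming convention and the helper names parse_ninja_targets, executable_targets and list_executable_targets are assumptions for illustration only.

```python
"""Hypothetical helpers: list candidate executables from a ninja build directory.

Assumes ninja is on PATH and that build.ninja was generated by CMake, whose
executable link rules contain "EXECUTABLE_LINKER". Not part of hector.
"""
import subprocess
from typing import Dict, List


def parse_ninja_targets(text: str) -> Dict[str, str]:
    """Parse `ninja -t targets all` output, one "<output>: <rule>" per line."""
    targets = {}
    for line in text.splitlines():
        output, sep, rule = line.rpartition(": ")
        if sep:  # skip lines without the "<output>: <rule>" separator
            targets[output] = rule.strip()
    return targets


def executable_targets(text: str) -> List[str]:
    """Keep outputs whose rule looks like a CMake executable link rule (heuristic)."""
    return sorted(
        output
        for output, rule in parse_ninja_targets(text).items()
        if "EXECUTABLE_LINKER" in rule
    )


def list_executable_targets(build_dir: str) -> List[str]:
    """Run ninja in build_dir and return the executables it knows how to link."""
    result = subprocess.run(
        ["ninja", "-t", "targets", "all"],
        cwd=build_dir,
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    )
    return executable_targets(result.stdout)
```

Whether hector expects exactly these paths as TARGET is not confirmed in this thread; the list is only a hint about which link outputs exist.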

hellogirl007 commented 11 months ago

Thanks for your reply. The error message I'm now encountering is "Command '['ninja', '-t', 'rules']' returned non-zero exit status 1."

"rules" is not in the "ninja -t list" output:

ninja subtools:
    browse     browse dependency graph in a web browser
    clean      clean built files
    commands   list all commands required to rebuild given targets
    deps       show dependencies stored in the deps log
    graph      output graphviz dot file for targets
    query      show inputs/outputs for a path
    targets    list targets by their rule or depth in the DAG
    compdb     dump JSON compilation database to stdout
    recompact  recompacts ninja-internal data structures
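The output above suggests the installed ninja simply has no "rules" subtool. A small check like the following can confirm that before running hector. It assumes ninja is on PATH and that "ninja -t list" prints a header line followed by one "<name> <description>" line per tool, as in the output quoted above; parse_subtools and ninja_has_tool are hypothetical names. We believe "-t rules" only exists in newer ninja releases (around 1.10), so upgrading ninja may be worth trying, but that is an assumption and is not confirmed in this thread.

```python
"""Sketch: check whether the installed ninja provides a given `-t` subtool."""
import subprocess
from typing import Set


def parse_subtools(list_output: str) -> Set[str]:
    """Collect subtool names from `ninja -t list` output."""
    tools = set()
    for line in list_output.splitlines():
        line = line.strip()
        if not line or line.endswith(":"):
            continue  # skip blank lines and the "ninja subtools:" header
        tools.add(line.split()[0])  # first word on each line is the tool name
    return tools


def ninja_has_tool(name: str, ninja: str = "ninja") -> bool:
    """Return True if `ninja -t list` mentions the subtool `name`."""
    result = subprocess.run(
        [ninja, "-t", "list"],
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    )
    return name in parse_subtools(result.stdout)
```

For example, if ninja_has_tool("rules") returns False, the CalledProcessError above is expected no matter which TARGET is passed, because hector's configure step calls "ninja -t rules" unconditionally according to the traceback.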

In ninja.py, line 109 reads: rules = ninja("-t", "rules", build_dir=build_dir).splitlines()

Line 29 calls subprocess.run(["ninja", *args], ...), and here args is ("-t", "rules"), which leads to the error.
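Because the wrapper described above only raises CalledProcessError, ninja's own explanation ("ninja: fatal: unknown tool 'rules'") is easy to miss in the terminal. The following is a hypothetical wrapper in the same spirit, not hector's actual code: run_tool, ToolError and this ninja function are illustrative names, and it simply captures stderr so the failure reason travels with the exception.

```python
"""Hypothetical subprocess wrapper that keeps the failing tool's stderr."""
import subprocess
from typing import Optional, Sequence


class ToolError(RuntimeError):
    """Raised when the wrapped command exits non-zero; message includes stderr."""


def run_tool(argv: Sequence[str], cwd: Optional[str] = None) -> str:
    """Run argv in cwd and return stdout, or raise ToolError with stderr attached."""
    result = subprocess.run(
        list(argv),
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        raise ToolError(
            "{} exited with status {}: {}".format(
                " ".join(argv), result.returncode, result.stderr.strip()
            )
        )
    return result.stdout


def ninja(*args: str, build_dir: Optional[str] = None) -> str:
    # Same call shape as in the traceback: ninja("-t", "rules", build_dir=build_dir)
    return run_tool(["ninja", *args], cwd=build_dir)
```

With an older ninja, ninja("-t", "rules") would then raise a ToolError whose message contains the "unknown tool 'rules'" line, which points straight at the ninja version rather than at the TARGET argument.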

How can I solve this error? Looking forward to your reply. Many thanks.

ymirsky commented 11 months ago

Did you find your target using configure? Are you able to run the demo script?

hellogirl007 commented 11 months ago


The target was found by hector configure. When I test the project "bdwgc-gc7_6_0" provided in the dataset, the target is "./tests/gctest". I then run the full command "hector lint --llap-lib-dir ~/llvm-project/llvm-build/lib --output test.csv --device cpu ./bdwgc-gc7_6_0/ ./bdwgc-gc7_6_0/tests/gctest ../models/model/" and get the error shown above.

I also tested the command in "demos/preprocess_demo.sh" and the same error occurs. The command is "hector configure --llap-lib-dir ~/llvm-project/llvm-build/lib --labels labels.json cmake src/tools/type-generator/type_generator 121 190 415 416".

michaelbrownuc commented 11 months ago

error: Linking globals named '__cxa_pure_virtual': symbol multiply defined!

This issue with multiply defined symbols is a compiler issue that sometimes arises when trying to combine complex C++ codebases into a single LLVM IR file. Symbols (variables, function names, etc.) are used in multiple places throughout the codebase, which is fine for normal builds. However, the process VulChecker uses to create a monolithic LLVM file for analysis may see symbols collide: each source file can't be sure which one is responsible for defining a symbol, so they all do. This problem has affected other projects I've worked on that do analysis in LLVM.
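Before editing IR by hand, it helps to know which .ll files actually carry a conflicting definition. The sketch below scans textual LLVM IR for "define" lines of a given function; definitions with local or mergeable linkage (private, internal, weak, weak_odr, linkonce, linkonce_odr, available_externally) do not trigger llvm-link's "symbol multiply defined" error, so only the remaining ones are reported. It is a regex heuristic, not an IR parser, and the function name strong_definitions is hypothetical.

```python
"""Sketch: find .ll files that define a function with a conflicting (strong) linkage."""
import re
from typing import Dict, List

# "define <linkage/attrs/return type> @name(" at the start of a line.
DEFINE_RE = re.compile(r'^define\s+([^@]*)@"?([^"(\s]+)"?\s*\(', re.MULTILINE)

# Linkages that are file-local or mergeable, so duplicates do not collide.
NON_CONFLICTING = {
    "private", "internal", "weak", "weak_odr",
    "linkonce", "linkonce_odr", "available_externally",
}


def strong_definitions(ir_files: Dict[str, str], symbol: str) -> List[str]:
    """Return names of files whose IR text defines `symbol` with a conflicting linkage."""
    hits = []
    for name, text in ir_files.items():
        for match in DEFINE_RE.finditer(text):
            prefix, defined = match.group(1).split(), match.group(2)
            if defined == symbol and not NON_CONFLICTING.intersection(prefix):
                hits.append(name)
                break  # one conflicting definition per file is enough
    return sorted(hits)
```

In practice the dictionary could be built with something like {str(p): p.read_text() for p in Path(build_dir).rglob("*.ll")} (pathlib), then queried for "__cxa_pure_virtual"; if two or more files come back, those are the definitions llvm-link refuses to merge.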

Unfortunately, there is no easy fix: it requires manually editing the LLVM IR to resolve the symbols. In this case the function is placed by the compiler to handle indirection in C++, so it may not be possible to fix this manually in the code. You may have to build the project first using wllvm or gllvm, recover a single LLVM bitcode file, and manually feed this bitcode file through llap and hector. It is possible this will fail too.
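The gllvm route mentioned above can be sketched as follows, using gllvm's gclang/gclang++ compiler wrappers and its get-bc extractor. This assumes gllvm is installed and on PATH, that the project builds with CMake, and that the paths passed in are hypothetical examples; gllvm_steps and run_steps are illustrative names. Note that get-bc itself links the per-object bitcode, so the same symbol collision can still appear, although a single real binary usually pulls in far fewer files than the "all" target did. Feeding the resulting .bc through llap and hector is not shown, because the exact commands for that step are not given in this thread.

```python
"""Sketch: build with gllvm and extract one whole-program bitcode file."""
import os
import subprocess
from typing import List, Tuple

Step = Tuple[List[str], str]  # (argv, working directory)


def gllvm_steps(src_dir: str, build_dir: str, binary: str, out_bc: str) -> List[Step]:
    """Return the configure, build and extract commands for one binary.

    `binary` is relative to build_dir; absolute paths are resolved against the
    current directory because every step runs with cwd=build_dir.
    """
    return [
        (["cmake", os.path.abspath(src_dir)], build_dir),
        (["cmake", "--build", "."], build_dir),
        # get-bc finds the bitcode recorded by gclang in the binary and links it.
        (["get-bc", "-o", os.path.abspath(out_bc), binary], build_dir),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each step with CC/CXX pointed at the gllvm wrappers."""
    env = dict(os.environ, CC="gclang", CXX="gclang++")
    for argv, cwd in steps:
        os.makedirs(cwd, exist_ok=True)
        subprocess.run(argv, cwd=cwd, env=env, check=True)
```

For instance, run_steps(gllvm_steps("bdwgc-gc7_6_0", "build", "gctest", "gctest.bc")) would describe the flow for the gctest binary, provided the CMake build places gctest at the top of the build directory.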

Ultimately, VulChecker uses LLVM and shares some of the limitations LLVM has when handling C++ codebases.

michaelbrownuc commented 11 months ago

I also recall that during our research, when we encountered this, we scanned only one LLVM IR file at a time. You could try that as well.

hellogirl007 commented 11 months ago


Thanks for your reply. The error "error: Linking globals named '__cxa_pure_virtual': symbol multiply defined!" does not occur when I use a single target other than "all". But a new error occurs: "CalledProcessError: Command '['ninja', '-t', 'rules']' returned non-zero exit status 1."

I reviewed the ninja subtools, and "rules" is not one of them. This error also occurs when running the demo script.