lcompilers / lpython

Python compiler
https://lpython.org/
Other
1.51k stars 164 forks source link

Python file read twice ? #2475

Open rebcabin opened 9 months ago

rebcabin commented 9 months ago

repro:

lldb lpython -- --show-asr tests/expr8.py

first time it's read is line 128 of lpython.cpp

    frame #0: 0x000000010000e794 lpython`(anonymous namespace)::emit_asr(infile="tests/expr8.py", pass_manager=0x000000016fdfe7d8, runtime_library_dir="/Users/brian/Dropbox/Mac/Documents/GitHub/lpython/src/bin/../runtime", with_intrinsic_modules=false, compiler_options=0x000000016fdfe870) at lpython.cpp:200:5
   197          std::string input = LCompilers::read_file(infile);
   198          lm.init_simple(input);
   199          lm.file_ends.push_back(input.size());
-> 200      }
   201      LCompilers::Result<LCompilers::LPython::AST::ast_t*> r1 = parse_python_file(
   202          al, runtime_library_dir, infile, diagnostics, 0, compiler_options.new_parser);
   203      std::cerr << diagnostics.render(lm, compiler_options);

second time it's read is line 125 of parser.cpp

    frame #0: 0x00000001000c9f78 lpython`LCompilers::LPython::parse_python_file(al=0x000000016fdfd2b0, (null)="/Users/brian/Dropbox/Mac/Documents/GitHub/lpython/src/bin/../runtime", infile="tests/expr8.py", diagnostics=0x000000016fdfd298, prev_loc=0, new_parser=true) at parser.cpp:126:49
   123      new_parser = true;
   124      LCOMPILERS_ASSERT(new_parser)
   125      std::string input = read_file(infile);
-> 126      Result<LPython::AST::Module_t*> res = parse(al, input, prev_loc, diagnostics);
   127      if (res.ok) {
   128          ast = (LPython::AST::ast_t*)res.result;
   129      } else {

intentional?

rebcabin commented 9 months ago

I also see a read_file on line 197 of lpython.cpp. I'm still learning what's going on.

certik commented 9 months ago

The file should only be read once. If it is read twice, that's a bug that we need to fix. There are many file reads, but only one should be executed.

rebcabin commented 9 months ago

Looks like, during parsing, read_file is called at least twice, once to parse and once to [gs]et_module_path, and maybe more than once for that purpose. Some of the following breaks are redundant, but not all of them are.

I set breakpoints on every one of these lines:

└─(11:13:31 on main ✹ ✭)──> find . "(" -iname "*.cpp" -o -name "*.h" ")" -print0 | xargs -0 -P 0 grep -n read_file                       ──(Sun,Feb04)─┘
./src/lpython/semantics/python_ast_to_asr.cpp:74:    bool status = read_file(file_path, input);
./src/lpython/semantics/python_ast_to_asr.cpp:81:            status = read_file(file_path, input);
./src/lpython/semantics/python_ast_to_asr.cpp:90:            status = read_file(file_path, input);
./src/lpython/semantics/python_ast_to_asr.cpp:129:        std::string input = read_file(infile);
./src/lpython/parser/parser.cpp:125:    std::string input = read_file(infile);
./src/bin/lpython.cpp:90:    std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:103:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:140:            std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:161:            std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:172:            std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:197:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:250:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:295:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:346:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:399:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:443:    std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:491:            std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:592:            std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:707:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:762:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:872:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:945:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:1019:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:1296://     std::string input = read_file(infile);
./src/libasr/utils.h:105:bool read_file(const std::string &filename, std::string &text);
./src/libasr/string_utils.h:29:std::string read_file(const std::string &filename);
./src/libasr/utils2.cpp:31:bool read_file(const std::string &filename, std::string &text)
./src/libasr/diagnostics.cpp:126:    read_file(s.filename, input);
./src/libasr/codegen/asr_to_llvm.cpp:368:        std::string input = read_file(infile);
./src/libasr/string_utils.cpp:106:std::string read_file(const std::string &filename)
./src/libasr/asr_utils.cpp:455:        if (read_file(full_path.string(), modfile)) {

then I ran lpython --show-asr ./tests/expr8.py. The debugger stopped several times on read_file.

image

image

image

image

image

image

image

image

image

image

image

certik commented 9 months ago

If so, we need to fix it.

rebcabin commented 9 months ago

marking enhancement and good first issue. It likely means some architectural "re-plumbing" of the front-end. Some file metadata or even the whole file itself will have to be cached somewhere.

syheliel commented 8 months ago

I write a gdb script to have a dig into the problem. It will record the call stack when read_file is called and print them into log file (gdb.txt by default). You can use gdb -x ./read_file_count.gdb to use this script:

# read_file_count.gdb
set logging enabled on
set pagination off

file ./src/bin/lpython
set args --show-asr ./tests/expr8.py
set $count=0
break LCompilers::read_file
commands
  silent
  set $count = $count + 1
  printf "Called function: %s\n", "your_function_name"
  printf "Here is the function input args: "
  info args
  printf "function call location: "
  where 1
  list
  cont
  end
r
set logging enabled off

In logging file, The read_file is indeed called two times:

Called function: your_function_name
Here is the function input args: filename = "./tests/expr8.py"
function call location: #0  LCompilers::read_file (filename="./tests/expr8.py") at /home/zzh/lpython/src/libasr/string_utils.cpp:109
104     }
105
106     std::string read_file(const std::string &filename)
107     {
108         std::ifstream ifs(filename.c_str(), std::ios::in | std::ios::binary
109                 | std::ios::ate);
110
111         std::ifstream::pos_type filesize = ifs.tellg();
112         if (filesize < 0) return std::string();
113
Called function: your_function_name
Here is the function input args: filename = "./tests/expr8.py"
function call location: #0  LCompilers::read_file (filename="./tests/expr8.py") at /home/zzh/lpython/src/libasr/string_utils.cpp:109
104     }
105
106     std::string read_file(const std::string &filename)
107     {
108         std::ifstream ifs(filename.c_str(), std::ios::in | std::ios::binary
109                 | std::ios::ate);
110
111         std::ifstream::pos_type filesize = ifs.tellg();
112         if (filesize < 0) return std::string();
113

Maybe we can cache the read_file's result in a unordered_map<string,string>, and return the file content if the filename has been cached?

certik commented 8 months ago

Maybe we can cache the read_file's result in a unordered_map<string,string>, and return the file content if the filename has been cached?

Let's refactor the frontend to only call the read function once. I wonder what is going on.

syheliel commented 8 months ago

Maybe we can cache the read_file's result in a unordered_map<string,string>, and return the file content if the filename has been cached?

Let's refactor the frontend to only call the read function once. I wonder what is going on.

Ok, let me explore the frontend's execution flow and try to figure out the root cause