lcompilers / lpython

Python compiler
https://lpython.org/

Good Design for Frontend that shares most of the code with other LCompilers #1250

Open Shaikh-Ubaid opened 1 year ago

Shaikh-Ubaid commented 1 year ago

In lpython, the code in the EMSCRIPTEN_KEEPALIVE functions in src/libasr/codegen/asr_to_wasm.cpp is long and does not look very maintainable. The code for the same functions in lfortran is shorter because it uses FortranEvaluator.

From https://lfortran.zulipchat.com/#narrow/stream/311866-LPython/topic/WASM.20Backend/near/296663142,

We need something like the FortranEvaluator. Ideally we should figure out how to share much of this code, so that functions such as get_cpp, get_llvm, and get_wasm are shared. We need a good idea for improving the LFortran design so that most of the code is shared with LPython.
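One possible shape for such sharing, as a minimal sketch only: the language-specific source-to-ASR step is the single virtual hook, and the backend entry points live in a common base class. All class names here (SharedEvaluator, ToyPythonEvaluator, ToyFortranEvaluator) are hypothetical, ASR is modeled as a plain string, and none of this reflects the actual FortranEvaluator interface.

```cpp
#include <string>

// Hypothetical sketch: only source -> ASR differs per language, so it is
// the one virtual hook. The backends consume ASR and are shared.
// ASR is modeled as a plain string purely for illustration.
class SharedEvaluator {
public:
    virtual ~SharedEvaluator() = default;

    // Frontend-specific: each language supplies its own parser and semantics.
    virtual std::string get_asr(const std::string &source) const = 0;

    // Shared by every frontend, since these only need the ASR.
    std::string get_cpp(const std::string &source) const {
        return "// C++ from " + get_asr(source);
    }
    std::string get_wasm(const std::string &source) const {
        return "(module " + get_asr(source) + ")";
    }
};

// Stand-in for a Python frontend (assumption, not LPython's real class).
class ToyPythonEvaluator : public SharedEvaluator {
public:
    std::string get_asr(const std::string &source) const override {
        return "asr[python:" + source + "]";
    }
};

// Stand-in for a Fortran frontend (assumption, not LFortran's real class).
class ToyFortranEvaluator : public SharedEvaluator {
public:
    std::string get_asr(const std::string &source) const override {
        return "asr[fortran:" + source + "]";
    }
};
```

With this layout, adding a backend means adding one method to the base class, and both frontends pick it up without further changes.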

Shaikh-Ubaid commented 1 year ago

    Allocator al(4*1024);
    LFortran::diag::Diagnostics diagnostics;
    LFortran::LocationManager lm;
    lm.in_filename = infile;
    std::vector<std::pair<std::string, double>> times;
    auto file_reading_start = std::chrono::high_resolution_clock::now();
    std::string input = LFortran::read_file(infile);
    auto file_reading_end = std::chrono::high_resolution_clock::now();
    times.push_back(std::make_pair("File reading", std::chrono::duration<double, std::milli>(file_reading_end - file_reading_start).count()));
    lm.init_simple(input);
    auto parsing_start = std::chrono::high_resolution_clock::now();
    LFortran::Result<LFortran::LPython::AST::ast_t*> r = parse_python_file(
        al, runtime_library_dir, infile, diagnostics, compiler_options.new_parser);
    auto parsing_end = std::chrono::high_resolution_clock::now();
    times.push_back(std::make_pair("Parsing", std::chrono::duration<double, std::milli>(parsing_end - parsing_start).count()));
    std::cerr << diagnostics.render(input, lm, compiler_options);
    if (!r.ok) {
        print_time_report(times, time_report);
        return 1;
    }

    // Src -> AST -> ASR
    LFortran::LPython::AST::ast_t* ast = r.result;
    diagnostics.diagnostics.clear();
    auto ast_to_asr_start = std::chrono::high_resolution_clock::now();
    LFortran::Result<LFortran::ASR::TranslationUnit_t*>
        r1 = LFortran::LPython::python_ast_to_asr(al, *ast, diagnostics, true,
            compiler_options.disable_main, compiler_options.symtab_only, infile, compiler_options.import_path);
    auto ast_to_asr_end = std::chrono::high_resolution_clock::now();
    times.push_back(std::make_pair("AST to ASR", std::chrono::duration<double, std::milli>(ast_to_asr_end - ast_to_asr_start).count()));
    std::cerr << diagnostics.render(input, lm, compiler_options);
    if (!r1.ok) {
        LFORTRAN_ASSERT(diagnostics.has_error())
        print_time_report(times, time_report);
        return 2;
    }
    LFortran::ASR::TranslationUnit_t* asr = r1.result;
    if( compiler_options.disable_main ) {
        int err = LFortran::LPython::save_pyc_files(*asr, infile);
        if( err ) {
            return err;
        }
    }
    diagnostics.diagnostics.clear();

    // ASR -> X86
    auto asr_to_x86_start = std::chrono::high_resolution_clock::now();
    LFortran::Result<int> r3 = LFortran::asr_to_x86(*asr, al, outfile, time_report);
    auto asr_to_x86_end = std::chrono::high_resolution_clock::now();
    times.push_back(std::make_pair("ASR to X86", std::chrono::duration<double, std::milli>(asr_to_x86_end - asr_to_x86_start).count()));
    std::cerr << diagnostics.render(input, lm, compiler_options);
    print_time_report(times, time_report);
    if (!r3.ok) {
        LFORTRAN_ASSERT(diagnostics.has_error())
        return 3;
    }

A significant amount of code like the above is repeated in functions such as compile_python_to_object_file(), compile_to_binary_wasm(), and compile_to_binary_x86(), as well as in the EMSCRIPTEN_KEEPALIVE functions. We should find a good way to eliminate the repetition.
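As one small step, the start/end/push_back timing pattern wrapped around every stage above could be factored into a helper. The following is a minimal sketch, assuming C++14 or later; the names timed and TimeReport are hypothetical and do not exist in LPython.

```cpp
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Same element type as the `times` vector in the snippet above.
using TimeReport = std::vector<std::pair<std::string, double>>;

// Hypothetical helper: runs one pipeline stage, appends its wall-clock
// time in milliseconds to `times`, and returns the stage's result by value.
// The stage must return a value (a void-returning stage is not supported).
template <typename Stage>
auto timed(TimeReport &times, const std::string &name, Stage &&stage)
{
    auto start = std::chrono::high_resolution_clock::now();
    auto result = stage();
    auto end = std::chrono::high_resolution_clock::now();
    times.push_back(std::make_pair(name,
        std::chrono::duration<double, std::milli>(end - start).count()));
    return result;
}
```

Each stage would then be a single statement, for example passing "Parsing" as the name and a lambda that calls parse_python_file with the same arguments as above, instead of three lines of chrono bookkeeping per stage.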

certik commented 1 year ago

Thanks for opening an issue for this. Indeed, there is tons of repetitive code like this, which has to deal with executing one or more (but not necessarily all) of the steps in the pipeline preprocessor / prescanner / tokenizer / parser / ast -> asr / asr passes / llvm / machine code.

Thirumalai-Shaktivel commented 1 year ago

I think we can merge all the functions (emit_ast, emit_asr, emit_cpp, emit_c, emit_wat, emit_llvm, compile_python_to_object_file(), compile_to_binary_wasm() and compile_to_binary_x86()) into one and pass an argument naming the desired output. For example, if we want emit_llvm, we pass emit_llvm as an argument to the merged function, which then runs the pipeline up to LLVM, prints the result, and returns. This might work.
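A rough sketch of that idea, with an enum standing in for the function name passed as the argument. Everything here is hypothetical: Target, Output, emit, and the string-producing stage stubs are illustrative stand-ins for the real parser, AST -> ASR translation, and backends.

```cpp
#include <string>

// Hypothetical selector for how far the merged function should run.
enum class Target { AST, ASR, LLVM, WAT };

struct Output {
    bool ok;
    std::string text;
};

// Stub stages: each wraps its input so the stage order stays visible.
std::string parse(const std::string &src) { return "(ast " + src + ")"; }
std::string to_asr(const std::string &ast) { return "(asr " + ast + ")"; }
std::string to_llvm(const std::string &asr) { return "(llvm " + asr + ")"; }
std::string to_wat(const std::string &asr) { return "(wat " + asr + ")"; }

// Merged driver: the common prefix (parse, then AST -> ASR) is written
// once, and the function returns as soon as the requested target exists.
Output emit(const std::string &src, Target target) {
    std::string ast = parse(src);
    if (target == Target::AST) return {true, ast};

    std::string asr = to_asr(ast);
    if (target == Target::ASR) return {true, asr};

    switch (target) {
        case Target::LLVM: return {true, to_llvm(asr)};
        case Target::WAT:  return {true, to_wat(asr)};
        default:           return {false, "unsupported target"};
    }
}
```

In the real driver each stage would return a Result and the function would bail out on the first failure, as the repeated code above does with r.ok and r1.ok.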