julelang / jule

Effective programming language to build efficient, fast, reliable and safe software while maintaining simplicity
https://jule.dev

JuleC's GCC compatibility issues #34

Closed mertcandav closed 4 months ago

mertcandav commented 1 year ago

Description

We tried to set up a CI workflow for Windows. It was a simple build workflow: it would build JuleC from the IR code, then use that build to compile the latest JuleC from source. But we ran into problems.

The workflow downloads the latest windows-amd64 IR from the julec-ir repository with curl and compiles it via GCC with -O3, -w, -Wa,-mbig-obj and C++17. After the IR code is compiled, a simple julec version command is executed as a test. However, executing the program fails.

Suspecting that some algorithm in the IR code was the cause, we first left only the header files and an empty entry point. Then we removed all header files. That showed the inclusion of the API was causing the problem. Yet at that point the program did nothing: it had only an empty entry point. From what we analyzed, the API has no code that would run a problematic algorithm merely by being included. So, as far as we know, the program wasn't really doing anything.

This issue could be a minor overlooked bug, a simple programming error in the API, or something that has nothing to do with us directly, such as a GCC compilation issue. We tried various things to understand this problem, but we couldn't find a reasonable starting point for a fix. This needs more research.

Expected behavior

Program should execute as expected.

Current behavior

Every execution ends with "Process completed with exit code -1073741511" or something similar.
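A negative exit code of this size on Windows is usually an NTSTATUS value printed as a signed 32-bit integer. The sketch below only shows the reinterpretation; as_ntstatus and ntstatus_hex are hypothetical helper names, not part of JuleC. For -1073741511 the result is 0xC0000139, which, as far as I know, is listed as STATUS_ENTRYPOINT_NOT_FOUND. With MinGW builds, one common cause of that status is a mismatched runtime DLL being picked up at load time, but that is an assumption and was not confirmed for this case.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Reinterprets a signed 32-bit process exit code as the unsigned
// 32-bit value that Windows uses for NTSTATUS codes.
std::uint32_t as_ntstatus(std::int32_t exit_code) {
    return static_cast<std::uint32_t>(exit_code);
}

// Formats the exit code as an 8-digit hexadecimal NTSTATUS string.
std::string ntstatus_hex(std::int32_t exit_code) {
    char buf[11]; // "0x" + 8 hex digits + terminating NUL
    std::snprintf(buf, sizeof buf, "0x%08X",
                  static_cast<unsigned>(as_ntstatus(exit_code)));
    return std::string(buf);
}
```

Calling ntstatus_hex(-1073741511) gives a value that can be looked up in the NTSTATUS reference instead of searching for the decimal number.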

Additional information

The current situation when we do this:

mertcandav commented 1 year ago

Here is more information about this.

We tried again to create a workflow for Windows. At the time, Jule's current commit was 512978d224 and the IR's current commit was 9ab7a750fc. The attempt failed again with a failure exit code. This time the exit code was 1.

Using Clang is not possible due to MSVC. Using the windows-2019 runner image did not fix the problem and caused compilation errors. windows-latest, which is currently windows-2022, can compile the JuleC IR. But even a simple julec version command exits with code 1, so the JuleC Version step fails.

We used -Wa,-mbig-obj to avoid the "File too big" problem with GCC.

Our workflow build_windows.yml file:

name: Build (Windows)
on: [push, pull_request]

jobs:
  build:
    runs-on: windows-latest

    steps:
      - uses: actions/checkout@v3

      - name: Get latest IR
        run: |
          curl -o .\ir.cpp https://raw.githubusercontent.com/julelang/julec-ir/main/src/windows-amd64.cpp

      - name: Compile Latest JuleC IR
        shell: cmd
        run: |
          mkdir .\bin
          g++ -Wa,-mbig-obj -O0 --std=c++17 -w -o .\bin\julec.exe .\ir.cpp
          git update-index --add --chmod=-x .\bin\julec.exe

      - name: JuleC Version
        run: |
          .\bin\julec.exe version

      - name: Build JuleC
        shell: cmd
        run: |
          .\bin\julec.exe -t .\src\julec
          g++ -Wa,-mbig-obj -O0 --std=c++17 -w -o .\bin\julec.exe .\dist\ir.cpp
          git update-index --add --chmod=-x .\bin\julec.exe

We tried this with the same JuleC code and IR on our amd64 machine running Windows 10, and the compiler built from the IR worked as expected. We still don't have clear data on why the GitHub Actions build is problematic.

Additionally, on our own machine, GCC does not have the "File too big" problem, so we were able to compile without -Wa,-mbig-obj.

Panquesito7 commented 1 year ago

Maybe the version command doesn't work because JuleC has not been built yet? 🤔

mertcandav commented 1 year ago

We're pretty sure it was built. If any compilation problem occurs, the workflow does not proceed to the next step. Still, just to be sure, I ran a dir command and verified that julec.exe is where it should be.

mertcandav commented 1 year ago

Research Update

We did more research and have some findings on the subject. We're almost certain that the GCC used for our build on a native Windows machine is actually an alias executable for Clang, hence the wrong results. To fix this, we got a MinGW GCC and tried again. The "File too big" problem then also occurred on our machine; this seems normal and is fixed with the -Wa,-mbig-obj flag.
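A command named gcc or g++ that is really Clang can be detected from inside a compiled program, because the predefined macros come from the actual compiler. The sketch below is a minimal illustration; compiler_identity is a hypothetical helper, not part of the Jule API. Clang also defines __GNUC__, so __clang__ has to be checked first.

```cpp
#include <cassert>
#include <string>

// Reports which compiler actually built this translation unit,
// based on the compiler's predefined macros.
std::string compiler_identity() {
#if defined(__clang__)
    // Checked first: Clang defines __GNUC__ for compatibility.
    return "clang " + std::to_string(__clang_major__) + "." +
           std::to_string(__clang_minor__);
#elif defined(__GNUC__)
    return "gcc " + std::to_string(__GNUC__) + "." +
           std::to_string(__GNUC_MINOR__);
#elif defined(_MSC_VER)
    return "msvc " + std::to_string(_MSC_VER);
#else
    return "unknown";
#endif
}
```

Printing this string from a small test program built with the same g++ command used for the IR would show whether the build really goes through GCC.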

The interesting part is that the compiler built from the IR code still works correctly on our machine. Unlike on the GitHub Actions machines, it does not terminate immediately with exit code 1 when executed, and the julec version command runs successfully. But when we tried to transpile the compiler's own source code, we got exit code 11, i.e. a segmentation fault. We don't have enough evidence to fully understand the issues with GCC. With MinGW LLVM Clang, we were able to achieve a seamless Jule development experience on Windows; the compiler didn't have any problems. We don't know exactly why GCC is having trouble; it may even be a compiler bug.

So it looks like we have to change the status of GCC support to partial support. There is detailed information about this on the relevant manual page.

GitHub Actions

GitHub Actions is still a puzzle. We compiled the IR code there with MinGW LLVM Clang, which gave us a smooth experience on our local machine. We expected the same on GitHub Actions, but instead we may have found strong evidence that the problem is not related to the compilers. Even the executable produced by Clang has the same problem as the one produced by GCC: the program terminates immediately with exit code 1 when executed.

This issue we're having on GitHub Actions machines may not have anything to do with us.

Panquesito7 commented 1 year ago

We can also use gdb or OnlineGDB to have more debugging options and information to see what caused the segmentation fault issue. Let me know if you need any further help. Thanks. 🙂

mertcandav commented 1 year ago

GDB did not provide any meaningful help. I suspect this has more to do with the compiler than with the Jule API. There are no problems when compiled with Clang. When I test and debug on my local machine, I see that the crash comes from a delete expression executed when copying the jule::Any type. But releasing memory there shouldn't be a problem. The memory address being freed looks weird. It is exactly 0xabababababababab; if this address has a special meaning, please let me know.

When I checked the stack trace, I couldn't see anything that could cause this. Additionally, I must say that my Windows machine is weak, so compile times are really long and debugging is difficult. Therefore, the process cannot progress quickly on my machine.

If you do any analysis, debugging, research or similar work on this issue, please share your findings with us.

Thanks.
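On the 0xabababababababab address asked about above: a pointer whose bytes are all identical is usually a fill pattern rather than a real address. As far as I know, 0xAB is commonly documented as the guard-byte pattern the Windows heap writes after an allocated block when heap debugging is active, so such a value may mean the pointer was read from just past the end of an allocation or from memory that was never initialized. That interpretation is an assumption, not something established in this thread. The sketch below only detects the repeated-byte shape; repeated_byte_fill is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when all eight bytes of `addr` are identical, which is
// typical of a debug fill pattern. On success the repeated byte is
// stored in *fill (if fill is not null).
bool repeated_byte_fill(std::uint64_t addr, std::uint8_t *fill) {
    const std::uint8_t first = static_cast<std::uint8_t>(addr & 0xFF);
    for (int shift = 8; shift < 64; shift += 8) {
        if (static_cast<std::uint8_t>((addr >> shift) & 0xFF) != first) {
            return false;
        }
    }
    if (fill != nullptr) {
        *fill = first;
    }
    return true;
}
```

A check like this could be placed before the delete in a debug build to stop at the point where the suspicious pointer first appears.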

mertcandav commented 1 year ago

Here is an update.

This isn't just a Windows problem, so I will update the title of this issue accordingly. The problem seems to occur on Linux and macOS as well. As far as I have tested on macOS with the latest updates, GCC compilation was successful and the produced executable worked as expected. This is how I observed that GCC compatibility has improved.

It seems help is needed to determine whether compatibility is fully achieved. I tried to compile on GitHub Actions, but the julec version command exits with code 1. It looks like this needs to be tested locally on a Windows machine. We don't have enough information yet, but the progress towards GCC compatibility looks significant.

mertcandav commented 1 year ago

I tested GCC support on Fedora Linux 38 Workstation Edition in a VM. Everything seems OK. Clang and GCC both work as expected, with no problems.

Clang version: clang version 16.0.6 (Fedora 16.0.6-3.fc38)
GCC version: g++ (GCC) 13.0.1 20230401 (Red Hat 13.0.1-0)

I used the latest IR (version 54a6661525) and the master source tree (hash 54a6661525) of Jule.

Panquesito7 commented 1 year ago

That's great news! 🎉 Should we try again with Windows?

mertcandav commented 1 year ago

I created a Windows build CI on my own Jule fork to see if the issues were resolved. It is very strange, but the problem seems to occur when std::stringstream is used. When I delete the relevant statement, the program compiles and runs successfully; otherwise it keeps exiting with code 1. I don't know if std::stringstream is directly part of the problem, but it is obviously involved in triggering it. This needs a closer look.

Panquesito7 commented 9 months ago

That's very strange. Are there any alternatives to std::stringstream that could be used to prevent this problem? We should still take a look, though; there's most likely another problem. I'll take a look and see if there's anything obvious.
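For simple cases, one alternative to std::stringstream is to build the text with std::string and std::to_string, which avoids the iostream machinery entirely. The sketch below is only an illustration; join_numbers is a hypothetical helper, it is not how the Jule API formats values, and whether this avoids the failure reported here is untested.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins integers into one string using only std::string operations,
// without std::stringstream.
std::string join_numbers(const std::vector<long long> &values,
                         const std::string &sep) {
    std::string out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0) {
            out += sep; // separator goes between elements only
        }
        out += std::to_string(values[i]);
    }
    return out;
}
```

If a build with code like this works where the std::stringstream version fails, that would point at the iostream part of the runtime rather than at the surrounding logic.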

mertcandav commented 9 months ago

I'm not sure about that; the issue is probably not std::stringstream itself. But executables built with Clang work fine while GCC builds do not, which increases the complexity of the problem. I'm just wondering whether this issue is a GCC bug.

Please share with us if you find anything about this issue during your investigation.

mertcandav commented 4 months ago

Latest situation:

The windows-ci branch has Windows [GCC] CI workflows on GitHub Actions, and they work well except for one known issue. I don't know why it works now; the cause of the original problem is still unknown. The developer CI has compilation steps for Windows with GCC. LLVM Clang is not preferred because it produces additional compilation errors.

I tested with another local machine using GCC on Windows. Unlike on GitHub Actions, nothing has changed on the local machine: the same problem still exists, with no progress. I have no idea what makes GitHub Actions different, but it seems to work there.

As I said, GitHub Actions has one known problem: the console write call does not work. Jule uses the WriteConsoleW function of the Windows API, which is provided by the windows.h header. As far as I tested, this function works well on all tested systems. As far as I know, GitHub Actions uses the UTF-8 code page by default (Windows uses UTF-16 by default), and I confirmed this by printing Unicode characters with a simple printf call. I changed the code page to UTF-8 on my local machine and tested again, but the WriteConsoleW function still works well there. So I don't have any clear idea about the GitHub Actions problem.
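One possible explanation, not confirmed anywhere in this thread: the Windows API documentation notes that WriteConsole fails when the standard handle is redirected to a file or pipe, and CI runners typically capture output through a pipe rather than a real console. The sketch below is an assumption about how a writer could handle both cases; write_stdout_utf8 is a hypothetical helper, not part of the Jule API. It checks the handle with GetConsoleMode and falls back to WriteFile when the output is not a console. On other systems it simply uses fwrite so the example stays portable.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Writes UTF-8 text to standard output. Returns false on failure.
bool write_stdout_utf8(const std::string &text) {
    if (text.empty()) {
        return true; // nothing to write
    }
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    if (out == INVALID_HANDLE_VALUE || out == nullptr) {
        return false;
    }
    DWORD mode = 0;
    if (GetConsoleMode(out, &mode)) {
        // Real console: convert UTF-8 to UTF-16 and use WriteConsoleW.
        const int n = MultiByteToWideChar(CP_UTF8, 0, text.data(),
                                          static_cast<int>(text.size()),
                                          nullptr, 0);
        if (n <= 0) {
            return false;
        }
        std::wstring wide(static_cast<std::size_t>(n), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, text.data(),
                            static_cast<int>(text.size()), &wide[0], n);
        DWORD written = 0;
        return WriteConsoleW(out, wide.data(), static_cast<DWORD>(n),
                             &written, nullptr) != 0;
    }
    // Redirected to a file or pipe: write the raw UTF-8 bytes.
    DWORD written = 0;
    return WriteFile(out, text.data(), static_cast<DWORD>(text.size()),
                     &written, nullptr) != 0 &&
           written == static_cast<DWORD>(text.size());
#else
    return std::fwrite(text.data(), 1, text.size(), stdout) == text.size();
#endif
}
```

Running the same binary once in a terminal and once with its output piped to a file would show whether redirection is what makes WriteConsoleW return false.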

I investigated the original problem a bit more, with no progress. I wrote a simple program like this:

#include <stdio.h>
#include "api/utf8.hpp"

int main() {
    printf("hello world\n");
    return 0;
}

The example program above does not print anything when the api/utf8.hpp header is included. I modified the relevant function declarations and definitions in this file and finally narrowed it down to the body of the utf8_push_rune_bytes function. When the dest.push_back method is called there, the problem occurs, even for a call like dest.push_back(0). And these functions are not called anywhere, really. This seems absurd to me. I really suspect there is a bug in GCC causing this problem. If the problem is related to Jule, I really wonder what it is.
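For readers without the Jule sources at hand, the kind of function under discussion can be sketched as follows. This is a hypothetical stand-in written for illustration, not the actual code of api/utf8.hpp; the name push_rune_bytes, its signature, and the replacement of invalid code points with U+FFFD are all assumptions. It shows that such a function does little more than a few push_back calls on a byte vector.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in, NOT the Jule API implementation.
// Appends the UTF-8 encoding of code point `r` to `dest`.
void push_rune_bytes(std::uint32_t r, std::vector<std::uint8_t> &dest) {
    // Assumption: invalid code points become U+FFFD.
    if (r > 0x10FFFF || (r >= 0xD800 && r <= 0xDFFF)) {
        r = 0xFFFD;
    }
    if (r <= 0x7F) { // 1 byte
        dest.push_back(static_cast<std::uint8_t>(r));
    } else if (r <= 0x7FF) { // 2 bytes
        dest.push_back(static_cast<std::uint8_t>(0xC0 | (r >> 6)));
        dest.push_back(static_cast<std::uint8_t>(0x80 | (r & 0x3F)));
    } else if (r <= 0xFFFF) { // 3 bytes
        dest.push_back(static_cast<std::uint8_t>(0xE0 | (r >> 12)));
        dest.push_back(static_cast<std::uint8_t>(0x80 | ((r >> 6) & 0x3F)));
        dest.push_back(static_cast<std::uint8_t>(0x80 | (r & 0x3F)));
    } else { // 4 bytes
        dest.push_back(static_cast<std::uint8_t>(0xF0 | (r >> 18)));
        dest.push_back(static_cast<std::uint8_t>(0x80 | ((r >> 12) & 0x3F)));
        dest.push_back(static_cast<std::uint8_t>(0x80 | ((r >> 6) & 0x3F)));
        dest.push_back(static_cast<std::uint8_t>(0x80 | (r & 0x3F)));
    }
}
```

Nothing here runs unless it is called, so a failure that appears merely because such a definition is present suggests a problem at link or load time rather than in the function's logic.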

mertcandav commented 4 months ago

Update on the latest situation:

Alright, I discovered the problem. My GCC installation was probably corrupted; the same tasks work after a clean installation. Any program now compiles successfully.

After fixing the compiler problem, I tested again, and here is the good news: GCC support looks good! Everything works as expected, even with a bootstrapped compiler that is itself compiled with GCC. I observed no behavioral differences between GCC and Clang.

But we have a new problem. On Windows, GitHub Actions fails when calling the WriteConsoleW function of the Windows API. The function returns false, which means it failed. But this problem is not related to this issue. Therefore, I will open a new issue, share it here, and then close this issue.

mertcandav commented 4 months ago

The relevant issue is: #107