jacobwilliams / json-fortran

A Modern Fortran JSON API
https://jacobwilliams.github.io/json-fortran/
Other
333 stars 82 forks source link

Tests fail with nvfortran compiler #531

Open timofeymukha opened 1 year ago

timofeymukha commented 1 year ago

Hello! We are trying to use json-fortran with the nvfortran compiler. Unfortunately, the test suite reports failure of several tests

The following tests FAILED:
      3 - jf_test_01 (SEGFAULT)
      5 - jf_test_03 (SEGFAULT)
      9 - jf_test_07 (SEGFAULT)
     10 - jf_test_08 (SEGFAULT)
     12 - jf_test_10 (Failed)
     16 - jf_test_14 (SEGFAULT)
     18 - jf_test_16 (SEGFAULT)
     25 - jf_test_23 (SEGFAULT)
     31 - jf_test_29 (SEGFAULT)
     32 - jf_test_30 (Failed)
     47 - jf_test_45 (SEGFAULT)
     54 - regression-test2.json (Failed)

Perhaps the pattern in which tests fail give a hint of where to look for the issue? The full failing CI run is available here. Any help is highly appreciated!

jacobwilliams commented 1 year ago

Interesting. I've never used the nvfortran compiler. I can try to take a look when I get a chance.

timofeymukha commented 1 year ago

Thanks, it would be really appreciated. But chances are that it is the compiler's fault.

EmilyBourne commented 6 months ago

I would also like to use the NVHPC compiler.

I am testing with version 23.3. It raises a warning during compilation:

NVFORTRAN-W-0194-INTENT(IN) argument cannot be defined - p (/home/json-fortran/src/json_value_module.F90: 1935)
  0 inform,   1 warnings,   0 severes, 0 fatal for json_value_rename

This is coming from this code: https://github.com/jacobwilliams/json-fortran/blob/18ef4c60493b7b9fbf41b48bb8e3970e59b57b53/src/json_value_module.F90#L1935-L1949

The compiler seems to be correct. The argument p is marked as intent(in) so it should not be modified. I think that the correct solution is to change the intent to inout. This then has repercussions on multiple functions.

However fixing this issue doesn't seem to fix the tests (although I only managed to run 7 of the 56 tests due to the cmake issue mentioned in #539). The (almost identical) results are:

      3 - jf_test_01 (SEGFAULT)
      5 - jf_test_03 (SEGFAULT)
      9 - jf_test_07 (SEGFAULT)
     10 - jf_test_08 (SEGFAULT)
     12 - jf_test_10 (Failed)
     16 - jf_test_14 (SEGFAULT)
     18 - jf_test_16 (SEGFAULT)
     25 - jf_test_23 (Failed)
     31 - jf_test_29 (SEGFAULT)
     32 - jf_test_30 (Failed)
     47 - jf_test_45 (SEGFAULT)
     54 - regression-test2.json (Failed)
jacobwilliams commented 6 months ago

Interesting. However, intent(in) for pointer arguments doesn't mean their attributes can't be changed, it only means we can't change what the pointer is pointing too. So I think this code is legal.

It could be a compile bug?

EmilyBourne commented 6 months ago

Interesting. However, intent(in) for pointer arguments doesn't mean their attributes can't be changed, it only means we can't change what the pointer is pointing too. So I think this code is legal.

It could be a compile bug?

It does indeed seem to be a compile bug. It has already been reported and fixed here: https://forums.developer.nvidia.com/t/incorrect-warning-for-intent-in-pointer-to-type-in-nvfortran-22-1/201471

As far as I can tell this particular problem has minimal impact as it is only a warning anyway.

Given this has been fixed I tried a more recent compiler (23.11 second-to-most-recent). The results are:

      3 - jf_test_01 (SEGFAULT)
      5 - jf_test_03 (SEGFAULT)
     10 - jf_test_08 (SEGFAULT)
     12 - jf_test_10 (Failed)
     16 - jf_test_14 (SEGFAULT)
     18 - jf_test_16 (SEGFAULT)
     25 - jf_test_23 (Failed)
     31 - jf_test_29 (SEGFAULT)
     32 - jf_test_30 (Failed)
     47 - jf_test_45 (SEGFAULT)

So good news! jf_test_07 no longer segfaults! But that still leaves several problematic tests.

I looked into the first segfault. The behaviour is quite strange. I am struggling to put together a reproducer but it seems that when json_value_module::json_traverse::traverse calls json_value_module::json_count it fails to pass through the pointer correctly. In json_value_module::json_count p is a nullptr. The easiest/cleanest fix that I have found for this is to make json_value_module::json_traverse recursive directly and avoid the function inside the function. Do you see any reason not to do this? If not then I can fork and open a PR.

EmilyBourne commented 6 months ago

Interestingly jf_test_03 also fails in a subroutine due to problems passing arguments (in json_value_module::json_check_all_for_duplicate_keys::duplicate_key_func)

jacobwilliams commented 6 months ago

I'm not sure why I did json_traverse like that. I wonder if there was some reason... Maybe we can try making it recursive and eliminating the function within a function.

EmilyBourne commented 6 months ago

I'm not sure why I did json_traverse like that. I wonder if there was some reason... Maybe we can try making it recursive and eliminating the function within a function.

I can open a PR for this

jacobwilliams commented 3 months ago

Current status on this is at: https://github.com/jacobwilliams/json-fortran/pull/549

It doesn't seem like the proposed solution is going to work. Probably a workaround could be created but I'm not sure it can be done without changing the api in some backwards-incompatible way, which I don't really want to do for a compiler bug. Can we report this bug to the nvfortran folks?

EmilyBourne commented 3 months ago

Can we report this bug to the nvfortran folks?

I have already reported the whole repo to the Nvidia guys.

Here is an update for version 23.4 of Nvfortran:

      5 - jf_test_03 (SEGFAULT)
      9 - jf_test_07 (SEGFAULT)
     10 - jf_test_08 (SEGFAULT)
     12 - jf_test_10 (SEGFAULT)
     18 - jf_test_16 (SEGFAULT)
     25 - jf_test_23 (Failed)
     31 - jf_test_29 (SEGFAULT)
     32 - jf_test_30 (Failed)

jf_test_01 is now fixed as is jf_test_14 and jf_test_45, however jf_test_07 which was working previously now appears to be broken

EmilyBourne commented 1 week ago

Here is an update for version 23.4 of Nvfortran:

The same tests (but no extras) are failing with version 24.5

JorgeG94 commented 3 days ago

Here is an update for version 23.4 of Nvfortran:

The same tests (but no extras) are failing with version 24.5

I am interested in building also with nvhpc so I might take a squiz over the weekend