Closed pradghos closed 5 years ago
This looks like a legitimate issue.
cudf::bool8 w4{-42};
is rightfully complaining about trying to initialize with a larger type (narrowing conversion). We treat warnings as errors, so I'm not sure how that warning slipped past CI.
This could be fixed a few ways:
cudf::bool8 w4 = -42; // cast is implicit
cudf::bool8 w4{static_cast<bool8::value_type>(-42)}; // cast is explicit
However, none of that should affect the correctness of the tests, so those failures are concerning. What OS and gcc versions are you using?
Its is using GCC 7.3.0
conda tool chain on Ubuntu 18.04 (ppc64le)
I can see another warning during work/cpp/src/utilities/device_atomics.cuh(955): warning: invalid narrowing conversion from "signed char" to "char"
But its not stopping us.
@jrhemstad Thanks for your suggestion - I am able to build successfully.
However, test case is still failing.
20/29 Test #20: TYPES_TEST .......................***Failed 2.42 sec
Code change:-
diff --git a/cpp/tests/types/types_test.cpp b/cpp/tests/types/types_test.cpp
index 95e793ea..59075fc2 100644
--- a/cpp/tests/types/types_test.cpp
+++ b/cpp/tests/types/types_test.cpp
@@ -240,7 +240,7 @@ TEST_F(WrappersTestBool8, BooleanOperatorsBool8)
EXPECT_TRUE(w2 >= w3);
EXPECT_TRUE(w2 <= w3);
- cudf::bool8 w4{-42};
+ cudf::bool8 w4{static_cast<cudf::bool8::value_type>(-42)};
cudf::bool8 w5{43};
EXPECT_TRUE(w4 == w4);
@@ -273,10 +273,10 @@ TEST_F(WrappersTestBool8, BooleanOperatorsBool8)
TEST_F(WrappersTestBool8, CastArithmeticTest)
{
cudf::bool8 w1{42};
- cudf::bool8 w2{-42};
+ cudf::bool8 w2{static_cast<cudf::bool8::value_type>(-42)};
bool t1{42};
is there any PR going to include this change or I will create a separate PR to update fix. Please let me know. Thanks!
Summery of the failure for TYPES_TEST
[ OK ] DispatcherTest.UnsuportedTypesTest (0 ms)
[----------] 4 tests from DispatcherTest (1 ms total)
[----------] Global test environment tear-down
[==========] 58 tests from 24 test cases ran. (30400 ms total)
[ PASSED ] 54 tests.
[ FAILED ] 4 tests, listed below:
[ FAILED ] WrappersTest/0.CompoundAssignmentOperators, where TypeParam = cudf::detail::wrapper<int, (gdf_dtype)11>
[ FAILED ] WrappersTest/1.CompoundAssignmentOperators, where TypeParam = cudf::detail::wrapper<long, (gdf_dtype)10>
[ FAILED ] WrappersTest/2.CompoundAssignmentOperators, where TypeParam = cudf::detail::wrapper<int, (gdf_dtype)8>
[ FAILED ] WrappersTest/3.CompoundAssignmentOperators, where TypeParam = cudf::detail::wrapper<long, (gdf_dtype)9>
4 FAILED TESTS
is there any PR going to include this change or I will create a separate PR to update fix. Please let me know. Thanks!
Hopefully @harrism will be able to take a look at why these tests are failing and include the fix for the narrowing warning along with whatever is required to fix the test failures.
Hi @pradghos thanks for the report. I'm working on this, since I wrote the test recently that is failing. I need to fire up my Ubuntu 18 machine to test the failures. I'm not able to repro on Ubuntu 16 with g++ 5.4
@pradghos I'm unable to reproduce this on an Intel CPU with Ubuntu 18.04 GCC 7.4.
It compiles fine as-is, and the tests all pass. Suspect this is an issue on Power only?
@harrism : Thanks for looking into the issue. I will try to take look further. is this feature and test case added recently. could you please point me the PR or commit id? That would be great.
@harrism : Thanks for looking into the issue. I will try to take look further. is this feature and test case added recently. could you please point me the PR or commit id? That would be great.
The particular tests that are failing for you have existed for a while, but they were modified slightly in 0ba835b0b73cbe3284a24dfc3d7fca9691004e7b which was introduced by https://github.com/rapidsai/cudf/pull/1142
Specifically:
EXPECT_EQ(t0, w0.value);
was changed to:
EXPECT_EQ(t0, static_cast<UnderlyingType>(w0.value));
The purpose of these tests is verify the correct functionality of our wrapper
types. See https://github.com/rapidsai/cudf/blob/branch-0.7/cpp/src/utilities/wrapper_types.hpp#L31
These wrappers are effectively a "strong" type definition for a new type that works just like a scalar, and implements the usual operators you'd expect for a scalar. Each wrapper
wraps a single element of a fundamental type, e.g., an int
or a double
.
These particular tests are failing for you verify that operator+=
works as expected. I can't think of a reason why adding these casts would cause the tests to fail.
The test fails for me when I try to build CUDF inside of a nightly devel image using a modified build.sh and environment to work with ASV benchmarking tests.
The simplest reproduction I could come up with...
Inside of nvidia-docker run -ti rapidsai/rapidsai-nightly:cuda10.0-devel-ubuntu18.04 bash
Clone a clean copy of cudf( the one at /rapids/cudf would probably work as well)
some shenanigans to make ci/gpu/build.sh
work outside of gpuci
export WORKSPACE=$(pwd)
sed -i.bak 's/ gdf/ rapids/' ci/gpu/build.sh
sed -i.bak 's/PATH=\([^$]*\):\$PATH/PATH=$PATH:\1/' ci/gpu/build.sh
sed -i.bak 's/export CUDA_REL=${CUDA_VERSION%.*}/export CUDA_REL=${CUDA_VERSION%}/' ci/gpu/build.sh
conda install -y -c rapidsai/label/cuda${CUDA_VERSION} cudatoolkit>=${CUDA_VERSION}.*
Then run the build
ci/gpu/build.sh
@kevingerman did the build fail for you , or just the TYPES_TEST? I was not able to reproduce the build failure in a fresh build yesterday. Will try again.
After second sed command I get
sed: -e expression #1, char 35: invalid reference \1 on `s' command's RHS
No speaky sed
I looked into this. @jrhemstad I'm stumped, can you have look at the following? On the failing test built with GCC 7.x, This is the line that fails, and only for the wrapper types other than cudf::bool8 (e.g. datetime64):
If I remove the static_cast<UnderlyingType>()
, it passes:
EXPECT_EQ(t0, w0.value);
I don't understand this. UnderlyingType
should be TypeParam::value_type
, shouldn't it?
Apologies for the obscure commands. Those are pasted from my history. Probably some cut and paste mangling happened.
First sed command is changing the default conda environment from gdf to rapids
The second sed command is postfixing, rather than prefixing, the PATH with /conda/bin and /usr/local/cuda/bin so that the python used to build CUDF is from the intended conda env, rather than the default shipped with conda.
The third sed command is stripping the “.*” off the back of the env var CUDA_REL so that when that var is used later in the script to install NVStrings and RMM from the nightly repo it finds the intended repo.
These are all just to get the script to work in a nightly devel image. From there, and with the correct toolkit for that image, installed in the current conda env the build script exits at that unit test. It is the only unit test to fail in this scenario.
I looked into this. @jrhemstad I'm stumped, can you have look at the following? On the failing test built with GCC 7.x, This is the line that fails, and only for the wrapper types other than cudf::bool8 (e.g. datetime64):
If I remove the
static_cast<UnderlyingType>()
, it passes:EXPECT_EQ(t0, w0.value);
I don't understand this.
UnderlyingType
should beTypeParam::value_type
, shouldn't it?
This looks totally fine to me.
I don't have the means to look into it further since I can't reproduce the issue.
So if it works for most build configurations (nightly, developer machines...) It might be that myself and op have an issue in our build environment that is creating the issue. Though it seems like something else should fail if its a version mismatch in libraries and or tools.
Okay, thanks to help from @kevingerman I'm able to reproduce on the cuda9.2-devel-ubuntu18.04
container. I'll start diving in to see what the issue is.
@pradghos I assume since you're running on POWER that you're running on bare metal, correct?
Been digging into this all day and haven't made much progress.
To get a reproducible and compact failure, I've been running:
./gtests/TYPES_TEST --gtest_break_on_failure
Which yields:
...
[ RUN ] WrappersTest/0.CompoundAssignmentOperators
/rapids/cudf/cpp/tests/types/types_test.cpp:319: Failure
Expected: t0
Which is: 3
To be equal to: static_cast<UnderlyingType>(w0.value)
Which is: 209652396
Here is the failure: https://github.com/rapidsai/cudf/blob/branch-0.7/cpp/tests/types/types_test.cpp#L313
This is the concise version of the code in question:
UnderlyingType t0{this->rand()};
UnderlyingType const t1{this->rand()};
TypeParam w0{t0};
TypeParam const w1{t1};
if( 0 != t1) {
t0/=t1;
w0/=w1;
EXPECT_EQ(t0, static_cast<UnderlyingType>(w0.value));
}
Printing the values of t0
and t1
shows:
t0 == 1209609620
t1 == 398764591
The correct result of integer division of t0/t1
is 3
. However, the result we're seeing is: 209652396
w0/=w1
is invoking:
https://github.com/rapidsai/cudf/blob/branch-0.7/cpp/src/utilities/wrapper_types.hpp#L181
I added print statements inside this function, and both lhs.value
and rhs.value
match what is expected, but the result is erroneous: 209652396
.
At my wits end, I created an isolated version of the code that is currently failing.
See: https://wandbox.org/permlink/WHMg29N799OPxBgq
I can't reproduce no matter what compiler version I choose.
I also tried compiling that code in the cudf_dev
environment on cuda9.2-devel-ubuntu18.04
with both g++7.3 and nvcc9.2.
I am as of yet unable to reproduce outside of running the TYPES_TEST
directly in the cuda9.2-devel-ubuntu18.04
container.
Oh, and using a DEBUG build makes all of the TYPES_TEST
tests pass.
Note that @rlratzel is seeing the same behavior in the CentOS nightly build using gcc7.2.
I tried the master branch in the container and found that TYPES_TEST did not fail in the previous release. So I'm currently bisecting to find the point of failure (likely in my GDF_BOOL8 PR, but it's not clear how yet).
This is the commit where I believe I introduced the failure. https://github.com/rapidsai/cudf/pull/1142/commits/0ba835b0b73cbe3284a24dfc3d7fca9691004e7b
Here I added the "TypeToUse" template helper for the tests, and also the static_cast<UnderlyingType>
in the CompoundAssignmentTest. That cast is required in order to make the test pass for the cudf::bool8 wrapper. I don't know why the cast makes it fail for other wrapper types (and only on gcc 7.x).
I fixed the failure by doing a different comparison for bool8 and the other wrapper types. https://github.com/rapidsai/cudf/pull/1631
@eyalroz reports the same failure on gcc6.3 https://github.com/rapidsai/cudf/issues/1623#issuecomment-489303657
@jrhemstad : Actually, this may not be the same bug as mine - since I don't get warnings about narrowing conversions when building TYPES_TEST.
The test failures were not related to the narrowing conversions. Nobody has repro'd the compiler warnings for narrowing conversions. I think your test failures were the same, @eyalroz, try the latest branch-0.7 for a fix.
From: Eyal Rozenberg notifications@github.com Sent: Sunday, May 5, 2019 12:17:10 AM To: rapidsai/cudf Cc: Mark Harris; Mention Subject: Re: [rapidsai/cudf] [BUG] TYPES_TEST failure with gcc7.3 (#1544)
@jrhemstadhttps://github.com/jrhemstad : Actually, this may not be the same bug as mine - since I don't get warnings about narrowing conversions when building TYPES_TEST.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/rapidsai/cudf/issues/1544#issuecomment-489330931, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAF7FXPIHAYSUU6O4SUVZPLPTWLGNANCNFSM4HJL5GWA.
Thank you looking into failures. Now I have picked up the latest release 0.7.1
and TYPE_TEST
worked fine for me.
However, I did below changes to avoid error: narrowing conversion of '-42' from 'int' to 'char' inside
during build which was not reproducible in other environment.
Start 21: TYPES_TEST
21/31 Test #21: TYPES_TEST ....................... Passed 1.66 sec
- cudf::bool8 w4{-42};
+ cudf::bool8 w4{static_cast<cudf::bool8::value_type>(-42)};
cudf::bool8 w5{43};
EXPECT_TRUE(w4 == w4);
@@ -273,7 +273,7 @@ TEST_F(WrappersTestBool8, BooleanOperatorsBool8)
TEST_F(WrappersTestBool8, CastArithmeticTest)
{
cudf::bool8 w1{42};
- cudf::bool8 w2{-42};
+ cudf::bool8 w2{static_cast<cudf::bool8::value_type>(-42)};
@jrhemstad @harrism : Please me know if I can create PR for this change. Thank you !
However, I did below changes to avoid error: narrowing conversion of '-42' from 'int' to 'char' inside during build which was not reproducible in other environment.
I believe I know the reason for the narrowing error.
This is because GDF_BOOL8
uses a char
for it's underlying storage. char
resolves to a signed char
on x86
whereas it's unsigned char
on POWER. We should be more explicitly setting the type to one or the other to avoid these issues.
Thanks for the response @jrhemstad @harrism . I will update any other similar warning in - https://github.com/rapidsai/cudf/issues/1751
During Build, I have observed build failure with commit -
c81152e424871b5f00a777d6bca6ccde3301f536
Build log snippet:I have added
-Wno-narrowing
incpp/CMakeLists.txt
and able to build with the change. However I can see test failures -Detailed Log for the failures-
Please suggest if this is already a known issue. Thank you !