tensorflow / data-validation

Library for exploring and validating machine learning data
Apache License 2.0

`data-validation` package fails to build/install #258

Open peytondmurray opened 3 months ago

peytondmurray commented 3 months ago

Currently data-validation fails to build. cc @rcrowe-google

System Information

$ uname -rms
Linux 6.6.47-1-lts x86_64

$ python -VV
Python 3.11.9 (main, May 27 2024, 14:06:17) [GCC 14.1.1 20240522]

$ bazel version
Bazelisk version: v1.20.0
Starting local Bazel server and connecting to it...
Build label: 7.3.1
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Mon Aug 19 16:12:50 2024 (1724083970)
Build timestamp: 1724083970
Build timestamp as int: 1724083970

What I tried

Starting from a fresh virtual environment:

$ pip list
Package    Version
---------- -------
pip        24.0
setuptools 65.5.0

[notice] A new release of pip is available: 24.0 -> 24.2
[notice] To update, run: python -m pip install --upgrade pip

Following the instructions in README.md

$ python setup.py bdist_wheel
/home/pdmurray/.pyenv/versions/tfdv/lib/python3.11/site-packages/setuptools/dist.py:530: UserWarning: Normalizing '1.16.0.dev' to '1.16.0.dev0'
  warnings.warn(tmpl.format(**locals()))
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

error: invalid command 'bdist_wheel'

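The error itself just means that the wheel package is not installed in the fresh environment, so setuptools has no bdist_wheel command. There's a great blog post about the implications of invoking setup.py directly; in short, we should avoid doing so in the future.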

Calling pip install -v .

This seems to work until the compiled extension starts to build:

$ pip install -v .
...
[0 / 1] checking cached actions
  ERROR: /home/pdmurray/.cache/bazel/_bazel_pdmurray/4cb5adb34e7954f63abba3c5db8fc6d7/external/com_google_zetasql/zetasql/parser/BUILD:26:8: @@com_google_zetasql//zetasql/parser:gen_extra_files: no such attribute 'exec_tools' in 'genrule' rule (did you mean 'executable'?)
  ERROR: /home/pdmurray/.cache/bazel/_bazel_pdmurray/4cb5adb34e7954f63abba3c5db8fc6d7/external/com_google_zetasql/zetasql/parser/BUILD:47:8: @@com_google_zetasql//zetasql/parser:gen_headers: no such attribute 'exec_tools' in 'genrule' rule (did you mean 'executable'?)
  ERROR: /home/pdmurray/.cache/bazel/_bazel_pdmurray/4cb5adb34e7954f63abba3c5db8fc6d7/external/com_google_zetasql/zetasql/parser/BUILD:57:8: @@com_google_zetasql//zetasql/parser:gen_protos: no such attribute 'exec_tools' in 'genrule' rule (did you mean 'executable'?)
  ERROR: /home/pdmurray/.cache/bazel/_bazel_pdmurray/4cb5adb34e7954f63abba3c5db8fc6d7/external/com_google_zetasql/zetasql/parser/BUILD:67:8: @@com_google_zetasql//zetasql/parser:gen_parse_tree_serializer_cc: no such attribute 'exec_tools' in 'genrule' rule (did you mean 'executable'?)
  ERROR: /home/pdmurray/.cache/bazel/_bazel_pdmurray/4cb5adb34e7954f63abba3c5db8fc6d7/external/com_google_zetasql/zetasql/parser/BUILD:77:8: @@com_google_zetasql//zetasql/parser:gen_parse_tree_serializer_headers: no such attribute 'exec_tools' in 'genrule' rule (did you mean 'executable'?)
  ERROR: /home/pdmurray/.cache/bazel/_bazel_pdmurray/4cb5adb34e7954f63abba3c5db8fc6d7/external/com_google_zetasql/zetasql/public/BUILD:1571:11: Target '@@com_google_zetasql//zetasql/parser:bison_parser_generated_lib' contains an error and its package is in error and referenced by '@@com_google_zetasql//zetasql/public:parse_helpers'
  ERROR: Analysis of target '//tensorflow_data_validation:move_generated_files' failed; build aborted: Analysis failed
  INFO: Elapsed time: 7.204s, Critical Path: 0.02s
  INFO: 1 process: 1 internal.
  ERROR: Build did NOT complete successfully
  FAILED:
  ERROR: Build failed. Not running target
...

Setting USE_BAZEL_VERSION=6.4.0

After some more investigation, this appears to be caused by a change in Bazel 7: as the errors above show, genrule no longer accepts the exec_tools attribute that zetasql's BUILD files use. Setting USE_BAZEL_VERSION=6.4.0 in .bazeliskrc, I tried again:

  In file included from /usr/include/signal.h:328,
                   from ./signal.h:52,
                   from /home/pdmurray/.cache/bazel/_bazel_pdmurray/4cb5adb34e7954f63abba3c5db8fc6d7/sandbox/linux-sandbox/802/execroot/tensorflow_data_validation/external/m4/lib/c-stack.c:49:
  /home/pdmurray/.cache/bazel/_bazel_pdmurray/4cb5adb34e7954f63abba3c5db8fc6d7/sandbox/linux-sandbox/802/execroot/tensorflow_data_validation/external/m4/lib/c-stack.c:55:26: error: missing binary operator before token "("
     55 | #elif HAVE_LIBSIGSEGV && SIGSTKSZ < 16384
        |                          ^~~~~~~~
  /home/pdmurray/.cache/bazel/_bazel_pdmurray/4cb5adb34e7954f63abba3c5db8fc6d7/sandbox/linux-sandbox/802/execroot/tensorflow_data_validation/external/m4/lib/c-stack.c:107:1: warning: 'die' defined but not used [-Wunused-function]
    107 | die (int signo)
        | ^~~
  make[3]: *** [Makefile:1910: c-stack.o] Error 1
  make[3]: Leaving directory '/home/pdmurray/.cache/bazel/_bazel_pdmurray/4cb5adb34e7954f63abba3c5db8fc6d7/sandbox/linux-sandbox/802/execroot/tensorflow_data_validation/bazel-out/k8-opt-exec-2B5CBBC6/bin/external/com_google_zetasql/bazel/m4.build_tmpdir/lib'
  make[2]: *** [Makefile:1674: all] Error 2
  make[2]: Leaving directory '/home/pdmurray/.cache/bazel/_bazel_pdmurray/4cb5adb34e7954f63abba3c5db8fc6d7/sandbox/linux-sandbox/802/execroot/tensorflow_data_validation/bazel-out/k8-opt-exec-2B5CBBC6/bin/external/com_google_zetasql/bazel/m4.build_tmpdir/lib'
  make[1]: *** [Makefile:1572: all-recursive] Error 1
  make[1]: Leaving directory '/home/pdmurray/.cache/bazel/_bazel_pdmurray/4cb5adb34e7954f63abba3c5db8fc6d7/sandbox/linux-sandbox/802/execroot/tensorflow_data_validation/bazel-out/k8-opt-exec-2B5CBBC6/bin/external/com_google_zetasql/bazel/m4.build_tmpdir'
  make: *** [Makefile:1528: all] Error 2
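
The preprocessor error on SIGSTKSZ in the log above is consistent with a known incompatibility: since glibc 2.34, SIGSTKSZ is no longer a compile-time constant when _GNU_SOURCE is defined, so older m4 sources that compare it inside `#elif` stop compiling. That is a reading of the log rather than something confirmed in this thread. A minimal sketch for checking which glibc a machine has (the variable name `libc_version` is only illustrative):

```shell
# Print the glibc version; on non-glibc systems getconf does not know this
# variable, so fall back to a placeholder string instead of failing.
libc_version="$(getconf GNU_LIBC_VERSION 2>/dev/null || echo "unknown (not glibc?)")"
echo "$libc_version"
```

If this reports 2.34 or newer, the c-stack.c failure would be expected with the m4 sources that zetasql pulls in, regardless of the Bazel version.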

This m4 compilation failure seems to come from zetasql's build; there is an issue open for it, but it does not look likely to be resolved: https://github.com/google/zetasql/issues/100.
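
For anyone reproducing this step, the Bazel pin described in this section can be written as below. This is a minimal sketch, assuming it is run from the repository root, where Bazelisk looks for .bazeliskrc; it appends rather than overwrites in case the file already exists.

```shell
# Pin the Bazel version that Bazelisk downloads for this checkout.
# .bazeliskrc holds NAME=VALUE lines and is read by Bazelisk, not by Bazel.
printf 'USE_BAZEL_VERSION=6.4.0\n' >> .bazeliskrc

# Show the resulting file so the pin can be confirmed before rebuilding.
cat .bazeliskrc
```

Running `bazel version` afterwards should report the pinned release in its build label instead of 7.3.1.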

Using GCC 9

After further discussion it seems like gcc-9 may be needed to build this. I tried setting CC and CXX accordingly, but ran into a different compilation error:

...
  INFO: Analyzed target //tensorflow_data_validation:move_generated_files (124 packages loaded, 8640 targets configured).
  INFO: Found 1 target...
  [0 / 1,001] [Prepa] BazelWorkspaceStatusAction stable-status.txt ... (22 actions, 3 running)
  [97 / 1,109] Compiling src/google/protobuf/wire_format_lite.cc [for tool]; 0s linux-sandbox ... (24 actions, 23 running)
  INFO: From Compiling zetasql/common/initialize_required_fields.cc:
  external/com_google_zetasql/zetasql/common/initialize_required_fields.cc: In function 'bool zetasql::InitializeMissingRequiredFields(google::protobuf::Message*, std::set<std::__cxx11::basic_string<char> >*)':
  external/com_google_zetasql/zetasql/common/initialize_required_fields.cc:108:21: warning: comparison of integer expressions of different signedness: 'int' and 'std::vector<const google::protobuf::FieldDescriptor*>::size_type' {aka 'long unsigned int'} [-Wsign-compare]
    108 |   for (int i = 0; i < extensions.size(); ++i) {
        |                   ~~^~~~~~~~~~~~~~~~~~~
  [183 / 1,109] BootstrapGNUMake external/rules_foreign_cc/toolchains/make [for tool]; 1s linux-sandbox ... (25 actions, 24 running)
  ERROR: /home/pdmurray/.cache/bazel/_bazel_pdmurray/4cb5adb34e7954f63abba3c5db8fc6d7/external/rules_foreign_cc/toolchains/BUILD.bazel:130:10: BootstrapGNUMake external/rules_foreign_cc/toolchains/make [for tool] failed: (Exit 77): bash failed: error executing command (from target @rules_foreign_cc//toolchains:make_tool) /bin/bash -c bazel-out/k8-opt-exec-2B5CBBC6/bin/external/rules_foreign_cc/toolchains/make_tool_foreign_cc/wrapper_build_script.sh
...
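
For reference, the compiler override used in this step can be sketched as follows. The binary names `gcc-9` and `g++-9` are assumptions; distributions name and install older GCC releases differently, so adjust them to match the local system.

```shell
# Point the build at GCC 9 (hypothetical binary names, adjust as needed).
export CC=gcc-9
export CXX=g++-9

# Report where the compiler lives, or warn if it is not on PATH.
command -v "$CC" || echo "$CC not found on PATH"
```

With these exported, rerun `pip install -v .` from the same shell so the build inherits them.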

This is as far as I've gotten so far.