tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

Switch build system #53647

Open OgreTransporter opened 2 years ago

OgreTransporter commented 2 years ago

System information

Describe the feature and the current behavior/state.

Building TensorFlow on Windows is a nightmare. A big problem is that either TensorFlow's Bazel scripts or Bazel itself is quite buggy. Examples:

With Bazel, building TF is a gamble. For example, if you install VS2022 next to VS2019, it no longer works without workarounds. Another problem with Bazel is that the files are always stored in C:\Users\username\_bazel_username. You can't easily change the directory, e.g. to move the build to a fast SSD or when C: is running out of space.
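For what it's worth, Bazel documents an `--output_user_root` startup option that is meant to relocate this directory. Whether it works smoothly with TensorFlow's Windows build is not something this thread confirms. A minimal sketch of assembling such an invocation follows; the drive path and the `//your:target` label are placeholders, not real TensorFlow targets.

```python
from pathlib import PureWindowsPath


def bazel_command(output_root: str, targets: list[str]) -> list[str]:
    """Assemble a Bazel invocation that asks Bazel to use another output root.

    Assumption: `--output_user_root` is a Bazel *startup* option, so it has
    to appear before the command name ("build").
    """
    root = PureWindowsPath(output_root)
    # Forward slashes avoid backslash-escaping surprises in shells and rc files.
    return ["bazel", f"--output_user_root={root.as_posix()}", "build", *targets]


# Hypothetical SSD location and placeholder target label.
cmd = bazel_command(r"D:\bazel_out", ["//your:target"])
print(" ".join(cmd))
```

The same option could presumably be placed in a `.bazelrc` as a `startup` line instead of being passed on every invocation.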

TF has used CMake in the past. A switch back to CMake would be desirable.

Who will benefit with this feature?

Anyone who builds TensorFlow on Windows.

old-school-kid commented 2 years ago

Agreed. But since Keras requires Bazel for its build, TensorFlow had to follow suit. Sadly, everything is tested on Linux OSS, so I don't see this being solved in the near future.

mihaimaruseac commented 2 years ago

Google uses an equivalent of Bazel internally. We cannot maintain a CMake build; the existing one was always getting out of date and was broken more often than it worked.

If an external contributor were to provide CMake support and maintain it, we can take that, but otherwise, sadly we cannot.

OgreTransporter commented 2 years ago

Bazel is one big disaster. I have now tried to build TF on three computers and it only worked on one. Most users use Windows, so it makes no sense to build everything on Linux. In the long run, a Bazel alternative would be very desirable.

alanpurple commented 2 years ago

Building on Windows is almost impossible these days. I've built TensorFlow with CUDA many times before (2016~2021), but now I've given up.

SomeoneSerge commented 2 years ago

Hi

I'll also add that it would be significantly easier to package and maintain tensorflow for nixpkgs without Bazel. Running Bazel in isolation is very hard and the maintenance burden is unreasonably high because of the way Bazel handles dependencies and platforms. CMake's find_package and Meson's dependency() implement a sort of dependency injection mechanism, and it is much easier to hand dependencies to them.
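To make the "dependency injection" point concrete, here is a toy Python model of the idea. It is only an illustration, not how CMake or Meson are implemented. The project names what it needs, and whoever runs the build decides where each dependency comes from. All names and paths are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Dependency:
    name: str
    include_dir: str
    library: str


# A finder maps a dependency name to a concrete location on disk.
Finder = Callable[[str], Dependency]


def configure_project(required: List[str], find: Finder) -> Dict[str, Dependency]:
    """The project only lists names; the caller injects how they are resolved."""
    return {name: find(name) for name in required}


def bundled_finder(name: str) -> Dependency:
    # Vendored copies pinned inside the source tree.
    return Dependency(name, f"third_party/{name}/include", f"third_party/{name}/lib{name}.a")


def system_finder(name: str) -> Dependency:
    # What a distro packager would hand in instead: system-installed libraries.
    return Dependency(name, "/usr/include", f"/usr/lib/lib{name}.so")


# Same project description, two different providers.
deps = configure_project(["protobuf", "png"], system_finder)
vendored = configure_project(["protobuf", "png"], bundled_finder)
print(deps["png"].library, vendored["png"].library)
```

The point of the sketch is that `configure_project` never changes. A packager swaps `bundled_finder` for `system_finder` without touching the project's own build description.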

> If an external contributor were to provide CMake support and maintain it, we can take that, but otherwise, sadly we cannot

I.e. you'd be OK merging it?

mihaimaruseac commented 2 years ago

@perfinion is working on unbundling TF's dependencies so you could use those installed on the system. He's doing this for Gentoo, maybe this can also work for nix?

> you'd be OK merging it?

Yes, either here or on tensorflow/build. The second is slightly better as we're trying to separate files that are used both internally and externally from files which are only community owned.

aniruthraj commented 1 month ago

Hi,

Thank you for opening this issue. Since this issue has been open for a long time, the code/debug information for this issue may not be relevant to the current state of the code base.

The TensorFlow team is constantly improving the framework by fixing bugs and adding new features. We suggest you try the latest TensorFlow version with the latest compatible hardware configuration, which could potentially resolve the issue. If you are still facing the issue, please create a new GitHub issue with your latest findings and all the debugging information that could help us investigate.

Please follow the release notes to stay up to date with the latest developments in the TensorFlow space.

github-actions[bot] commented 4 weeks ago

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

SomeoneSerge commented 4 weeks ago

> the code/debug information for this issue may not be relevant to the current state of the code base.

I assume this was an automated message? Adding support for a build system that features a proper dependency injection system (Meson, CMake) is still (I'll phrase it as a fact, though obviously it's an opinion) of utmost importance for the long-term sustainability of TF and for making today's research less irreproducible. Frankly, I suspect it might not even be possible today to build TensorFlow starting from bootstrap (happy to be wrong).

> > you'd be OK merging it?
>
> Yes, either here or on tensorflow/build. The second is slightly better as we're trying to separate files that are used both internally and externally from files which are only community owned.

I'm hoping this road is still open. I haven't been able to allocate enough spare time so far, but I am semi-actively looking for other ways to make this happen.

mihaimaruseac commented 4 weeks ago

Yeah, it was kinda automated to match some metrics.

Note that the team has changed since 2022: no one who was working on TF before the pandemic is still there. This also means the build organization has changed.