tensorflow / build

Build-related tools for TensorFlow
Apache License 2.0

Bazel build doesn't check cache actions for source code build #111

Closed AmosChenYQ closed 2 years ago

AmosChenYQ commented 2 years ago

I followed the official documentation and compiled the source code successfully on my own PC. However, whenever I add a VLOG or make some other small code change and then rebuild, Bazel doesn't seem to reuse any cached actions, which results in a very long compilation time. Most of the time is spent recompiling source or dependency files that I haven't changed at all, such as llvm and the files under tensorflow/compiler/xla.

I also tried the Docker image build method mentioned in the docs above. In the Docker container provided by the docs, Bazel does use the cache and builds incrementally.

So how do I check or set the Bazel configuration on my PC so that it builds incrementally and compiles faster?

AmosChenYQ commented 2 years ago

I have spent the past few days searching for a way to solve this issue. Could this be the reason?

bhack commented 2 years ago

> I have spent the past few days searching for a way to solve this issue. Could this be the reason?

If this happens when you rebase/merge/pull upstream -> yes.

bhack commented 2 years ago

Check also: https://github.com/tensorflow/build/issues/5 https://github.com/tensorflow/build/pull/48

AmosChenYQ commented 2 years ago

> > I have spent the past few days searching for a way to solve this issue. Could this be the reason?
>
> If this happens when you rebase/merge/pull upstream -> yes.

Thanks for replying. I did update from upstream at the very beginning, but I no longer do that. The behavior you describe is what I would expect from Bazel. My situation is a bit strange, though: I have two machines, and one can use the cache while the other can't.

One machine is a shared server used by my classmates in the lab, and I use Docker to separate my TensorFlow development environment from that server. The image I use is tensorflow/tensorflow:devel-gpu. I build the source in the container's bash shell and then commit the container, so the next time I can save time by using the newly committed image, which includes the build cache. This works and is convenient.

The problem happens on my local machine. There I don't use Docker to isolate the environment, and I don't update the source repo; I just modify a few lines of code and then build. Bazel uses the cache right after a build, but if I build again a few days later, it doesn't and rebuilds everything from source. (My local machine was never shut down or restarted during those days, nor did I delete or modify Bazel's cache folders like ~/.cache/bazel and bazel-in/out/bin.) I think some folders must be getting deleted during this time, but I can't figure out which.

bhack commented 2 years ago

Do you have something that is changing your PATH?

If you are on Linux, check it with printenv.

AmosChenYQ commented 2 years ago

> Do you have something that is changing your PATH?
>
> If you are on Linux, check it with printenv.

SHELL=/bin/bash
LANGUAGE=en_US:en
LC_ADDRESS=en_US.UTF-8
LC_NAME=en_US.UTF-8
TF_CPP_MIN_LOG_LEVEL=0
LC_MONETARY=en_US.UTF-8
TF_FORCE_GPU_ALLOW_GROWTH=true
PWD=/home/amoschenyq
LOGNAME=amoschenyq
XDG_SESSION_TYPE=tty
TF_CPP_MAX_VLOG_LEVEL=1
MOTD_SHOWN=pam
HOME=/home/amoschenyq
LC_PAPER=en_US.UTF-8
LANG=en_US.UTF-8
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
LESSCLOSE=/usr/bin/lesspipe %s %s
XDG_SESSION_CLASS=user
LC_IDENTIFICATION=en_US.UTF-8
TERM=xterm-256color
LESSOPEN=| /usr/bin/lesspipe %s
USER=amoschenyq
SHLVL=0
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
XDG_SESSION_ID=1491
PAPERSIZE=letter
LD_LIBRARY_PATH=/usr/local/TensorRT-8.4.0.6/lib:/usr/local/cuda-11.6/lib64:
XDG_RUNTIME_DIR=/run/user/1000
LC_TIME=en_US.UTF-8
XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop
TMP=/mnt/hard-disk/tmp
PATH=/home/amoschenyq/.local/bin:/mnt/hard-disk/usr/local/bin:/home/amoschenyq/.vim/plugged/fzf/bin:/home/amoschenyq/.bazel/bin:/usr/local/TensorRT-8.4.0.6/bin:/usr/local/cuda-11.6/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
LC_NUMERIC=en_US.UTF-8
_=/usr/bin/printenv

Maybe it's because I moved my TMP folder to my HDD instead of my SSD to save space?

bhack commented 2 years ago

No, TMP is not involved. You only need to check whether an action_env environment variable (like PATH) changed between builds:

https://github.com/tensorflow/tensorflow/blob/master/.bazelrc#L158
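
One way to run the check described above is to save the relevant variables before each build and compare them with the previous snapshot. The following is a minimal Python sketch, not part of Bazel or TensorFlow. The watched variable list, the function names, and the snapshot file location are all assumptions chosen for illustration; which variables actually reach build actions depends on the Bazel flags in use.

```python
import json
import os
import tempfile
from pathlib import Path

# Assumption: these are the variables worth watching. PATH and LD_LIBRARY_PATH
# both appear in the printenv output earlier in this thread.
WATCHED = ("PATH", "LD_LIBRARY_PATH")


def take_snapshot(environ=os.environ, names=WATCHED):
    """Return the watched variables that are currently set."""
    return {name: environ[name] for name in names if name in environ}


def diff_snapshots(old, new):
    """Map each changed variable to its (old, new) values; None means unset."""
    changed = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changed[name] = (old.get(name), new.get(name))
    return changed


def check_and_update(snapshot_file, environ=os.environ):
    """Compare the environment with the saved snapshot, then overwrite the snapshot."""
    path = Path(snapshot_file)
    old = json.loads(path.read_text()) if path.exists() else {}
    new = take_snapshot(environ)
    path.write_text(json.dumps(new, indent=2))
    return diff_snapshots(old, new)


if __name__ == "__main__":
    # Hypothetical snapshot location; pick any file that survives between builds.
    snapshot = Path(tempfile.gettempdir()) / "tf_build_env.json"
    for name, (before, after) in check_and_update(snapshot).items():
        print(f"{name} changed:\n  before: {before}\n  after:  {after}")
```

Running the script right before each build prints nothing when the watched variables are unchanged. On the very first run there is no snapshot yet, so every watched variable is reported with a "before" value of None.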

AmosChenYQ commented 2 years ago

> No, TMP is not involved. You only need to check whether an action_env environment variable (like PATH) changed between builds:
>
> https://github.com/tensorflow/tensorflow/blob/master/.bazelrc#L158

I did change my PATH a few days ago to add my LLVM/MLIR to it, so this strange behavior has a clear cause and the issue is solved!
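
To illustrate why a PATH change alone can force untouched files to recompile, here is a toy Python model of a cache key. It is a simplified sketch, not Bazel's actual key computation: it only assumes that the key is a hash covering the command line, the input digests, and the environment passed to the action. The command, the file name, the digest, and the LLVM directory are made up for the example.

```python
import hashlib
import json


def action_key(command, input_digests, env):
    """Toy cache key: SHA-256 over the command line, input digests, and environment."""
    payload = json.dumps(
        {
            "command": command,
            "inputs": sorted(input_digests),
            "env": sorted(env.items()),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical compile action; the file name and digest are illustrative only.
command = ["gcc", "-c", "foo.cc", "-o", "foo.o"]
inputs = ["foo.cc:3f2a"]

old_env = {"PATH": "/usr/local/bin:/usr/bin:/bin"}
# Hypothetical LLVM directory prepended to PATH, as in the situation above.
new_env = {"PATH": "/opt/llvm/bin:/usr/local/bin:/usr/bin:/bin"}

key_before = action_key(command, inputs, old_env)
key_after = action_key(command, inputs, new_env)

# The source file is untouched, yet the key differs, so a cache lookup would miss.
print(key_before == key_after)  # prints False
```

Because every action that receives PATH gets a new key in this model, the whole dependency tree looks stale at once, which matches the full rebuild seen here. Bazel documents an `--incompatible_strict_action_env` flag that gives actions a static PATH; whether it suits a TensorFlow build is not something this thread establishes.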