ucbrise / clipper

A low-latency prediction-serving system
http://clipper.ai
Apache License 2.0

Caffe2 Build Issue #475

Closed simon-mo closed 5 years ago

simon-mo commented 6 years ago

Caffe2OnnxContainer fails to build from source.

A submodule checkout failed:

error: no such remote ref f3c627d517968c20e8269ead1d90cd3a6c199356
Fetched in submodule path 'third_party/aten', but it did not contain f3c627d517968c20e8269ead1d90cd3a6c199356. Direct fetching of that commit failed.

Jenkins Log: https://amplab.cs.berkeley.edu/jenkins/job/Clipper-PRB/1371/console

Related issues on the Caffe2 and PyTorch trackers: https://github.com/caffe2/caffe2/issues/2510 https://github.com/pytorch/pytorch/issues/6776
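The error means the clone fetched for `third_party/aten` does not contain the commit that caffe2 pins for that path, and GitHub refused to serve the commit directly, most likely because no advertised ref in the ATen repository reaches it any more. A minimal sketch for checking this before starting a long build, assuming bash and git on the PATH; `pinned_commit_present` is a hypothetical helper, not part of Clipper, Caffe2, or git. It reads the pinned commit from the superproject's index (which is what `git submodule update` checks out) and tests whether the local clone at that path has it:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Clipper, Caffe2, or git).
# Usage: pinned_commit_present <superproject-dir> <submodule-path>
# Returns 0 if the commit pinned for <submodule-path> exists in the clone at
# that path, 1 if it is missing, and 2 if no submodule (gitlink) is recorded.
pinned_commit_present() {
  local super="$1" subpath="$2" entry mode sha
  # For a submodule, `git ls-files --stage` prints "160000 <sha> 0<TAB><path>".
  entry=$(git -C "$super" ls-files --stage -- "$subpath")
  mode=$(printf '%s\n' "$entry" | awk '{print $1}')
  sha=$(printf '%s\n' "$entry" | awk '{print $2}')
  if [ "$mode" != "160000" ]; then
    echo "no gitlink recorded for $subpath" >&2
    return 2
  fi
  # `cat-file -e` exits 0 only if the object exists; ^{commit} requires a commit.
  git -C "$super/$subpath" cat-file -e "${sha}^{commit}" 2>/dev/null
}
```

For example, from the caffe2 checkout, running `git -C third_party/aten fetch origin` and then `pinned_commit_present . third_party/aten` should return non-zero if upstream ATen no longer has the pinned commit on any branch or tag.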

simon-mo commented 6 years ago

I was able to reproduce it on my local machine:

git clone --recursive https://github.com/caffe2/caffe2

Cloning into 'caffe2'...
remote: Counting objects: 230439, done.
remote: Total 230439 (delta 0), reused 0 (delta 0), pack-reused 230439
Receiving objects: 100% (230439/230439), 393.42 MiB | 667.00 KiB/s, done.
Resolving deltas: 100% (211258/211258), done.
Submodule 'third_party/ComputeLibrary' (https://github.com/ARM-software/ComputeLibrary.git) registered for path 'third_party/ComputeLibrary'
Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16'
Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv'
Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK'
Submodule 'third_party/aten' (https://github.com/zdevito/aten) registered for path 'third_party/aten'
Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark'
Submodule 'third-party/cpuinfo' (https://github.com/Maratyszcza/cpuinfo.git) registered for path 'third_party/cpuinfo'
Submodule 'third_party/cub' (https://github.com/NVlabs/cub.git) registered for path 'third_party/cub'
Submodule 'third_party/eigen' (https://github.com/RLovelett/eigen.git) registered for path 'third_party/eigen'
Submodule 'third_party/gloo' (https://github.com/facebookincubator/gloo) registered for path 'third_party/gloo'
Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest'
Submodule 'third_party/ios-cmake' (https://github.com/Yangqing/ios-cmake.git) registered for path 'third_party/ios-cmake'
Submodule 'third_party/nccl' (https://github.com/nvidia/nccl.git) registered for path 'third_party/nccl'
Submodule 'third_party/nervanagpu' (https://github.com/NervanaSystems/nervanagpu.git) registered for path 'third_party/nervanagpu'
Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx'
Submodule 'third_party/protobuf' (https://github.com/google/protobuf.git) registered for path 'third_party/protobuf'
Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd'
Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool'
Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11'
Submodule 'third_party/python-enum' (https://github.com/PeachPy/enum34.git) registered for path 'third_party/python-enum'
Submodule 'third_party/python-peachpy' (https://github.com/Maratyszcza/PeachPy.git) registered for path 'third_party/python-peachpy'
Submodule 'third_party/python-six' (https://github.com/benjaminp/six.git) registered for path 'third_party/python-six'
Submodule 'third_party/zstd' (https://github.com/facebook/zstd.git) registered for path 'third_party/zstd'
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/ComputeLibrary'...
remote: Counting objects: 51518, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 51518 (delta 0), reused 3 (delta 0), pack-reused 51511
Receiving objects: 100% (51518/51518), 61.92 MiB | 721.00 KiB/s, done.
Resolving deltas: 100% (44900/44900), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/FP16'...
remote: Counting objects: 243, done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 243 (delta 0), reused 1 (delta 0), pack-reused 238
Receiving objects: 100% (243/243), 101.65 KiB | 991.00 KiB/s, done.
Resolving deltas: 100% (132/132), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/FXdiv'...
remote: Counting objects: 208, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 208 (delta 2), reused 6 (delta 2), pack-reused 202
Receiving objects: 100% (208/208), 32.90 KiB | 935.00 KiB/s, done.
Resolving deltas: 100% (102/102), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/NNPACK'...
remote: Counting objects: 2679, done.
remote: Compressing objects: 100% (73/73), done.
remote: Total 2679 (delta 103), reused 130 (delta 89), pack-reused 2517
Receiving objects: 100% (2679/2679), 1.00 MiB | 1.59 MiB/s, done.
Resolving deltas: 100% (1783/1783), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/aten'...
remote: Counting objects: 23111, done.
remote: Compressing objects: 100% (1234/1234), done.
remote: Total 23111 (delta 172), reused 1401 (delta 172), pack-reused 21705
Receiving objects: 100% (23111/23111), 6.99 MiB | 3.25 MiB/s, done.
Resolving deltas: 100% (13514/13514), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/benchmark'...
remote: Counting objects: 4392, done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 4392 (delta 2), reused 2 (delta 1), pack-reused 4382
Receiving objects: 100% (4392/4392), 1.33 MiB | 1.53 MiB/s, done.
Resolving deltas: 100% (2860/2860), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/cpuinfo'...
remote: Counting objects: 4547, done.
remote: Compressing objects: 100% (140/140), done.
remote: Total 4547 (delta 213), reused 213 (delta 150), pack-reused 4257
Receiving objects: 100% (4547/4547), 4.34 MiB | 1.66 MiB/s, done.
Resolving deltas: 100% (3426/3426), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/cub'...
remote: Counting objects: 32642, done.
remote: Total 32642 (delta 0), reused 0 (delta 0), pack-reused 32642
Receiving objects: 100% (32642/32642), 16.46 MiB | 1.52 MiB/s, done.
Resolving deltas: 100% (28621/28621), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/eigen'...
remote: Counting objects: 91152, done.
remote: Total 91152 (delta 0), reused 0 (delta 0), pack-reused 91151
Receiving objects: 100% (91152/91152), 73.06 MiB | 1.88 MiB/s, done.
Resolving deltas: 100% (67562/67562), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/gloo'...
remote: Counting objects: 2106, done.
remote: Compressing objects: 100% (40/40), done.
remote: Total 2106 (delta 14), reused 22 (delta 8), pack-reused 2057
Receiving objects: 100% (2106/2106), 636.52 KiB | 1.37 MiB/s, done.
Resolving deltas: 100% (1563/1563), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/googletest'...
remote: Counting objects: 11728, done.
remote: Compressing objects: 100% (37/37), done.
remote: Total 11728 (delta 37), reused 48 (delta 27), pack-reused 11659
Receiving objects: 100% (11728/11728), 3.48 MiB | 2.73 MiB/s, done.
Resolving deltas: 100% (8616/8616), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/ios-cmake'...
remote: Counting objects: 230, done.
remote: Total 230 (delta 0), reused 0 (delta 0), pack-reused 230
Receiving objects: 100% (230/230), 53.31 KiB | 568.00 KiB/s, done.
Resolving deltas: 100% (81/81), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/nccl'...
remote: Counting objects: 651, done.
remote: Total 651 (delta 0), reused 0 (delta 0), pack-reused 651
Receiving objects: 100% (651/651), 1.38 MiB | 1.69 MiB/s, done.
Resolving deltas: 100% (411/411), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/nervanagpu'...
remote: Counting objects: 966, done.
remote: Total 966 (delta 0), reused 0 (delta 0), pack-reused 966
Receiving objects: 100% (966/966), 521.13 KiB | 991.00 KiB/s, done.
Resolving deltas: 100% (722/722), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/onnx'...
remote: Counting objects: 8077, done.
remote: Compressing objects: 100% (16/16), done.
remote: Total 8077 (delta 9), reused 5 (delta 2), pack-reused 8059
Receiving objects: 100% (8077/8077), 3.75 MiB | 2.77 MiB/s, done.
Resolving deltas: 100% (3738/3738), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/protobuf'...
remote: Counting objects: 51713, done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 51713 (delta 7), reused 2 (delta 1), pack-reused 51693
Receiving objects: 100% (51713/51713), 45.62 MiB | 1.59 MiB/s, done.
Resolving deltas: 100% (34754/34754), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/psimd'...
remote: Counting objects: 83, done.
remote: Total 83 (delta 0), reused 0 (delta 0), pack-reused 83
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/pthreadpool'...
remote: Counting objects: 221, done.
remote: Total 221 (delta 0), reused 0 (delta 0), pack-reused 221
Receiving objects: 100% (221/221), 47.63 KiB | 541.00 KiB/s, done.
Resolving deltas: 100% (103/103), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/pybind11'...
remote: Counting objects: 9879, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 9879 (delta 0), reused 1 (delta 0), pack-reused 9876
Receiving objects: 100% (9879/9879), 3.52 MiB | 835.00 KiB/s, done.
Resolving deltas: 100% (6668/6668), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/python-enum'...
remote: Counting objects: 19, done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 19 (delta 0), reused 19 (delta 0), pack-reused 0
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/python-peachpy'...
remote: Counting objects: 2000, done.
remote: Total 2000 (delta 0), reused 0 (delta 0), pack-reused 2000
Receiving objects: 100% (2000/2000), 919.08 KiB | 561.00 KiB/s, done.
Resolving deltas: 100% (1403/1403), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/python-six'...
remote: Counting objects: 1654, done.
remote: Total 1654 (delta 0), reused 0 (delta 0), pack-reused 1653
Receiving objects: 100% (1654/1654), 1.65 MiB | 1.27 MiB/s, done.
Resolving deltas: 100% (965/965), done.
Cloning into '/Users/simonmo/Desktop/sandbox/caffe2/third_party/zstd'...
remote: Counting objects: 28568, done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 28568 (delta 3), reused 4 (delta 3), pack-reused 28560
Receiving objects: 100% (28568/28568), 13.53 MiB | 689.00 KiB/s, done.
Resolving deltas: 100% (21066/21066), done.
Submodule path 'third_party/ComputeLibrary': checked out '292227986edb37b01061afcad6df18ba9d6ccbeb'
Submodule path 'third_party/FP16': checked out '43d6d17df48ebf622587e7ed9472ea76573799b9'
Submodule path 'third_party/FXdiv': checked out '811b482bcd9e8d98ad80c6c78d5302bb830184b0'
Submodule path 'third_party/NNPACK': checked out '087269189207a63ab7084e6925ea511d8952fa59'
error: Server does not allow request for unadvertised object f3c627d517968c20e8269ead1d90cd3a6c199356
Fetched in submodule path 'third_party/aten', but it did not contain f3c627d517968c20e8269ead1d90cd3a6c199356. Direct fetching of that commit failed.

simon-mo commented 6 years ago

Some builds succeed only because Docker reused the cached layer for this step:

Step 5/11 : RUN git clone --recursive https://github.com/caffe2/caffe2.git && cd caffe2     && git checkout 409e96d9ebd9963e10fd1eaa0a02cfbb8623650e     && git submodule update --init && mkdir build && cd build && cmake .. -DUSE_MPI=OFF     && make -j8 install
 ---> Using cache
 ---> 7fb300a93942
goswamig commented 6 years ago

Any update?

dcrankshaw commented 6 years ago

@simon-mo Did the recent merge of the Caffe2 and PyTorch repos fix this build issue?

simon-mo commented 6 years ago

@dcrankshaw No. The build succeeded, but importing caffe2 then failed with an error loading a shared library. The log is buried in Jenkins.

My plan is to wait for PyTorch 1.0 to come out this summer (https://pytorch.org/2018/05/02/road-to-1.0.html) and then spend a few hours figuring out the best way to install Caffe2.

dcrankshaw commented 6 years ago

Sounds like a good plan to me. I'll leave this issue open as a tracking issue in the meantime.

YuchenJin commented 6 years ago

Adding the following to your Dockerfile lets Caffe2OnnxContainer build successfully:

RUN git clone https://github.com/caffe2/caffe2.git

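# Workaround: clone ATen manually into caffe2's third_party, check out a
# known-good commit, and stage it so caffe2's index pins aten to that commit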
RUN cd caffe2/third_party \
    && git clone https://github.com/zdevito/ATen aten \
    && cd aten \
    && git checkout 642baf51c5b7e13ad814542c0b47ab03a14f8c92 \
    && cd .. \
    && git add aten
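The workaround relies on `git add aten` overwriting the gitlink for `third_party/aten` in caffe2's index with the HEAD of the nested clone (642baf51c5b7e13ad814542c0b47ab03a14f8c92 here). Since `git submodule update` checks out the commit recorded in the superproject's index, it should then stop asking for the missing commit. A minimal sketch of the same re-pinning step as a reusable function, assuming bash and git; `repin_submodule` is a hypothetical name, not an existing git or Caffe2 command:

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the Dockerfile workaround (not a git command).
# Usage: repin_submodule <superproject-dir> <submodule-path> <commit>
# Assumes <submodule-path> is already a clone that contains <commit>.
repin_submodule() {
  local super="$1" subpath="$2" commit="$3"
  # Move the nested clone to the replacement commit (detached HEAD).
  git -C "$super/$subpath" checkout -q "$commit" || return 1
  # Staging the path records the nested clone's current HEAD as the gitlink
  # in the superproject's index; nothing is committed in the superproject.
  git -C "$super" add "$subpath"
}
```

The design trade-off is that this avoids editing `.gitmodules` or committing in caffe2, at the cost of the new pin existing only in the local index of that build.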

rkooo567 commented 5 years ago

@simon-mo Is it resolved?

simon-mo commented 5 years ago

We can tag it WONTFIX, because Caffe2 is essentially deprecated in favor of PyTorch.

On Tue, May 28, 2019 at 8:44 PM SangBin Cho notifications@github.com wrote:

@simon-mo https://github.com/simon-mo Is it resolved?
