msr-fiddle / dejavu

Apache License 2.0

"error: ‘class fastertransformer::ParallelGptDVBenchmark<__nv_bfloat16>’ has no member named ‘comp_done_’" during compilation #2

Closed Zhuohao-Li closed 1 month ago

Zhuohao-Li commented 1 month ago

Hi @fotstrt ,

I encountered some "attribute definition errors" (missing class members) when following the installation doc and running cmake -DSM=xx -DCMAKE_BUILD_TYPE=Release -DBUILD_PYT=ON -DBUILD_MULTI_GPU=ON -DBUILD_MICROBENCHMARKS=ON .. followed by make -j12.

The relevant error messages:

[ 97%] Built target gptj_example
In file included from /home/ubuntu/zhuohao-dev-3/dejavu/src/fastertransformer/th_op/multi_gpu_gpt/ParallelGptOp.cc:17:
/home/ubuntu/zhuohao-dev-3/dejavu/src/fastertransformer/th_op/multi_gpu_gpt/ParallelGptOp.h: In instantiation of ‘void torch_ext::FTGpt<T>::cleanup() [with T = __nv_bfloat16]’:
/home/ubuntu/zhuohao-dev-3/dejavu/src/fastertransformer/th_op/multi_gpu_gpt/ParallelGptOp.h:347:10:   required from here
/home/ubuntu/zhuohao-dev-3/dejavu/src/fastertransformer/th_op/multi_gpu_gpt/ParallelGptOp.h:349:18: error: ‘class fastertransformer::ParallelGptDVBenchmark<__nv_bfloat16>’ has no member named ‘comp_done_’
  349 |         gpt_ptr->comp_done_ = true;
      |         ~~~~~~~~~^~~~~~~~~~
/home/ubuntu/zhuohao-dev-3/dejavu/src/fastertransformer/th_op/multi_gpu_gpt/ParallelGptOp.h:354:18: error: ‘class fastertransformer::ParallelGptDVBenchmark<__nv_bfloat16>’ has no member named ‘reset’
  354 |         gpt_ptr->reset();
      |         ~~~~~~~~~^~~~~
/home/ubuntu/zhuohao-dev-3/dejavu/src/fastertransformer/th_op/multi_gpu_gpt/ParallelGptOp.h: In instantiation of ‘void torch_ext::FTGpt<T>::cleanup() [with T = __half]’:
/home/ubuntu/zhuohao-dev-3/dejavu/src/fastertransformer/th_op/multi_gpu_gpt/ParallelGptOp.h:347:10:   required from here
/home/ubuntu/zhuohao-dev-3/dejavu/src/fastertransformer/th_op/multi_gpu_gpt/ParallelGptOp.h:349:18: error: ‘class fastertransformer::ParallelGptDVBenchmark<__half>’ has no member named ‘comp_done_’
  349 |         gpt_ptr->comp_done_ = true;
      |         ~~~~~~~~~^~~~~~~~~~
/home/ubuntu/zhuohao-dev-3/dejavu/src/fastertransformer/th_op/multi_gpu_gpt/ParallelGptOp.h:354:18: error: ‘class fastertransformer::ParallelGptDVBenchmark<__half>’ has no member named ‘reset’
  354 |         gpt_ptr->reset();
      |         ~~~~~~~~~^~~~~
/home/ubuntu/zhuohao-dev-3/dejavu/src/fastertransformer/th_op/multi_gpu_gpt/ParallelGptOp.h: In instantiation of ‘void torch_ext::FTGpt<T>::cleanup() [with T = float]’:
/home/ubuntu/zhuohao-dev-3/dejavu/src/fastertransformer/th_op/multi_gpu_gpt/ParallelGptOp.h:347:10:   required from here
/home/ubuntu/zhuohao-dev-3/dejavu/src/fastertransformer/th_op/multi_gpu_gpt/ParallelGptOp.h:349:18: error: ‘class fastertransformer::ParallelGptDVBenchmark<float>’ has no member named ‘comp_done_’
  349 |         gpt_ptr->comp_done_ = true;
      |         ~~~~~~~~~^~~~~~~~~~
/home/ubuntu/zhuohao-dev-3/dejavu/src/fastertransformer/th_op/multi_gpu_gpt/ParallelGptOp.h:354:18: error: ‘class fastertransformer::ParallelGptDVBenchmark<float>’ has no member named ‘reset’
  354 |         gpt_ptr->reset();
      |         ~~~~~~~~~^~~~~

I first tried on my local machine. The setup (8*A100-40GB, CUDA 12.1, Python 3.8, PyTorch 2.1) is not quite the same, but all the dependencies are already installed. I then tried building with your Dockerfile, which produced the same error.

Could you help look into this?

----- How to reproduce the error -----
Follow the installation doc; the error appears when running cmake -DSM=80 -DCMAKE_BUILD_TYPE=Release -DBUILD_PYT=ON -DBUILD_MULTI_GPU=ON -DBUILD_MICROBENCHMARKS=ON .. followed by make -j12.

fotstrt commented 1 month ago

Hello,

Thank you for your comment! This was indeed caused by a function definition missing from that specific class. I added it in PR #3 (although this class does not actually use the function).

If you can pull again from the master branch, it should compile without issues. Thank you for bringing this to my attention! :)
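
For anyone hitting a similar error, here is a minimal sketch of why the compiler reports it once per element type. A member access through a template-dependent pointer is only checked when the template is instantiated, so a wrapper's cleanup() compiles only if the wrapped class provides every member it touches. The names BenchmarkModel, Wrapper, steps_, and cleanup_leaves_clean_state are hypothetical stand-ins for illustration, not the actual dejavu or FasterTransformer code.

```cpp
#include <cassert>

// Hypothetical stand-in for the wrapped model class (not the real dejavu code).
// Removing comp_done_ or reset() reproduces a "has no member named" error
// for every instantiated element type.
template <typename T>
class BenchmarkModel {
public:
    bool comp_done_ = false;  // flag the wrapper sets during cleanup
    int  steps_     = 0;      // hypothetical state cleared by reset()

    void step()  { ++steps_; }
    void reset() { steps_ = 0; }
};

// Hypothetical wrapper with the same shape as the cleanup() in the error log.
template <typename T>
class Wrapper {
public:
    explicit Wrapper(BenchmarkModel<T>* model) : model_ptr_(model) {}

    void cleanup() {
        // Both accesses depend on T, so they are checked at instantiation time.
        model_ptr_->comp_done_ = true;
        model_ptr_->reset();
    }

private:
    BenchmarkModel<T>* model_ptr_;
};

// Explicit instantiations force cleanup() to be compiled for each type,
// which is where a missing member would be diagnosed.
template class Wrapper<float>;
template class Wrapper<double>;

// Runs one step, calls cleanup(), and reports whether the state is clean.
bool cleanup_leaves_clean_state() {
    BenchmarkModel<float> model;
    model.step();
    assert(model.steps_ == 1);  // sanity check before cleanup

    Wrapper<float> wrapper(&model);
    wrapper.cleanup();
    return model.comp_done_ && model.steps_ == 0;
}
```

Under these assumptions, calling cleanup_leaves_clean_state() from a main function returns true. Adding the missing members to the wrapped class, even when that class never calls them itself, is one way to make every instantiation compile.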