hatsu3 / Sanger


Question about reproducing the performance results #1

Open Kur0x opened 2 years ago

Kur0x commented 2 years ago

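Hello, I read your MICRO '21 paper, Sanger: A Co-Design Framework for Enabling Sparse Attention using Reconfigurable Architecture, and found the idea very elegant. I would like to reproduce Sanger's performance, but I cannot get the numbers reported in the paper. How should I go about it? In Figure 8 of the paper, the speedup on SQuAD over GPU FP32 is about 2.3x. Is that an end-to-end improvement (model training throughput), or something else? Below are the steps I ran and the results I got.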

CUDA_VISIBLE_DEVICES=1 python bench_cpu_gpu.py --model_name=bert-base --cuda  --seq_len=128
Benchmark result (bert-base, GPU, NTC, 128)
[0.29386751681566237, 0.2896691173315048, 0.2874777591228485, 0.28696575939655306, 0.2870271983742714, 0.28714976012706755, 0.29563904136419294, 0.28612608194351197, 0.2866176012158394, 0.2864230412244797]
mean: 0.2886962876915932, std: 0.003188126847889235
python bench_sanger.py --sparsity=0.1 --load_balance=0.7 --seq_len=128 
Sanger Latency: 0.235 ms
python bench_sanger_v2.py --sparsity=0.1 --load_balance=0.7 --seq_len=128
Sanger Latency: 0.234 ms

Is there anything wrong with this procedure? Is the speedup ratio obtained by comparing these two numbers?

Thanks :hugs:
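The comparison being asked about can be sketched as follows. This is only an illustration of the arithmetic, not the method used for Figure 8: it assumes the GPU benchmark values are in milliseconds like the Sanger latency (the script output above does not state a unit) and that both numbers cover the same workload, neither of which is confirmed in this thread.

```python
import statistics

# GPU runs as printed by bench_cpu_gpu.py above (unit assumed to be ms).
gpu_runs = [
    0.29386751681566237, 0.2896691173315048, 0.2874777591228485,
    0.28696575939655306, 0.2870271983742714, 0.28714976012706755,
    0.29563904136419294, 0.28612608194351197, 0.2866176012158394,
    0.2864230412244797,
]

# Latency printed by bench_sanger.py above, in ms.
sanger_latency_ms = 0.235

gpu_mean = statistics.fmean(gpu_runs)

# Speedup is defined here as baseline latency divided by accelerator latency.
speedup = gpu_mean / sanger_latency_ms

print(f"GPU mean: {gpu_mean:.4f}, speedup: {speedup:.2f}x")
```

Under those assumptions the ratio comes out to roughly 1.23x, well below the roughly 2.3x read off Figure 8, which is the gap this issue is about.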

jimmy-adams commented 8 months ago

Hello, have you managed to run the repo locally? I tried it in the conda env, but it failed with the following error: "ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on." Is there any way to resolve this?

Kur0x commented 8 months ago

It seems to be a network error. You can try a different network environment, or pinpoint the code that raises the error and try to fix it from there.

jimmy-adams commented 8 months ago

I tried that, and it looks like I may need to download the model locally. Do you know where I can find the model, or what the model is called?
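For anyone hitting the same error, loading from a local copy might look like the sketch below. It assumes the scripts load their weights through Hugging Face `transformers` (the error text matches that library, but this thread does not confirm it). `load_local_model` and the directory argument are hypothetical names used for illustration, and the thread does not say which checkpoint the benchmark scripts expect.

```python
from pathlib import Path


def load_local_model(model_dir):
    """Load a model from a local directory without contacting the network.

    Hypothetical helper: `model_dir` must point to a model that was already
    downloaded and saved in the Hugging Face directory layout.
    """
    model_dir = Path(model_dir)

    # A saved Hugging Face model directory has config.json next to the weights.
    if not (model_dir / "config.json").is_file():
        raise FileNotFoundError(f"no config.json found under {model_dir}")

    # Imported here so the directory check above runs even if transformers
    # is not installed.
    from transformers import AutoModel

    # local_files_only=True tells transformers not to attempt a download.
    return AutoModel.from_pretrained(str(model_dir), local_files_only=True)
```

Usage would be along the lines of `model = load_local_model("/path/to/downloaded/model")`, with the path replaced by wherever the files were saved.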