Open Kur0x opened 2 years ago
Hello, I read your MICRO '21 paper,
Sanger: A Co-Design Framework for Enabling Sparse Attention using Reconfigurable Architecture,
and found the idea very elegant. I would like to reproduce Sanger's performance, but I cannot obtain the numbers reported in the paper. How should I go about this? In Figure 8 of the paper, the speedup for SQuAD over GPU FP32 is roughly 2.3x. Is that an end-to-end figure (model training throughput), or something else? I also ran the following commands and got these results:
CUDA_VISIBLE_DEVICES=1 python bench_cpu_gpu.py --model_name=bert-base --cuda --seq_len=128
Benchmark result (bert-base, GPU, NTC, 128) [0.29386751681566237, 0.2896691173315048, 0.2874777591228485, 0.28696575939655306, 0.2870271983742714, 0.28714976012706755, 0.29563904136419294, 0.28612608194351197, 0.2866176012158394, 0.2864230412244797] mean: 0.2886962876915932, std: 0.003188126847889235
python bench_sanger.py --sparsity=0.1 --load_balance=0.7 --seq_len=128
Sanger Latency: 0.235 ms
python bench_sanger_v2.py --sparsity=0.1 --load_balance=0.7 --seq_len=128
Sanger Latency: 0.234 ms
Is there anything wrong with this procedure? Is the speedup ratio obtained by comparing these two numbers?
Thanks :hugs:
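For what it's worth, the ratio these two measurements imply can be computed directly. This is only a sketch, assuming both scripts report the same quantity (per-run attention latency in milliseconds) so that a direct ratio is meaningful; the values are copied from the output above:

```python
# Speedup implied by the two benchmark outputs quoted above.
dense_samples = [0.29386751681566237, 0.2896691173315048, 0.2874777591228485,
                 0.28696575939655306, 0.2870271983742714, 0.28714976012706755,
                 0.29563904136419294, 0.28612608194351197, 0.2866176012158394,
                 0.2864230412244797]
dense_mean = sum(dense_samples) / len(dense_samples)  # matches the reported mean, ~0.2887
sanger_latency = 0.235  # reported by bench_sanger.py

speedup = dense_mean / sanger_latency
print(f"speedup: {speedup:.2f}x")  # ~1.23x, noticeably below the ~2.3x in Figure 8
```

If the two scripts measure different things (e.g. a full layer vs. only the sparse attention kernel), this ratio is not comparable to Figure 8, which is exactly the ambiguity the question raises.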
Hello, have you successfully run the repo locally? I tried it with the conda env, but it failed with: ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on. Is there any way to work around this?
It seems to be a network error. You could try a different network environment, or pinpoint the code that raises the error and fix it there.
I tried, and it seems I need to download the model locally. Do you know where I can find the model, or what the model's name is?
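One workaround sketch, assuming the error comes from Hugging Face transformers trying to fetch a checkpoint (the exact model name is whatever the scripts pass to `from_pretrained`; it is not confirmed here): obtain the checkpoint files on a networked machine, copy them into a local cache directory, and then force offline loading before the benchmark scripts import transformers:

```python
import os

# Hedged sketch: point the Hugging Face cache at a directory that already
# contains the checkpoint files, and forbid any network lookup.
# "~/hf-cache" is a hypothetical path -- use wherever you copied the files.
os.environ["HF_HOME"] = os.path.expanduser("~/hf-cache")
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers will read only from cache
# These must be set before `transformers` is imported by the scripts.
```

Equivalently, the two variables can be exported in the shell before running `bench_cpu_gpu.py` and friends.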