ysh329 opened this issue 4 years ago
MNN's Arm82 backend implements convolution in Arm82Convolution.cpp and Arm82ConvolutionDepthwise.cpp, which register the two op types:

```cpp
REGISTER_ARM82_OP_CREATOR(OpType_Convolution, Arm82ConvolutionCreator);
REGISTER_ARM82_OP_CREATOR(OpType_ConvolutionDepthwise, Arm82ConvolutionDepthwiseCreator);
```
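As background (a generic sketch, not MNN's actual macro expansion; the names `Creator` and `creatorTable` are assumptions), this style of registration macro usually expands to a static object whose constructor records an op creator in a backend-owned table at program startup:

```cpp
// Generic op-creator registration pattern (illustration only, NOT MNN's code).
#include <map>

struct Execution;   // kernel base class (forward-declared)
struct Op;          // op description (forward-declared)
using OpType = int;

struct Creator {
    virtual Execution* onCreate(const Op* op) const = 0;
    virtual ~Creator() = default;
};

// Backend-owned creator table (assumed name).
static std::map<OpType, const Creator*>& creatorTable() {
    static std::map<OpType, const Creator*> table;
    return table;
}

// A registration macro can then expand to a static registrar:
#define REGISTER_OP_CREATOR(type, CreatorClass)              \
    static struct CreatorClass##Register {                   \
        CreatorClass##Register() {                           \
            static CreatorClass instance;                    \
            creatorTable()[(type)] = &instance;              \
        }                                                    \
    } g##CreatorClass##Register;
```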
On armv8.2 CPUs, OpType_Convolution has two implementations: class Arm82Convolution3x3 in Arm82Convolution3x3.cpp, and class Arm82Convolution in Arm82Convolution.cpp. The former, in Arm82Convolution3x3.cpp, is a Winograd implementation.
Its Winograd transforms are kernelTransform_wino_4x4_3x3 / sourceTransform_wino_4x4_3x3 / dstTransform_wino_4x4_3x3, and the assembly kernels live under source/backend/arm82/asm/arm64/, including but not limited to these transforms.
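As a quick reminder of why F(4x4, 3x3) Winograd pays off (the standard textbook form, not copied from MNN's code): each 4x4 output tile costs one elementwise multiply per point of the transformed 6x6 input tile, i.e. 36 multiplications instead of the 4*4*3*3 = 144 of direct convolution, a theoretical 4x reduction.

```latex
% Standard 2D Winograd convolution F(4x4, 3x3):
%   d: 6x6 input tile, g: 3x3 kernel, Y: 4x4 output tile,
%   B, G, A: fixed transform matrices, \odot: elementwise product
Y = A^{T} \left[ (G g G^{T}) \odot (B^{T} d B) \right] A
```

The three functions above correspond to the kernel transform (G g G^T), the source transform (B^T d B), and the destination transform (the outer A^T [...] A).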
Every kernel implementation in MNN has the same four basic methods, e.g. class Arm82Convolution3x3 : public Execution:
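Those four are presumably the constructor, destructor, onResize, and onExecute. A paraphrased sketch of MNN's Execution interface (abridged from memory of core/Execution.hpp; exact signatures may differ between versions):

```cpp
// Abridged sketch of MNN's Execution interface (approximate).
#include <vector>

class Tensor;
class Backend;
enum ErrorCode { NO_ERROR = 0 /* ... */ };

class Execution {
public:
    explicit Execution(Backend* backend) : mBackend(backend) {}
    virtual ~Execution() = default;

    // Called whenever input shapes are (re)determined: allocate buffers,
    // pre-transform weights, pick the tiling/thread split, etc.
    virtual ErrorCode onResize(const std::vector<Tensor*>& inputs,
                               const std::vector<Tensor*>& outputs) {
        return NO_ERROR;
    }

    // Called once per inference: the actual computation.
    virtual ErrorCode onExecute(const std::vector<Tensor*>& inputs,
                                const std::vector<Tensor*>& outputs) = 0;

private:
    Backend* mBackend;
};
```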
Arm82ConvolutionDepthwise.cpp
In Arm82ConvolutionDepthwise's constructor, a function named MNNQuantizeFP16 converts the weights to FP16. It has several implementations, including an assembly one at asm/arm64/MNNQuantizeFP16_UNIT4.S. Inside onResize, two functions are defined and implemented (but not called there), runBasic and mThreadFunction, which in turn call the following:
- MNNLineDepthWiseFp16C8Unit, which has both a pure C++ implementation (used when NEON is not enabled, i.e. #ifndef MNN_USE_NEON, presumably for verification) and an assembly implementation at asm/arm64/MNNLineDepthWiseFp16C8Unit.S;
- MNNDepthWiseFp16C8Unit, which has a single implementation mixing C++ with intrinsics, e.g. the FP16 intrinsics vld1q_f16 and vfmaq_f16 (see the sketch after this list).

During onExecute, the computation calls mThreadFunction, which in turn calls runBasic.
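To make those intrinsics concrete, here is a minimal FP16 multiply-accumulate loop in the vld1q_f16/vfmaq_f16 style (a standalone illustration, not MNN's MNNDepthWiseFp16C8Unit; compile with -march=armv8.2-a+fp16):

```cpp
// Minimal FP16 NEON example: out[i] += a[i] * b[i], 8 half-float lanes at a time.
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
#include <arm_neon.h>

void fma_fp16(float16_t* out, const float16_t* a, const float16_t* b, int n) {
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        float16x8_t va   = vld1q_f16(a + i);     // load 8 fp16 values
        float16x8_t vb   = vld1q_f16(b + i);
        float16x8_t vacc = vld1q_f16(out + i);
        vacc = vfmaq_f16(vacc, va, vb);          // vacc += va * vb (fused)
        vst1q_f16(out + i, vacc);
    }
    for (; i < n; ++i) {                         // scalar tail
        out[i] = (float16_t)(out[i] + a[i] * b[i]);
    }
}
#endif // __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
```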
Arm8.2's main advantages, the FP16 extensions and the Dot Product instructions, can be applied to accelerate floating-point and quantized computation respectively. MNN appears to have written assembly for both, corresponding to MNNLineDepthWiseFp16C8Unit and asm/arm64/MNNQuantizeFP16_UNIT4.S.
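On the quantized side, the Dot Product extension is exposed through intrinsics such as vdotq_s32; here is a minimal int8 dot-product reduction (a generic illustration, not MNN code; compile with -march=armv8.2-a+dotprod):

```cpp
// Minimal armv8.2 sdot example: 16 int8*int8 products folded into
// 4 int32 accumulator lanes per vdotq_s32 call.
#if defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>

int32_t dot_s8(const int8_t* a, const int8_t* b, int n) {
    int32x4_t acc = vdupq_n_s32(0);
    int i = 0;
    for (; i + 16 <= n; i += 16) {
        int8x16_t va = vld1q_s8(a + i);
        int8x16_t vb = vld1q_s8(b + i);
        acc = vdotq_s32(acc, va, vb);  // 4 groups of 4 products each
    }
    int32_t sum = vaddvq_s32(acc);     // horizontal sum of the 4 lanes
    for (; i < n; ++i) {               // scalar tail
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
}
#endif // __ARM_FEATURE_DOTPROD
```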
Reference: [ARM CPU FP16 Targets · Issue #704 · ARM-software/ComputeLibrary](https://github.com/ARM-software/ComputeLibrary/issues/704)
In ComputeLibrary, the armv8.2 FP16 convolution is NEConvolutionLayer, declared in arm_compute/runtime/NEON/functions/NEConvolutionLayer.h. See the doc comment describing the input parameter of its configure method:
```cpp
/** Set the input and output tensors.
 *
 * @param[in] input Source tensor. 3 lower dimensions represent a single input [width, height, IFM],
 *                  while every optional dimension from 4 and above represent a batch of inputs.
 *                  Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32.
```
The implementation is in src/runtime/NEON/functions/NEConvolutionLayer.cpp.
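A minimal usage sketch of that API with FP16 tensors (hand-written against ACL's ~20.x headers, not taken from the library's examples; shapes and padding are arbitrary):

```cpp
// Minimal NEConvolutionLayer FP16 usage sketch (illustration only).
#include "arm_compute/runtime/NEON/functions/NEConvolutionLayer.h"
#include "arm_compute/runtime/Tensor.h"

using namespace arm_compute;

void run_fp16_conv() {
    Tensor src, weights, bias, dst;
    // [width, height, IFM] layout, as the doc comment above describes.
    src.allocator()->init(TensorInfo(TensorShape(32U, 32U, 16U), 1, DataType::F16));
    weights.allocator()->init(TensorInfo(TensorShape(3U, 3U, 16U, 32U), 1, DataType::F16));
    bias.allocator()->init(TensorInfo(TensorShape(32U), 1, DataType::F16));
    dst.allocator()->init(TensorInfo(TensorShape(32U, 32U, 32U), 1, DataType::F16));

    NEConvolutionLayer conv;
    conv.configure(&src, &weights, &bias, &dst,
                   PadStrideInfo(1 /*stride_x*/, 1 /*stride_y*/, 1 /*pad_x*/, 1 /*pad_y*/));

    for (Tensor* t : {&src, &weights, &bias, &dst}) {
        t->allocator()->allocate();
    }
    // ... fill src / weights / bias with FP16 data ...
    conv.run();
}
```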
The build script, SConstruct, also shows this:
```python
# Add architecture specific flags
prefix = ""
if 'v7a' in env['arch']:
    env.Append(CXXFLAGS = ['-march=armv7-a', '-mthumb', '-mfpu=neon'])
    if env['os'] == 'android':
        env.Append(CXXFLAGS = ['-mfloat-abi=softfp'])
    else:
        env.Append(CXXFLAGS = ['-mfloat-abi=hard'])
elif 'v8' in env['arch']:
    if 'sve' in env['arch']:
        env.Append(CXXFLAGS = ['-march=armv8.2-a+sve+fp16+dotprod'])
    elif 'v8.2-a' in env['arch']:
        env.Append(CXXFLAGS = ['-march=armv8.2-a+fp16'])  # explicitly enable fp16 extension otherwise __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is undefined
    else:
```
And src/runtime/NEON/functions/assembly/NEDepthwiseConvolutionAssemblyDispatch.cpp contains:
```cpp
#ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
std::unique_ptr<depthwise::IDepthwiseConvolution> get_fp16_convolver(int kernel_size, int stride_x,
                                                                     int n_batches, int in_rows, int in_cols, int n_channels,
                                                                     int dilation_factor, neon_convolution_kernels::ActivationFunction activation,
                                                                     int padding_top, int padding_left, int padding_bottom, int padding_right)
{
    switch(kernel_size)
    {
        case 3:
        {
            switch(stride_x)
            {
                case 1:
                    return arm_compute::support::cpp14::make_unique<depthwise::DilatedDepthwiseConvolution<3, 3, 3, 3, 1, 1, float16_t, float16_t, float16_t>>(
                        n_batches, in_rows, in_cols, n_channels, dilation_factor, activation, padding_top, padding_left, padding_bottom, padding_right);
                case 2:
                    return arm_compute::support::cpp14::make_unique<depthwise::DilatedDepthwiseConvolution<3, 3, 3, 3, 2, 2, float16_t, float16_t, float16_t>>(
                        n_batches, in_rows, in_cols, n_channels, dilation_factor, activation, padding_top, padding_left, padding_bottom, padding_right);
                default:
                    return nullptr;
            }
        }
        case 5:
        {
            switch(stride_x)
            {
                case 1:
                    return arm_compute::support::cpp14::make_unique<depthwise::DilatedDepthwiseConvolution<3, 3, 5, 5, 1, 1, float16_t, float16_t, float16_t>>(
                        n_batches, in_rows, in_cols, n_channels, dilation_factor, activation, padding_top, padding_left, padding_bottom, padding_right);
                case 2:
                    return arm_compute::support::cpp14::make_unique<depthwise::DilatedDepthwiseConvolution<3, 3, 5, 5, 2, 2, float16_t, float16_t, float16_t>>(
                        n_batches, in_rows, in_cols, n_channels, dilation_factor, activation, padding_top, padding_left, padding_bottom, padding_right);
                default:
                    return nullptr;
            }
        }
        default:
            return nullptr;
    }
}
#endif // __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
```
armv8, ACL and MNN: conv1x1
I happened to notice that src/runtime/NEON/functions/NEConvolution.cpp includes a file named arm_compute/core/NEON/kernels/NEConvolutionKernel.h.

Version: 0df31a8667bdfdbdea084eef43b6812897e75db9, release 1.0.0, date: Thu May 7 18:19:02 2020
Arm82Convolution.hpp

Arm82Convolution.cpp and its .hpp hold the registration and methods for Conv; the main methods are excerpted below with implementations omitted:

Arm82Convolution.cpp
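The excerpt itself is missing from this capture. As a placeholder, a plausible skeleton consistent with the Execution interface sketched earlier (the member list is guessed, not copied from MNN):

```cpp
// Hypothetical skeleton of Arm82Convolution (guessed shape; implementations
// omitted just as in the original note, and the real member list may differ).
class Arm82Convolution : public Execution {
public:
    Arm82Convolution(const Op* op, Backend* backend);  // e.g. FP16 weight reorder
    virtual ~Arm82Convolution();

    ErrorCode onResize(const std::vector<Tensor*>& inputs,
                       const std::vector<Tensor*>& outputs) override;
    ErrorCode onExecute(const std::vector<Tensor*>& inputs,
                        const std::vector<Tensor*>& outputs) override;
};
```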