Hello everyone. I've modified the file kernel_neon.h: following the original kernel (12x8x2), I wrote a new kernel (8x8x8). I've run the benchmark, and the results do not seem to differ much on my arm64 environment. I now want to use it in a machine learning framework such as TensorFlow or MXNet. I've tried many methods, but all of them failed. Is this feasible? Can anyone help? Thanks a lot.
@knjwhn
Hello, which compiler did you use? I ran into the error "error: impossible constraint in 'asm'". When I remove all the output variables, it compiles fine. I guess it may be a compiler problem.