ysh329 commented 7 years ago

Merge

~~embedded_ai/2017-08-07.md at bi-weekly-2017-08-07 · PerfXLab/embedded_ai https://github.com/PerfXLab/embedded_ai/blob/bi-weekly-2017-08-07/bi-weekly-reports/2017-08-07.md~~

Company

~~Lift: A novel approach to achieving performance portability on parallel accelerators. | Where High-Level Programming Meets Performance Portability~~

~~mlmodelzoo.com – deep learning models on mobile~~

Other list

ZhishengWang/Embedded-Neural-Network: collection of works aiming at reducing model sizes or the ASIC/FPGA accelerator for machine learning https://github.com/ZhishengWang/Embedded-Neural-Network

Neural-Networks-on-Silicon/README.md at master · fengbintu/Neural-Networks-on-Silicon https://github.com/fengbintu/Neural-Networks-on-Silicon/blob/master/README.md

ysh329 commented 7 years ago

Understanding the product quantization algorithm http://vividfree.github.io/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/2017/08/05/understanding-product-quantization

Building a Facial Recognition Pipeline with Deep Learning in Tensorflow https://hackernoon.com/building-a-facial-recognition-pipeline-with-deep-learning-in-tensorflow-66e7645015b8

ysh329 commented 7 years ago

doonny/PipeCNN: An OpenCL-based FPGA Accelerator for Convolutional Neural Networks https://github.com/doonny/PipeCNN

soumith/convnet-benchmarks: Easy benchmarking of all publicly accessible implementations of convnets https://github.com/soumith/convnet-benchmarks

ysh329 commented 7 years ago

Baidu NLP | Neural network model compression techniques | 机器之心 (Jiqizhixin) https://www.jiqizhixin.com/articles/e70180f5-4274-4855-b749-23be41c800d7

ysh329 commented 7 years ago

OAID/YSQfastfd: A fast binary library for face detection and face landmark detection in images. No floating-point operations, especially suited to low-cost ARM CPUs. The highest accuracy on FDDB among non-deep-learning methods https://github.com/OAID/YSQfastfd

ysh329 commented 7 years ago

News

blog

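Memory bandwidth vs. compute capability: which one really determines deep learning execution performance? | 机器之心 (Jiqizhixin)
Brief comment:

Tips for training deep neural networks (by Hung-yi Lee (李宏毅), in Mandarin) | bilibili https://www.bilibili.com/video/av14293926/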

paper

[1709.02043] The Mating Rituals of Deep Neural Networks: Learning Compact Feature Representations through Sexual Evolutionary Synthesis
Brief comment:
[1709.00643] Fast Image Processing with Fully-Convolutional Networks
Brief comment:
[1709.01041] Domain-adaptive deep network compression
Brief comment:
[1709.01427] Stochastic Gradient Descent: Going As Fast As Possible But Not Faster
Brief comment:
[1709.00584] Deep Learning-Guided Image Reconstruction from Incomplete Data
Brief comment:
[1709.02755] Training RNNs as Fast as CNNs
Brief comment:
[1709.02260] Embedded Binarized Neural Networks
Brief comment:
[1709.01921] Distributed Deep Neural Networks over the Cloud, the Edge and End Devices
Brief comment:
[1609.09671] Caffeinated FPGAs: FPGA Framework For Convolutional Neural Networks
Brief comment:
[1709.04060] Streamlined Deployment for Quantized Neural Networks
Brief comment:

ysh329 commented 7 years ago

News

neon v2.1.0: Leveraging Intel® Advanced Vector Extensions 512 (Intel® AVX-512) | Intel Nervana
Why the PowerVR 2NX NNA is the future of neural net acceleration | Imagination Technologies
“NVIDIA Deep Learning Accelerator (NVDLA) - free and open architecture that promotes a standard way to design deep learning inference accelerators” | NVIDIA
NVIDIA DeepStream SDK: high-performance deep learning inference for video analytics | NVIDIA Developer
What do you think of Baidu's newly open-sourced mobile-deep-learning? | Zhihu (知乎) [code]
MATLAB for Deep Learning: Design, build, and visualize convolutional neural networks | MATLAB & Simulink Run deployed models up to 4.5x faster than Caffe2 and up to 7x faster than TensorFlow
Accurate to 30 cm: this ultra-precise GPS chip will land in smartphones in 2018 | DeepTech深科技 [English original]

Paper

Code

Blog

ysh329 commented 7 years ago

News

Intel Gears Up For FPGA Push https://www.nextplatform.com/2017/10/02/intel-gears-fpga-push/

Code

peisuke/DeepLearningSpeedComparison: This repository is test code for comparison of several deep learning frameworks. https://github.com/peisuke/DeepLearningSpeedComparison https://www.slideshare.net/FujimotoKeisuke/deep-learning-framework-comparison-on-cpu

《Mixed Precision Training》P Micikevicius, S Narang, J Alben, G Diamos, E Elsen, D Garcia, B Ginsburg, M Houston, O Kuchaev, G Venkatesh, H Wu [Baidu Research & NVIDIA] (2017) Mixed-Precision Training of Deep Neural Networks | Parallel Forall https://devblogs.nvidia.com/parallelforall/mixed-precision-training-deep-neural-networks/ [1710.03740] Mixed Precision Training https://arxiv.org/abs/1710.03740

Efficient Methods and Hardware for Deep Learning | Stanford Digital Repository https://purl.stanford.edu/qf934gh3708

《Creating an IOS app with Core ML from scratch!》 by Gerardo Lopez Falcón | Towards Data Science, Medium https://medium.com/towards-data-science/creating-an-ios-app-with-core-ml-from-scratch-b9e13e8af9cb

News | Building an open AI ecosystem together: the ONNX standard gains support from Huawei, Intel, and other vendors https://mp.weixin.qq.com/s/kBDJ3lEj-JQDpNzvw6aV1Q

Investing in the future of retail with Standard Cognition https://medium.com/initialized-capital/investing-in-the-future-of-retail-with-standard-cognition-ffdd03fafd10

Training AI for Self-Driving Vehicles: the Challenge of Scale | Parallel Forall https://devblogs.nvidia.com/parallelforall/training-self-driving-vehicles-challenge-scale/

The future of autonomous driving depends on it! Inside the solid-state LiDAR startup scene https://mp.weixin.qq.com/s/dOvwoVZHnl2ElXF2Uu8mgw

《Experiments with a new kind of convolution》 by Sahil Singla | Towards Data Science, Medium https://medium.com/towards-data-science/experiments-with-a-new-kind-of-convolution-dfe603262e4c Discussion on r/MachineLearning ([P] Experiments with a new kind of convolution): https://www.reddit.com/r/MachineLearning/comments/756xt2/p_experiments_with_a_new_kind_of_convolution/

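'PyTorch implementation of the Quasi-Recurrent Neural Network - up to 16 times faster than NVIDIA's cuDNN LSTM' by Salesforce GitHub: https://github.com/salesforce/pytorch-qrnn @schelotto: In brief, LSTM training is slow because during backprop the three gates and the memory cell all depend on the previous time step's prediction, so the computation cannot be parallelized. The quasi-RNN removes the gates' time dependence and uses Highway-net-style residual connections to selectively update the hidden layer, which greatly speeds up training.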
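
The comment above can be illustrated with a minimal pure-Python sketch. It is not the salesforce/pytorch-qrnn API: the function names, the scalar hidden state, and the weight values are all hypothetical, and the update rule is assumed to be the "f-pooling" form h_t = f_t * h_{t-1} + (1 - f_t) * z_t. The point it tries to show is that the gates depend only on the inputs (so they could be computed for every time step at once), while the only sequential part is a cheap elementwise recurrence.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate_preactivations(xs, w_prev, w_cur):
    """Width-2 causal filter over a scalar input sequence.

    Each output uses only x[t-1] and x[t], never a previous hidden
    state, so every time step is independent of the others and could
    be computed in parallel.
    """
    padded = [0.0] + list(xs)  # zero-pad the past so t=0 sees no history
    return [w_prev * padded[t] + w_cur * padded[t + 1] for t in range(len(xs))]


def qrnn_f_pooling(xs, wz=(0.5, 1.0), wf=(0.3, -0.2)):
    """Toy quasi-RNN layer with a scalar hidden state.

    wz and wf are hypothetical (w_prev, w_cur) filter weights for the
    candidate and the forget gate.
    """
    # Step 1 (parallelizable): candidates and gates from inputs only.
    z = [math.tanh(a) for a in gate_preactivations(xs, *wz)]
    f = [sigmoid(a) for a in gate_preactivations(xs, *wf)]

    # Step 2 (sequential but elementwise): gated, highway-style update.
    h, hs = 0.0, []
    for z_t, f_t in zip(z, f):
        h = f_t * h + (1.0 - f_t) * z_t
        hs.append(h)
    return hs


if __name__ == "__main__":
    print(qrnn_f_pooling([1.0, -2.0, 0.5]))
```

In an LSTM, by contrast, the gate pre-activations would also take the previous hidden state as input, which forces the expensive matrix products themselves to run one time step after another.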

《Behind the Magic: How we built the ARKit Sudoku Solver》 (Keras + Vision Lib + CoreML + ARKit) by Brad Dwyer https://blog.prototypr.io/behind-the-magic-how-we-built-the-arkit-sudoku-solver-e586e5b685b0

Phone-Powered AI Spots Sick Plants With Remarkable Accuracy (a smartphone AI app that helps farmers detect plant diseases) | WIRED https://www.wired.com/story/plant-ai

《Lattice Recurrent Unit: Improving Convergence and Statistical Efficiency for Sequence Modeling》C Ahuja, L Morency [CMU] (2017) [1710.02254] Lattice Recurrent Unit: Improving Convergence and Statistical Efficiency for Sequence Modeling https://arxiv.org/abs/1710.02254 https://github.com/chahuja/lru

《Dilated Recurrent Neural Networks》S Chang, Y Zhang, W Han, M Yu, X Guo, W Tan, X Cui, M Witbrock, M Hasegawa-Johnson, T Huang [IBM & University of Illinois at Urbana-Champaign] (2017) [1710.02224] Dilated Recurrent Neural Networks https://arxiv.org/abs/1710.02224

'Numba tutorial for GTC 2017 conference'
GitHub: https://github.com/ContinuumIO/gtc2017-numba

《Fast and Accurate Image Super-Resolution with Deep Laplacian Pyramid Networks》W Lai, J Huang, N Ahuja, M Yang [University of California, Merced & Virginia Tech & University of Illinois at Urbana-Champaign] (2017) [1710.01992] Fast and Accurate Image Super-Resolution with Deep Laplacian Pyramid Networks https://arxiv.org/abs/1710.01992

《On the Effective Use of Pretraining for Natural Language Inference》I Cases, M Luong, C Potts [Stanford University] (2017) [1710.02076] On the Effective Use of Pretraining for Natural Language Inference https://arxiv.org/abs/1710.02076

《To prune, or not to prune: exploring the efficacy of pruning for model compression》M Zhu, S Gupta [Stanford University & Google] (2017) [1710.01878] To prune, or not to prune: exploring the efficacy of pruning for model compression https://arxiv.org/abs/1710.01878

[High-performance deep learning library (C++)] "Deep Learning Library (DLL) 1.0 - Fast Neural Network Library" by Baptiste Wicht https://baptiste-wicht.com/posts/2017/10/deep-learning-library-10-fast-neural-network-library.html GitHub: https://github.com/wichtounet/dll

[Paper] 《LSTM: A Search Space Odyssey》K Greff, R K Srivastava, J Koutník, B R. Steunebrink, J Schmidhuber (2015) A large-scale experimental analysis of eight LSTM variants on speech recognition, handwriting recognition, and polyphonic music modeling; a comprehensive, in-depth dissection of the LSTM that is well worth consulting. [1503.04069v2] LSTM: A Search Space Odyssey https://arxiv.org/abs/1503.04069v2

[The future of deep learning frameworks] Mu Li (李沐): AWS open-sources NNVM, an end-to-end compiler for AI frameworks https://mp.weixin.qq.com/s/qkvX0rmEe0yQ-BhCmWAXSQ Introducing NNVM Compiler: A New Open End-to-End Compiler for AI Frameworks | AWS AI Blog https://amazonaws-china.com/cn/blogs/ai/introducing-nnvm-compiler-a-new-open-end-to-end-compiler-for-ai-frameworks/

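Tianqi Chen (陈天奇): Today we released NNVM compiler, a deep learning compiler built on the TVM toolchain. It supports compiling deep learning models from mxnet, pytorch, caffe2, coreml, and others and deploying them to hardware, with multi-level joint optimization. It is faster, and deployment is more lightweight. It supports Raspberry Pi, servers, and various mobile devices, as well as cuda, opencl, metal, javascript, and other backends. Anyone interested in deep learning, compilers, high-performance computing, or hardware acceleration is welcome to join dmlc and help drive and lead the open-source community. NNVM Compiler: Open Compiler for AI Frameworks http://www.tvmlang.org/2017/10/06/nnvm-compiler-announcement.html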

《Privacy-Preserving Deep Inference for Rich User Data on The Cloud》S A Osia... [Nokia Bell Labs & University of Oxford & Queen Mary University of London] (2017) [1710.01727] Privacy-Preserving Deep Inference for Rich User Data on The Cloud https://arxiv.org/abs/1710.01727 GitHub: https://github.com/aliosia/DeepPrivInf2017

[1710.00935] Interpretable Convolutional Neural Networks https://arxiv.org/abs/1710.00935

Primer | An overview of video object segmentation https://mp.weixin.qq.com/s/pGrzmq5aGoLb2uiJRYAXVw The Basics of Video Object Segmentation – techburst https://techburst.io/video-object-segmentation-the-basics-758e77321914

Small Deep Neural Networks - Their Advantages, and Their Design | bilibili https://www.bilibili.com/video/av15126749/ Source: https://www.youtube.com/watch?v=AgpmDOsdTIA

《Real-time Performance RNN in the Browser | magenta》 by Curtis Hawthorne (real-time piano performance demo running in the browser) https://magenta.tensorflow.org/performance-rnn-browser demo: https://deeplearnjs.org/demos/performance_rnn/index.html#2|2,0,1,0,1,1,0,1,0,1,0,1|1,1,1,1,1,1,1,1,1,1,1,1|1,1,1,1,1,1,1,1,1,1,1,1|0

'labelme: Image Annotation Tool with Python' by labelme GitHub: https://github.com/wkentaro/labelme

'Almost Real-time Object Detection using Apple's CoreML and YOLO v1' (iOS) by Sri Raghu Malireddi GitHub: https://github.com/r4ghu/iOS-CoreML-Yolo ref: Computer Vision in iOS – Object Detection | Sri Raghu M https://sriraghu.com/2017/07/12/computer-vision-in-ios-object-detection/

foolwood/benchmark_results: visual tracker benchmark results https://github.com/foolwood/benchmark_results

mikesart/gpuvis: GPU Trace Visualizer https://github.com/mikesart/gpuvis

Recap | Deng Bin (邓滨), chief audio scientist at Xiaoyu Zaijia (小鱼在家): research on speech front-end processing in AI hardware devices https://mp.weixin.qq.com/s/H1jqzp_tkEaeJNP1TQe9dg

Microsoft/EdgeML: This repository provides code for machine learning algorithms for edge devices developed at Microsoft Research India. https://github.com/Microsoft/EdgeML

ysh329 / embedded-ai.bi-weekly

New increment #3
