[20210418] Weekly AI Arxiv Talk.

jungwoo-ha commented 3 years ago

AI News
- https://www.cnbc.com/2021/04/11/microsoft-in-advanced-talks-to-buy-speech-recognition-company-nuance.html?
- MS acquiring Nuance??
- A foothold for MS's expansion into the healthcare industry?
- https://digitalhumans.com/
- Digital human
- CG, voice, and so on have reached a very high level. Time to differentiate on DM and the brain? (currently it is said to use Blenderbot)
- I tried it a bit, and character synthesis and speech synthesis seem to have reached service quality.. display device + assistant looks quite feasible
- https://www.youtube.com/watch?v=RiWB2o-9qMs (Einstein version)
- https://www.nvidia.com/ko-kr/gtc/ NVIDIA GTC.
- https://www.hankyung.com/it/article/202104121675v Congratulations to Kakao Brain CEO Kim Il-doo.
- https://news.naver.com/main/read.nhn?mode=LSD&mid=shm&sid1=101&oid=014&aid=0004621966 Congratulations to President Song Chang-hyun.
- https://news.naver.com/main/read.nhn?mode=LSD&mid=shm&sid1=105&oid=016&aid=0001822680 Congratulations to Lim Hye-sook, the new Minister of Science and ICT.
Arxiv
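- Adaptive Filters and Aggregator Fusion for Efficient Graph Convolutions (Jeong Yi-tae)
- Uses adaptive filters and aggregator fusion to be parameter-efficient while showing good results: memory efficiency, lower latency, and higher accuracy.
- Since I assume most listeners work in industry, I found it refreshing that it approaches the problem from a parameter perspective rather than an architecture-improvement perspective, and brought it in hoping it might be useful :)
- U of Cambridge + Samsung Research.
- Notably, it takes hardware into account, so that required memory scales with the number of vertices rather than the number of edges
- The graphs used in the experiments are not the ones commonly used in GNN papers, so a bit more description of the data would have been nice..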
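- AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control (Yoon Seung-je)
- A paper with DeepMimic author Jason Peng as co-first author
- Data-driven methods have long been proposed for making simulated characters move realistically.
- However, they were cumbersome: they required annotating clips from the motion data, attaching an additional motion planner, and so on.
- Alongside a reward for the task of reaching a specified goal, the paper uses LSGAN to build a reward for imitating the style of the motions in a motion dataset, and trains the character with these two rewards
- As shown on the project page and YouTube, the results achieve complex task execution and style imitation at the same time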
- https://xbpeng.github.io/projects/AMP/index.html
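
The two-reward setup described above can be sketched roughly as follows. This is an illustration, not the authors' code: the clipped least-squares form of the style reward is what I recall from the AMP paper, while the function names and the equal 0.5/0.5 weights are assumptions made for this example.

```python
def style_reward(d_score):
    """Map an LSGAN discriminator score to a style reward in [0, 1].

    Assumes the discriminator is trained toward +1 on dataset motion
    and -1 on policy motion, as in a least-squares GAN.
    """
    return max(0.0, 1.0 - 0.25 * (d_score - 1.0) ** 2)


def combined_reward(task_r, d_score, w_task=0.5, w_style=0.5):
    """Weighted sum of the task reward and the style reward.

    The weights are illustrative defaults, not tuned values.
    """
    return w_task * task_r + w_style * style_reward(d_score)


# A transition the discriminator scores as "dataset-like" (score near +1)
# earns close to the full style reward; one scored near -1 earns none.
reward = combined_reward(task_r=0.4, d_score=0.0)
```

Because the style term comes from a learned discriminator rather than from tracking a specific clip, no clip annotation or motion planner is needed to define it.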
- SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network (Kim Bi-ho) --> Ha Jung-woo
- An ASR paper from Google Brain reporting experiments with a pre-trained big model and transfer learning (the naming......)
- On top of the previously published Conformer RNN-T training scheme, it adds all publicly available speech DBs to training
- The key point: simply merging diverse data (with no special processing) and training a large model on it improves multi-domain speech recognition across the board and even makes an LM unnecessary... the typical big-scale recipe works for speech recognition too...
- An Empirical Study of Training Self-Supervised Visual Transformers (Lee Jun-hyeong)
- Kaiming He; literally a range of experiments on SSL-based transformers... could someone read it and explain it? haha
- This paper does not describe a novel method. Instead, it studies a straightforward, incremental, yet must-know baseline given the recent progress in computer vision: self-supervised learning for Visual Transformers (ViT). This line from the abstract seems to sum it up.
- The main setup is ViT + MoCo v3, with results organized by experimental setting (SimCLR and BYOL are also covered)
- Few-shot Image Generation via Cross-domain Correspondence
- I2I few-shot caricature translation
- https://github.com/utkarshojha/few-shot-gan-adaptation
- GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds
- Minecraft-to-3D image: looks like you could just build an RPG game with this
- https://nvlabs.github.io/GANcraft/
- Reward Optimization for Neural Machine Translation with Learned Metrics
- Is BLEU trustworthy?? An ACL 2020 best paper already called it unusable.
- A technique that optimizes a model-based BLEURT reward
- Sure enough, analysis via human evaluation suggests it is time to let BLEU go...
- https://github.com/naver-ai/MetricMT
- Does Putting a Linguist in the Loop Improve NLU Data Collection?
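- From NYU; the title is certainly attention-grabbing..
- Three protocols compared: (i) a baseline protocol, (ii) a linguist-in-the-loop intervention with iteratively-updated constraints on the task, and (iii) an extension of linguist-in-the-loop that provides direct interaction between linguists and crowdworkers via a chatroom.
- The conclusion.. having experts involved from the start of data construction yields much harder evaluation data while still ensuring quality. Out-of-domain is left aside, though. Also, the effect of the chat platform is hard to evaluate quantitatively.
- Still, being involved from the construction stage clearly seems to help in building a good dataset.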
- NT5?! Training T5 to Perform Numerical Reasoning
- from UC Berkeley and Google Research
- These days there seems to be a lot of work like this on PLM-based numerical reasoning
- Training Deep Capsule Networks with Residual Connections
- A capsule network paper, the first in a while
- Previously, one or two capsule blocks at most. Now, with residual connections, up to 15
- But the datasets.. still do not go beyond MNIST and SVHN.
- Code: https://github.com/moejoe95/res-capsnet
- Unmasking the Mask -- Evaluating Social Biases in Masked Language Models
- A metric for evaluating social bias in models pretrained with MLM
- Being based on masking brings several limitations
- So they compute the likelihood in the unmasked state (AUL), and also propose AULA, a version weighted by attention scores
- Privacy-Adaptive BERT for Natural Language Understanding
- from U Mass, Google
- Uses local differential privacy based on d_x-privacy to produce privatized text or vectors on each device
- The server collects that privatized information and pretrains on it together with privatized public data.
- Can be combined with federated learning
- SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements
- articulated surface elements --> clothed 3D human
- https://qianlim.github.io/SCALE
- Retrieval Augmentation Reduces Hallucination in Conversation
- Facebook research that tries to stop AI dialogue models from making things up by putting retrieval-based augmentation in the loop
- This approach has mostly been used in open-domain QA..
- Non-autoregressive sequence-to-sequence voice conversion
- Applies non-autoregressive s2s TTS to voice conversion.
- Very fast. The samples sound decent, I think.. though note that I don't know Japanese
- https://kan-bayashi.github.io/NonARSeq2SeqVC/
- An Introduction of mini-AlphaStar
- A small-scale training environment version of DeepMind's AlphaStar
- Code: https://github.com/liuruoze/mini-AlphaStar
- What Makes a Scientific Paper be Accepted for Publication?
- Another somewhat attention-grabbing title
- Based on ICLR papers..
- Conclusions: a) the organising committee follows, for the most part, the recommendations of reviewers, and, b) the paper's main characteristics that led to reviewers recommending acceptance for publication are originality, clarity and substance.
- FedGraphNN: A Federated Learning System and Benchmark for Graph Neural Networks
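- USC and Tencent
- As the title says, a federated learning system and benchmark for GNNs
- Accepted at ICLR and MLSys workshops.. worth checking out, given that GNNs have recently become nearly mainstream in recommender systems?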
- https://github.com/FedML-AI/FedGraphNN
- Restoring and Mining the Records of the Joseon Dynasty via Neural Language Modeling and Machine Translation
- NAACL 2021
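- Restoring and analyzing the Annals of the Joseon Dynasty with an LM and a translation model
- Scatter Lab, Prof. Jaegul Choo (KAIST), and researchers from Chung-Ang University
- cf. the work on predicting the king's decisions by Prof. JinYeong Bak (as a student) and Prof. Alice Oh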

nick-jhlee commented 3 years ago

AISTATS has wrapped up (~~TMI~~)
- "The 24th International Conference on Artificial Intelligence and Statistics" (04/13 ~ 04/15)
- Unlike NeurIPS or ICML, it did not accommodate time zones... (so my days and nights were temporarily flipped, and I missed about two poster sessions...)
- I expected it to be small... but for a somewhat non-mainstream(?) theory conference it seems to have grown.
- I saw almost no submissions from Korea.. :(
- Names that showed up a lot, by contrast: INRIA (France), RIKEN (Japan), Google...
- Acceptance rate: 455/1527, Oral: 48
- (AISTATS 2017 was 167/530...)

I did some biased sampling among papers that are moderately theoretical, moderately practical, and likely to interest everyone (my very personal research interests are reflected too)

Best Paper: Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent
- One-line summary: one step closer to the secrets of the EM algorithm. (purely theoretical!)
Best Student Paper: Matérn Gaussian Processes on Graphs
- For traffic modeling, using Euclidean distance makes no sense ==> define the Gaussian process directly on the graph!
- In the stochastic PDE that characterizes the Matérn kernel, replace the Laplacian operator with the graph Laplacian!
- Shows good performance on graph interpolation of traffic and citation network classification (the flexibility of the Matérn kernel plays a part here)
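
As a rough sketch of the idea above, swapping the Laplacian for the graph Laplacian leads to a kernel of the form K = σ² (2ν/κ² I + Δ)^(-ν), which can be evaluated through an eigendecomposition of Δ. This follows my reading of the paper up to normalization constants; the function name, the unnormalized Laplacian, the default parameters, and the toy path graph are all assumptions for illustration.

```python
import numpy as np


def graph_matern_kernel(adjacency, nu=2.0, kappa=1.0, sigma2=1.0):
    """Matérn-type kernel matrix on the nodes of an undirected graph."""
    A = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(A.sum(axis=1)) - A          # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(laplacian)    # symmetric eigendecomposition
    eigvals = np.clip(eigvals, 0.0, None)           # guard against round-off below zero
    spectrum = (2.0 * nu / kappa**2 + eigvals) ** (-nu)
    # U diag(spectrum) U^T: scale each eigenvector column, then recombine
    return sigma2 * (eigvecs * spectrum) @ eigvecs.T


# Toy example: a path graph 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
K = graph_matern_kernel(A)
```

Covariance here decays with distance along the graph rather than Euclidean distance, so adjacent nodes 0 and 1 are more strongly correlated than the endpoints 0 and 3.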
Learning GPLVM with arbitrary kernels using the unscented transformation
- The variational inference that GPLVM requires is tractable only for certain kernels (e.g. the Gaussian kernel)
- numerical integration: takes too long
- Monte Carlo: cannot use good optimization algorithms (e.g. L-BFGS)
- Using the unscented transformation, they get equal or better results with very few evaluations
- no hyperparameters, deterministic, and linearly scaling
Learning-to-Rank with Partitioned Preference: Fast Estimation for the Plackett-Luce Model
- "LTR (Learning-to-Rank)": the problem of predicting the ranking of new items based on partial orders in the training data
- "Partitioned preference": the ranking between groups is known, but the ranking within each group is not
- Under the Plackett-Luce (PL) model assumption, reduces the original complexity of O(N + S!) to O(N + S^3)!
- Shows performance similar to the original algorithm while being much more scalable.
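
To make the factorial cost concrete, here is a brute-force sketch of the likelihood of a partitioned preference under the Plackett-Luce model. It is the naive baseline that sums over every within-group ordering, not the paper's fast estimator; the function names and the toy scores are assumptions for illustration.

```python
import math
from itertools import permutations, product


def pl_log_likelihood(scores, ranking):
    """Log-probability of a full ranking (best first) under Plackett-Luce."""
    ll = 0.0
    for i, item in enumerate(ranking):
        # At each position, the chosen item competes with everything not yet ranked
        remaining = [scores[j] for j in ranking[i:]]
        ll += scores[item] - math.log(sum(math.exp(s) for s in remaining))
    return ll


def partitioned_likelihood_bruteforce(scores, groups):
    """Probability that the groups appear in the given order (best group first),
    summed over all full rankings consistent with that partition."""
    total = 0.0
    for perms in product(*(permutations(g) for g in groups)):
        ranking = [item for perm in perms for item in perm]
        total += math.exp(pl_log_likelihood(scores, ranking))
    return total


# Item 0 is known to beat items 1 and 2, whose relative order is unknown.
p = partitioned_likelihood_bruteforce([math.log(2), 0.0, 0.0], [[0], [1, 2]])
```

The number of terms in the sum is the product of the factorials of the group sizes, which is why this direct approach stops being practical once groups grow beyond a handful of items.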
Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence
- Proposes a learning rate schedule for SGD inspired by the Polyak step-size
- Almost no additional computational cost
- Gives convergence guarantees/rates under conditions that (even in my view) are fairly weak
- "Finite optimal objective difference"
- In particular, this always holds in the interpolation setting (over-parametrized models)
- Shows good and fast performance on deep matrix factorization, kernel binary classification, and deep classification
- It beats Adam, for a start... what would happen against AdamP? haha Or maybe combining it with AdamP would yield yet another optimization algorithm...?
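
A minimal sketch of the step-size rule, as I understand the paper's capped variant: step = min((f_i(w) - f_i*) / (c * ||grad f_i(w)||^2), gamma_max). The toy least-squares problem, the function name, and the choices c = 0.5 and gamma_max = 10 are assumptions for illustration; f_i* = 0 holds here only because the data can be fit exactly (the interpolation setting).

```python
import random


def sps_sgd(data, dim, steps=200, c=0.5, gamma_max=10.0, seed=0):
    """SGD on per-sample squared error with a stochastic Polyak step-size."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(steps):
        a, y = rng.choice(data)
        residual = sum(wj * aj for wj, aj in zip(w, a)) - y
        loss = 0.5 * residual ** 2            # f_i(w); its minimum f_i* is 0 here
        grad = [residual * aj for aj in a]
        grad_sq = sum(g * g for g in grad)
        if grad_sq == 0.0:                    # this sample is already fit exactly
            continue
        step = min(loss / (c * grad_sq), gamma_max)
        w = [wj - step * g for wj, g in zip(w, grad)]
    return w


# Noise-free linear data, so a single w fits every sample
w_true = [2.0, -1.0]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
data = [(a, sum(ai * wi for ai, wi in zip(a, w_true))) for a in features]
w = sps_sgd(data, dim=2)
```

The only extra work per step is the squared gradient norm and one division, which matches the point above about negligible overhead. No learning rate is tuned by hand beyond the cap.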
Competing AI: How does competition feedback affect machine learning?
- Proposes a model for AI systems that compete with one another!
- Key idea: competition forms a feedback loop
- ex. Yelp vs Tripadvisor, Naver vs Daum(...?)
- Conclusion 1: under competition, systems specialize to particular subpopulations!
- Conclusion 2: users' quality of service is non-monotonic in the number of AI systems!
- There exists an "optimal" number of AI systems!
Stable ResNet
- An effort to rescue ResNet
- Solves the problems of deep ResNets (unstable gradients, poor expressivity) through scaling
- Shows both theoretically and experimentally that Stable ResNet works well
Dirichlet Pruning for Neural Network Compression
- Takes a pre-trained model M, first shrinks it to M' with the method below, then trains M' again
- Adds an "importance switch" to each layer and learns its distribution with variational inference!
  - the prior is a Dirichlet distribution...! (<- it's a conjugate prior)
  - it has positive support and, in some cases, can yield sparsity...!
  - the probability vector of such an importance switch models the importance of each channel!
- Works well: 3.5x smaller ResNet, 16x smaller VGGNet + visually interpretable!
Adaptive wavelet pooling for convolutional neural networks
- (I haven't read it, but someone talked about wavelets last time, so I just brought it)
- At a rough glance, it seems to improve on existing wavelet-based pooling
- Why did wavelet pooling come about? (honestly, I don't really know what a wavelet is,,)
Amortized Bayesian Prototype Meta-learning: A New Probabilistic Meta-learning Approach to Few-shot Image Classification
- (I haven't read this one either, but brought it since some people might be interested)
- I don't know much about this, so... what is probabilistic meta-learning?

veritas9872 commented 3 years ago

Some interesting items from GTC 2021. Sharing in case anyone here likes CUDA.

CUDA Python has been released, so low-level CUDA kernels can now be written in Python. https://developer.nvidia.com/cuda-python

NVIDIA Nsight VSCode Edition has been released, so CUDA kernels can now be debugged through Nsight in VSCode, which we all love, without using Visual Studio. Nsight is one of CUDA's official profilers. https://developer.nvidia.com/nsight-visual-studio-edition-2020_3

veritas9872 commented 3 years ago

HuggingFace has released the Accelerate library to make multi-GPU, FP16, and similar features easy to use. PyTorch Lightning is convenient, but once the training loop gets complicated you have to write the code in plain PyTorch, and then using multi-GPU requires code refactoring. I expect Accelerate to reduce that burden. https://github.com/huggingface/accelerate

Facebook Research has released PyTorchVideo. It should be helpful for those working on video deep learning. https://github.com/facebookresearch/pytorchvideo

veritas9872 commented 3 years ago

Does Your Dermatology Classifier Know What It Doesn't Know? Detecting the Long-Tail of Unseen Conditions

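Three research groups, Google Health, Google Brain, and DeepMind, propose a new hierarchical loss to address long-tailed distributions in medical images. It was mentioned last time that this problem also comes up often in online shopping and similar settings, so I hope this helps anyone interested.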

j-min commented 3 years ago

Robust Open-Vocabulary Translation from Visual Text Representations

This work uses images instead of discrete vocab embeddings as the text embedding. I think I saw similar work last year as well (e.g. https://arxiv.org/pdf/2010.10648.pdf). Personally, I found the text recognition ability of CLIP / DALL-E quite amazing; in a few years, might models be able to read text inside images naturally even without an OCR module?

jungwoo-ha / WeeklyArxivTalk

[20210418] Weekly AI Arxiv Talk. #6