[20210425] Weekly AI Arxiv Talk

jungwoo-ha commented 3 years ago

AI News
- EU AI regulation: https://digital-strategy.ec.europa.eu/en/library/proposal-regulation-european-approach-artificial-intelligence
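- Defines four risk levels for AI systems
- Defines the scope of regulation for each level
- Extremely broad in scope, with far-reaching impact...
- Probably all the more so because the EU has almost no strong AI platforms of its own.. and if we copy-paste it and apply it here...
- Related news: http://www.newstown.co.kr/news/articleView.html?idxno=491695
- If you had 1 million euros, would you invest in an AI startup after reading this regulation??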
- https://www.mk.co.kr/news/society/view/2021/04/397667/
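- AI education at the Seoul National University College of Social Sciences: if even SNU's College of Social Sciences is moving, the situation must really be serious..
- Two AI graduate schools officially selected for 2021: Seoul National University and Chung-Ang University. Congratulations.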
- ICLR Social 2021: ML in Korea
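- May 4, 9:00-12:00
- Opening by Prof. Gunhee Kim (김건희), plenary talk by Prof. Kyu-Jin Cho (조규진) of Seoul National University, presentations of 20 ICLR papers
- Booths from 8 companies: NAVER, SKT, Hyundai Motor, LG AI Research, Upstage, Gauss Labs, Kakao Brain, Hyperconnect
- An even stronger social, hosted on Gather Town!!
- NAVER Search Colloquium 2021! (신재민)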
- http://naversearchconf.naver.com/
- https://blog.naver.com/naver_search/222312235870
Arxiv
- VideoGPT: Video Generation using VQ-VAE and Transformers
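- As the name says, VideoGPT, from BAIR
- VQ-VAE + transformer decoder, an architecture similar to DALL-E
- It was trained on 8 Quadro RTX 6000 GPUs, so many people should be able to try it
- Quality is not great yet, though... (naturally, given GPU and data constraints..)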
- https://wilson1yan.github.io/videogpt/index.html
- VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
- A multimodal encoding transformer for video, audio waveform, and text, from Google
- An extension of CLIP that also covers audio
- Modality-specific or agnostic transformer + multimodal projection head + NCE loss + drop token
- Decent downstream performance on video recognition, audio event recognition, image recognition, and even zero-shot text-to-video retrieval
- Token Labeling: Training a 85.5% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet
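- NUS (National University of Singapore) & ByteDance
- The ViT version of Bag of Tricks: a collection of every technique they could scrape together
- Also includes token-based CutMix and Re-labeling (research from NAVER AI Lab that is widely used globally!!)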
- https://github.com/zihangJiang/TokenLabeling
- Analyzing the Forgetting Problem in Pretrain-Finetuning of Open-domain Dialogue Response Models
- Facebook, NYU, EACL 2021 (first posted on arXiv in October 2019)
- What gets lost during fine-tuning, and how can that be addressed?
- Surface Form Competition: Why the Highest Probability Answer Isn’t Always Right?
- from UW, AI2
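- Surface form competition: in generative models, candidates with essentially the same meaning count as different strings and end up splitting the probability mass among themselves. Recommendation has a similar problem (because items are treated as indices).
- So they propose a new score, Domain Conditional Pointwise Mutual Information.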
- https://github.com/peterwestuw/surface-form-competition
- Metadata Normalization
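- from Stanford, Fei-Fei Li
- A new normalization layer that removes the effect of extraneous variables (metadata), e.g. when race becomes a confounder in face-based gender recognition
- Various experiments, including medical images
- https://github.com/mlu355/MetadataNorm (still a 404 for now)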
- Provable Limitations of Acquiring Meaning from Ungrounded Form: What will Future Language Models Understand?
- Do LMs, today's ungrounded systems, really understand the meaning of language?
- Assertion: contexts within raw text that provide indirect clues about underlying semantics.
- What conditions does an ungrounded system need in order to learn representations that properly capture semantic relations? Do assertions make that possible?
- If not, how far can it get, and what remains out of reach?
- So-ViT: Mind Visual Tokens for Vision Transformer
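- One of the many recent improved ViT variants
- Trained on ImageNet-1k only; uses convolutions for token embedding (stem 1x1; stage: 1x1, 3x3 stride 2, 1x1) and cross-covariance pooling in the head
- In terms of accuracy or efficiency, my impression is that it falls slightly short of PiT?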
- https://github.com/jiangtaoxie/So-ViT
- ImageNet-21K Pretraining for the Masses
- Alibaba
- Analysis of various recipes for ImageNet-21k pretraining.
- Experiments with a variety of backbone architectures
- https://github.com/Alibaba-MIIL/ImageNet21K
- On Buggy Resizing Libraries and Surprising Subtleties in FID Calculation
- from CMU, Adobe (Jun-Yan Zhu, Richard Zhang)
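- FID is widely used as a GAN performance metric... but it is very sensitive to the open-source image processing library used, especially to how resizing is done
- Shows through a range of experiments that the measured FID can end up far off
- Then lists what to avoid and what they recommend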
- https://www.cs.cmu.edu/~clean-fid/
- Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes
- from Vector Institute
- Interpolating between the initial parameters and the trained model usually gives a monotonically decreasing loss
- Various analyses around this, including when the rule breaks down
- Worth a read for those interested in NN architecture and theory(?)..
- Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation
- CVPR 2021
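- Synthesizes talking-face clips with controllable head pose
- Audio-visual multimodal fusion: frames are factored into an identity space and an identity-agnostic pose space, which are then connected to the speech side
- See the model architecture figure for details.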
- https://github.com/Hangz-nju-cuhk/Talking-Face_PC-AVS
- MetricOpt: Learning to Optimize Black-Box Evaluation Metrics
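- A rare piece of research from Apple. Is a CVPR oral the default whenever they publish?
- A black-box optimization technique that optimizes non-differentiable metrics such as misclassification rate or recall using existing optimizers such as SGD or Adam
- Analogous to meta-learning and to value function approximation in RL.
- So it is said to be very useful for fine-tuning.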
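
As a side note on the Surface Form Competition item above: the Domain Conditional PMI score can be read as the log-ratio between an answer's probability given the full question and its probability given only a generic domain prompt. Below is a minimal sketch in plain Python, assuming the probabilities have already been obtained from some language model. The candidate names, the probabilities, and the helper names `raw_score` and `dc_pmi_score` are all made up for illustration and are not taken from the paper's repository.

```python
import math

# Hypothetical (made-up) probabilities for two answer strings:
# p_q = P(answer | question), p_d = P(answer | domain-only prompt).
candidates = {
    "frequent answer": {"p_q": 0.20, "p_d": 0.10},
    "rare answer": {"p_q": 0.15, "p_d": 0.01},
}

def raw_score(c):
    # Plain log-probability of the answer given the question.
    return math.log(c["p_q"])

def dc_pmi_score(c):
    # log P(answer | question) - log P(answer | domain prompt)
    return math.log(c["p_q"]) - math.log(c["p_d"])

best_raw = max(candidates, key=lambda k: raw_score(candidates[k]))
best_dc_pmi = max(candidates, key=lambda k: dc_pmi_score(candidates[k]))
print(best_raw, "|", best_dc_pmi)
```

With these toy numbers the two scores disagree: the raw probability favors the string that is common regardless of the question, while the PMI-style score favors the one whose probability rises most once the question is given.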

veritas9872 commented 3 years ago

Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs https://openreview.net/forum?id=vYeQQ29Tbvx It came out over a month ago, but I am sharing it because it relates to the effect of random projections discussed last time. It is an ICLR 2021 paper from MIT and Facebook AI.

This paper shows experimentally that, after randomly initializing and freezing all other parameters of the network, training only the BatchNorm weights and biases still reaches 82% accuracy on CIFAR-10 and 32% top-5 accuracy on ImageNet. A paper on using random features with SVMs received the NeurIPS 2017 Test of Time Award, and by a similar principle random projections may turn out to be more effective in neural networks than one would expect.
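
To get a feel for how small the trained subset is, here is a rough parameter count for a small convolutional stack. The layer sizes are hypothetical and not taken from the paper; only the counting rule is standard (a k x k convolution without bias has c_in * c_out * k * k weights, and BatchNorm has one scale and one shift per channel).

```python
# (c_in, c_out, kernel_size) for each conv layer; hypothetical sizes.
layers = [(3, 64, 3), (64, 128, 3), (128, 256, 3)]

# Convolution weights (no bias): frozen at their random initialization.
conv_params = sum(c_in * c_out * k * k for c_in, c_out, k in layers)

# BatchNorm after each conv: one scale and one shift per output channel.
# In this setup these are the only trainable parameters.
bn_params = sum(2 * c_out for _, c_out, _ in layers)

trainable_fraction = bn_params / (conv_params + bn_params)
print(conv_params, bn_params, round(trainable_fraction, 4))
```

In this toy configuration well under 1% of the parameters would be trained, which is what makes the reported accuracies notable.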

Sparse Attention with Linear Units https://arxiv.org/abs/2104.07012v1

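Making attention sparse in a Transformer can make training smoother, and this paper claims that simply replacing softmax with ReLU, instead of adding a separate sparsifying transform, can already improve performance. Whether that actually holds would need to be verified.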
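
A minimal sketch of the difference, in plain Python, for the attention scores of a single query. The scores are made up, and the sketch only swaps the activation; whatever normalization or scaling the paper pairs with ReLU is not reproduced here.

```python
import math

def softmax_weights(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def relu_weights(scores):
    # Negative scores become exactly zero, so those keys are ignored.
    return [max(0.0, s) for s in scores]

scores = [2.0, -1.0, 0.5, -3.0]  # hypothetical scaled query-key scores
print(softmax_weights(scores))
print(relu_weights(scores))
```

Softmax gives every key a strictly positive weight and the weights sum to 1, whereas ReLU yields exact zeros for negative scores but its weights no longer sum to 1.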

veritas9872 commented 3 years ago

Facebook Flashlight https://ai.facebook.com/blog/flashlight-fast-and-flexible-machine-learning-in-c-plus-plus/

Facebook has built a new C++-only deep learning library called Flashlight. Its API resembles PyTorch's, so I wonder why they built a separate library, but I am sharing it since those who prefer C++ over Python may be interested.

Stanford 224W: Machine Learning with Graphs https://youtu.be/JAB_plj2rbA

Stanford has released its course materials on Graph Neural Networks on YouTube. Graph Neural Networks can be hard for newcomers, and I expect this course may become as well known as CS231n.

nick-jhlee commented 3 years ago

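Given that ever larger models are being built and used, this seems like quite an important issue, so I brought it in....!

Carbon Emissions and Large Neural Network Training I saw Yann LeCun promoting it on Facebook and brought it here. Though of all things it is a Google paper...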

Large but sparsely activated DNNs can consume <1/10th the energy of large, dense DNNs without sacrificing accuracy despite using as many or even more parameters.
Geographic location matters for ML workload scheduling since the fraction of carbon-free energy and resulting CO2e vary ~5X-10X, even within the same country and the same organization. We are now optimizing where and when large models are trained.
Specific datacenter infrastructure matters, as Cloud datacenters can be ~1.4-2X more energy efficient than typical datacenters, and the ML-oriented accelerators inside them can be ~2-5X more effective than off-the-shelf systems.
Remarkably, the choice of DNN, datacenter, and processor can reduce the carbon footprint up to ~100-1000X

(Excerpt from a Facebook comment) "If I understand correctly, training a single instance of GPT3 uses ~1300MWh, produces ~500tCO2e . We are speaking about 500 round trip from Paris to New York in term of CO2. We are speaking about the electrical consumption of 60 average french households. Jaw dropping, even though I was already aware of the high "costs" of such big models."
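
A quick sanity check of what that comment implies per unit, using only the numbers it quotes (1300 MWh, 500 tCO2e, 500 round trips, 60 households). The derived values follow from the quote alone and are not independently verified.

```python
energy_mwh = 1300.0   # quoted training energy
emissions_t = 500.0   # quoted emissions in tCO2e
round_trips = 500     # quoted Paris-New York round trips
households = 60       # quoted number of French households

t_per_round_trip = emissions_t / round_trips
mwh_per_household = energy_mwh / households
# Convert tonnes to grams and MWh to kWh to get a grid-intensity figure.
g_per_kwh = (emissions_t * 1e6) / (energy_mwh * 1e3)

print(t_per_round_trip, round(mwh_per_household, 1), round(g_per_kwh))
```

So the comment implies about 1 tCO2e per round trip, roughly 21.7 MWh per household (presumably per year, though the comment does not say), and an emission intensity near 385 gCO2e per kWh.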

cf. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? The paper that blew up(?) the Google AI Ethics team; it also mentions environmental impact.

jungwoo-ha commented 3 years ago

@veritas9872 So this is the Timnit Gebru paper, now public, haha. Looking forward to the ICLR invited talk on May 3!

nick-jhlee commented 3 years ago

<2021 Naver Search Colloquium> http://naversearchconf.naver.com/ - Learning to Rank - User Modeling, Fairness - eCommerce - Platform - Vision - Language AI Looking forward to it...!

jshin49 commented 3 years ago

The NAACL 2021 accepted paper list has been released, so here are a few picks.
