[20210718] Weekly AI ArXiv Talk

jungwoo-ha / WeeklyArxivTalk

[Zoom & Facebook Live] Weekly AI Arxiv Season 2

[20210718] Weekly AI ArXiv Talk #17

Closed jungwoo-ha closed 3 years ago

jungwoo-ha commented 3 years ago

AI News
- EMNLP 2021: the rebuttal period is over. Thanks for all the hard work, and good luck, everyone!
- NeurIPS 2021: the review period is over.
- NVIDIA Jetson developer meetup (July 22, 2021)
- Google ML Bootcamp applications are open (until Aug 2): https://events.withgoogle.com/google-developers-mlb-kr-2021/
- OpenAI disbands its robotics research team
- Max Welling leaves Qualcomm AI Research (https://www.linkedin.com/posts/max-welling-4a783910_it-is-with-gratitude-that-i-announce-my-resignation-activity-6821705740051333120-oMHh)
AI ArXiv
- Blender Bot 2.0
  - Addresses several problems of GPT-3 and Blender Bot 1.0: limited memory (token window limit), hallucination, missing world knowledge, etc.
  - Internet search + long-term memory
  - Built on the ParlAI platform
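- Industry and Academic Research in Computer Vision
  - Surveys all papers from CVPR, ICCV, ECCV, ACCV, and BMVC over the ten years from 2010 to 2019
  - The center of gravity of AI research is gradually shifting from academia to industry (the Stanford AI report shows a similar trend)
  - Among top-100 papers, industry often has the larger share (data, infrastructure, and people all contribute)
  - This trend has only intensified since 2020
  - Conclusion: industry and academia need much stronger collaboration (e.g. NAVER's AI research centers with Seoul National University and KAIST)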
- ViTGAN: Training GANs with Vision Transformers
  - A GAN built purely on Transformers (ViT)
  - Whereas HiT modifies the MHSA structure, ViTGAN appears to keep standard MHSA as-is, modify Layer Norm, and use a Fourier-embedding-based implicit neural representation for decoding
  - As in StyleGAN, the latent z is turned into w by a mapping network, and w is then used inside LN; the paper proposes this as SLN
  - One drawback: the resolution of the generated images is rather low...
- ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
  - Existing autoregressive models are somewhat less effective when fine-tuned for NLU
  - Trains on large-scale plain text together with a knowledge graph
  - A 10B-parameter model trained on a 4TB-scale text corpus plus KG
  - Also proposes progressive learning to speed up convergence
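- FLEX: Unifying Evaluation for Few-Shot NLP
  - A benchmark, leaderboard, and evaluation protocol for few-shot NLP tasks (from AI2)
  - Many evaluations of very large language models, GPT-3 included, report few-shot performance based on in-context few-shot learning or prompt optimization, but the hyperparameters are in practice tuned on large datasets, so the setting is far from true few-shot.
  - https://github.com/allenai/flex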
- Randomized ReLU Activation for Uncertainty Estimation of Deep Neural Networks
  - An attempt to improve uncertainty estimation with randomized ReLU
  - RReLU: uses a random coefficient for the negative part
  - DropReLU: randomly switches between linear and ReLU
  - Compared with MC-dropout: not sure about accuracy, but ECE reportedly improves
- Spanish Language Models
  - Spanish RoBERTa-base and RoBERTa-large models
  - AI companies looking to enter Spanish-speaking markets (South America, etc.) may find these useful.
  - Hugging Face: https://huggingface.co/BSC-TeMU/roberta-base-bne
- AFHQ-v2 dataset: https://github.com/clovaai/stargan-v2
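
For the Randomized ReLU item above, here is a minimal NumPy sketch of the two activations and of MC-style sampling with them. The slope range (1/8 to 1/3, the defaults of `torch.nn.RReLU`), the per-unit switching probability `p_linear`, and the helper `mc_predict` are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def rrelu(x, rng, lower=1/8, upper=1/3):
    """Randomized leaky ReLU: negative inputs get a random slope in [lower, upper]."""
    slope = rng.uniform(lower, upper, size=x.shape)
    return np.where(x >= 0, x, slope * x)

def drop_relu(x, rng, p_linear=0.5):
    """Each unit is randomly linear (identity) or ReLU (per-unit switching is assumed)."""
    use_linear = rng.random(x.shape) < p_linear
    return np.where(use_linear, x, np.maximum(x, 0.0))

def mc_predict(x, W1, W2, act, rng, n_samples=50):
    """Keep the activation random at test time and aggregate several forward passes."""
    outs = np.stack([act(x @ W1, rng) @ W2 for _ in range(n_samples)])
    # The spread across samples serves as a rough uncertainty signal.
    return outs.mean(axis=0), outs.std(axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 3))
mean, std = mc_predict(x, W1, W2, rrelu, rng)
```

Passing `drop_relu` instead of `rrelu` to `mc_predict` gives the DropReLU variant of the same procedure.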

Kyung-Min commented 3 years ago

Papers

Transfer-Meta Framework for Cross-domain Recommendation to Cold-Start Users
- Existing cross-domain recommendation methods learn a function f that, for users who overlap between the source and target domains, pulls the embeddings computed in the two domains close together: f(emb(user_s)) <-> emb(user_t)
- The problem is that f easily becomes biased toward the overlapping users
- So the paper proposes a transfer-meta framework
- Transfer stage: pre-train on all source/target domain users, not only the overlapping ones
- Meta stage: when learning f, the gradient is computed not from the embedding error between the source and target domains of overlapping users, but from the downstream task loss actually incurred on cold-start users in the target domain (MAML style)
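
To make the difference between the two objectives concrete, here is a rough NumPy sketch. It assumes a linear mapping `W`, a dot-product rating predictor, and synthetic data; it also leaves out the MAML-style inner/outer loop, so it illustrates only the "train f on the task loss" idea, not the paper's actual procedure.

```python
import numpy as np

def mapping_loss(W, u_src, u_tgt):
    """Conventional objective: pull f(emb_s) toward emb_t for overlapping users."""
    diff = u_src @ W - u_tgt
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def task_loss_and_grad(W, u_src, item_emb, ratings):
    """Task-oriented objective: score target-domain items with the mapped embedding."""
    mapped = u_src @ W                           # (n, d) mapped user embeddings
    pred = np.sum(mapped * item_emb, axis=1)     # dot-product rating prediction
    err = pred - ratings
    loss = float(np.mean(err ** 2))
    # d loss / d W = (2/n) * sum_i err_i * outer(u_src_i, item_emb_i)
    grad = 2.0 * (u_src * err[:, None]).T @ item_emb / len(ratings)
    return loss, grad

# Synthetic interactions (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n, d = 256, 8
u_src = rng.normal(size=(n, d))
item_emb = rng.normal(size=(n, d))
W_true = rng.normal(size=(d, d))
ratings = np.sum((u_src @ W_true) * item_emb, axis=1)

W = np.zeros((d, d))
initial_loss, _ = task_loss_and_grad(W, u_src, item_emb, ratings)
for _ in range(200):
    loss, grad = task_loss_and_grad(W, u_src, item_emb, ratings)
    W -= 0.02 * grad                             # plain gradient descent on the task loss
final_loss, _ = task_loss_and_grad(W, u_src, item_emb, ratings)
```

Note that `task_loss_and_grad` never looks at target-domain user embeddings, which is why it can be applied to cold-start users.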
Tabular Data: Deep Learning is Not All You Need
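- Evaluates how well deep learning models actually do on jungle-like real-world structured data (though of course there are far wilder settings out there)
- Many deep learning methods have been proposed for tabular data, but when tested on datasets other than the ones used in their own papers, they fall short of the baseline (XGBoost).
- The experiments compare four deep models and XGBoost on 11 datasets published on Kaggle
- HPO was of course performed for every model
- Ensembling XGBoost with the deep models gave the best performance.
- MSE performance comparison table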
Learning to Embed Categorical Features without Embedding Tables for Recommendation
- Using user/item ID features in a recommender system requires a very large embedding look-up table -> very large memory footprint
- To address this, the paper proposes Deep Hash Embedding: encode the raw ID with many (~1000) hash functions, then transform it several times with learnable neural-network layers
- Wide & shallow embedding (conventional one-hot encoding) <-> deep embedding (proposed)
- Also uses side information to improve generalization ability
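
A minimal sketch of the hash-then-MLP idea follows. It assumes universal hashing of the form `((a * id + b) mod p) mod m`, a uniform rescaling to [-1, 1], and a two-layer network with random (untrained) weights. The sizes `k`, `m`, `hidden`, `out_dim` and the helper name `make_encoder` are illustrative choices, not the paper's configuration.

```python
import numpy as np

P = 2**31 - 1  # a large prime for universal hashing (assumed choice)

def make_encoder(k=1024, m=10**6, hidden=64, out_dim=32, seed=0):
    """Build an ID -> embedding function that needs no per-ID table."""
    rng = np.random.default_rng(seed)
    a = rng.integers(1, P, size=k)   # k independent hash functions
    b = rng.integers(0, P, size=k)
    # In a real model these weights would be learned; here they are random.
    W1 = rng.normal(scale=k ** -0.5, size=(k, hidden))
    W2 = rng.normal(scale=hidden ** -0.5, size=(hidden, out_dim))

    def embed(item_id):
        # IDs are assumed to be non-negative ints below 2**31 (avoids int64 overflow).
        h = ((a * item_id + b) % P) % m          # k bucket indices in [0, m)
        dense = 2.0 * h / (m - 1) - 1.0          # rescale to [-1, 1]
        hidden_act = np.maximum(dense @ W1, 0.0)
        return hidden_act @ W2

    return embed

embed = make_encoder()
e42 = embed(42)
```

The parameter count here is `k * hidden + hidden * out_dim`, independent of how many distinct IDs exist, which is the memory argument made above.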
What does BERT know about books, movies and music? Probing BERT for Conversational Recommendation
- The first paper to try bringing pretrained language models (PLMs) into recommendation models
- Tests (probes) how much BERT, pre-trained on Wiki, BookCorpus, etc. and not fine-tuned, already knows about books, movies, and music
- When the PLM was probed from both content-based and collaborative-based angles, it did better on content-based problems
- Going further, proposes multi-task learning as a way to improve performance when a PLM is used for conversational recommendation as a downstream task

News
- DeepMind released the code for AlphaFold v2, which predicts the 3D structure of proteins

ghlee3401 commented 3 years ago

Paper

DPCRN: Dual-Path Convolution Recurrent Network for Single Channel Speech Enhancement
- Accepted by Interspeech 2021 DNS (Deep Noise Suppression); achieves a MOS of 3.57.
- Goal: speech enhancement for long speech sequences
- Method:
  1. Combines dual-path RNN (DPRNN) with a Convolution Recurrent Network (CRN) to design the DPCRN (Dual-Path Convolution Recurrent Network) model
  2. Splits a long sequence into small chunks and applies RNNs intra-chunk and inter-chunk
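
The chunking in step 2 can be sketched as below. This is a generic dual-path illustration in NumPy with a toy tanh RNN, non-overlapping chunks, and made-up sizes. The helper names are hypothetical and the convolutional encoder/decoder of DPCRN is omitted.

```python
import numpy as np

def split_into_chunks(x, chunk_len):
    """Zero-pad a (T, F) sequence and reshape it to (num_chunks, chunk_len, F)."""
    T, F = x.shape
    pad = (-T) % chunk_len
    x = np.pad(x, ((0, pad), (0, 0)))
    return x.reshape(-1, chunk_len, F)

def simple_rnn(seqs, W_in, W_h):
    """Toy tanh RNN over axis 1 of a (batch, steps, feat) array."""
    h = np.zeros((seqs.shape[0], W_h.shape[0]))
    outs = []
    for t in range(seqs.shape[1]):
        h = np.tanh(seqs[:, t] @ W_in + h @ W_h)
        outs.append(h)
    return np.stack(outs, axis=1)

def dual_path_block(chunks, params):
    # Intra-chunk: recur over the positions inside each chunk (short sequences).
    intra = simple_rnn(chunks, *params["intra"])
    # Inter-chunk: recur across chunks at each position (long-range context).
    inter = simple_rnn(intra.transpose(1, 0, 2), *params["inter"])
    return inter.transpose(1, 0, 2)

rng = np.random.default_rng(0)
F, H = 16, 8
params = {
    "intra": (0.1 * rng.normal(size=(F, H)), 0.1 * rng.normal(size=(H, H))),
    "inter": (0.1 * rng.normal(size=(H, H)), 0.1 * rng.normal(size=(H, H))),
}
x = rng.normal(size=(1000, F))          # a long sequence of 1000 frames
chunks = split_into_chunks(x, chunk_len=50)
out = dual_path_block(chunks, params)
```

With 1000 frames and chunks of 50, each RNN only unrolls over 50 or 20 steps instead of 1000, which is the point of the dual-path split.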
Dance2Music: Automatic Dance-driven Music Generation
- Demo:
  - https://sites.google.com/view/dance2music/live-demo
  - https://sites.google.com/view/dance2music
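- Goal: automatically generate music given a dance
- Problem: prior work has discussed generating dance given music, but the reverse direction had not been studied
- Method:
  1. Divided into an offline approach and an online approach
  2. Offline: the setting where both the dance video and the music are available
  3. Online: the dance video is fed to a neural net as input and music is generated (predicting one note out of five)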

hollobit commented 3 years ago

DeepMind’s AI for protein structure is coming to the masses (Nature)

https://doi.org/10.1038/d41586-021-01968-y
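Papers were published in Nature and Science on the same day, and the code was open-sourced at the same time.
RoseTTAFold, released on July 1, has been downloaded from GitHub by more than 140 independent research teams.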

Highly accurate protein structure prediction with AlphaFold (Nature)

https://www.nature.com/articles/s41586-021-03819-2

https://github.com/deepmind/alphafold (7/15)

Accurate prediction of protein structures and interactions using a three-track neural network (Science)

https://science.sciencemag.org/content/early/2021/07/14/science.abj8754

http://scimonitors.com/%EC%83%88%EB%A1%9C%EC%9A%B4-ai-%EB%8B%A8%EB%B0%B1%EC%A7%88-%EA%B5%AC%EC%A1%B0-%EC%98%88%EC%B8%A1-10%EB%B6%84-%EB%A7%8C%EC%97%90-%EA%B3%84%EC%82%B0/

https://github.com/RosettaCommons/RoseTTAFold (7/1)

Evaluating Large Language Models Trained on Code

https://arxiv.org/abs/2107.03374
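- OpenAI released details on Codex, a deep learning model that generates software source code
- "no free lunch" theorem (NFL): the trade-off between generality and performance
- On the evaluation problems, the 300M-parameter Codex solves 13.2% and the 12B-parameter model solves 28.8%
- Yet even though the full version of GPT-3 has 175 billion parameters ...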
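
For reference, solve rates like the ones above are reported in the Codex paper as pass@k (I am assuming the numbers quoted here are pass@1). Below is a sketch of the unbiased estimator as I understand it from the paper: generate `n` samples per problem, count the `c` that pass the unit tests, and estimate the chance that at least one of `k` samples passes.

```python
import numpy as np

def pass_at_k(n, c, k):
    """Unbiased estimate of pass@k: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    # Product form avoids computing large binomial coefficients directly.
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 10 samples for one problem, 3 of them pass.
p1 = pass_at_k(n=10, c=3, k=1)
p5 = pass_at_k(n=10, c=3, k=5)
```

The benchmark score is then the average of this value over all problems.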

IS GPT-3 OVERHYPED?

https://www.bbntimes.com/science/is-gpt-3-overhyped

Austin commits to $1.5B for DOD’s Joint AI Center over next 5 years

https://www.fedscoop.com/lloyd-austin-dod-jaic-funding/
- more than 600 AI projects

Can the Middle East become the next AI hub?

http://www.aitimes.com/news/articleView.html?idxno=139663

AI investment becomes an icon of fraud in the age of AI

https://www.mk.co.kr/opinion/columnists/view/2021/07/669930/

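China ranks first in the world in number of AI patents

https://news.joins.com/article/24107239
The top three countries for AI-related patent applications are China, the US, and Japan. Shanghai alone has passed 42,000 AI-related patent applications, of which 9,400 are valid patents.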

veritas9872 commented 3 years ago

What Classifiers Know What They Don't? https://arxiv.org/abs/2107.06217 https://github.com/facebookresearch/uimnet

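An experimental paper from Facebook comparing a range of methods for model uncertainty and calibration. The downsides are that the GitHub repo is still empty and there is no single representative figure.

It builds a dataset called ImageNot, which splits ImageNet into in-distribution and OOD parts based on ResNet18 features, and compares a variety of recently proposed methods on in-distribution and out-of-distribution data. The conclusion recommends either running TTA in a multi-input multi-output fashion within a single model, or ensembling (bagging) several calibrated models.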

Passive attention in artificial neural networks predicts human visual selectivity https://arxiv.org/abs/2107.07013

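A paper from DeepMind, done in collaboration with a psychology department, that experimentally checks whether soft-attention methods such as CAM match the regions humans actually look at. It is less about deep learning theory and more about whether human attention and deep learning attention agree.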

Per-Pixel Classification is Not All You Need for Semantic Segmentation https://arxiv.org/abs/2107.06278

A new method from Facebook that uses mask prediction instead of pixel-level classification and can be applied directly to existing architectures. Sharing it because it should be useful for anyone working on segmentation. According to Facebook, it is currently SOTA on ADE20K.
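
As a rough illustration of how mask prediction can still yield a per-pixel semantic map, here is a NumPy sketch of one way to combine the outputs: each of `N` predicted segments carries a class distribution and a soft mask, and every pixel takes the class with the highest summed `class_prob * mask_prob`. The shapes, the trailing "no object" class, and the function name are my assumptions about the setup, not the authors' code.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def semantic_from_masks(class_logits, mask_logits):
    """class_logits: (N, K + 1), last column = "no object"; mask_logits: (N, H, W)."""
    class_prob = softmax(class_logits, axis=-1)[:, :-1]   # drop the "no object" column
    mask_prob = sigmoid(mask_logits)
    # Per-class score map: sum over segments of class probability x mask probability.
    scores = np.einsum("nk,nhw->khw", class_prob, mask_prob)
    return scores.argmax(axis=0)                           # (H, W) label map

# Two segments, two classes: segment 0 covers the left column, segment 1 the right.
class_logits = np.array([[10.0, -10.0, -10.0], [-10.0, 10.0, -10.0]])
mask_logits = np.array([[[10.0, -10.0], [10.0, -10.0]],
                        [[-10.0, 10.0], [-10.0, 10.0]]])
labels = semantic_from_masks(class_logits, mask_logits)
```

The number of segments `N` is decoupled from the number of classes `K`, unlike per-pixel classification where the output has exactly one channel per class.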

Revisiting the Calibration of Modern Neural Networks https://arxiv.org/abs/2106.07998 https://github.com/google-research/robustness_metrics/tree/master/robustness_metrics/projects/revisiting_calibration

An experimental paper from Google reporting that recent architectures such as MLP-Mixer and ViT, unlike CNNs, do not exhibit the miscalibration issue. It looks like a good reference if miscalibration is causing you trouble.
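
Since calibration comes up in several of the papers in this thread, here is a minimal sketch of the usual expected calibration error (ECE) with equal-width confidence bins. The bin count and function name are arbitrary choices, and individual papers may use different binning variants.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap   # weight by the fraction of samples in the bin
    return ece

# A model that says 0.8 and is right 80% of the time is well calibrated.
calibrated = expected_calibration_error([0.8] * 10, [1] * 8 + [0] * 2)
# A model that says 1.0 but is right only half the time is overconfident.
overconfident = expected_calibration_error([1.0] * 10, [1] * 5 + [0] * 5)
```

Here `confidences` would be the max softmax probability per example and `correct` a 0/1 indicator of whether the predicted class was right.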

Fully Sharded Data Parallel: faster AI training with fewer GPUs https://engineering.fb.com/2021/07/15/open-source/fsdp/

Facebook Engineering released the FSDP library, which allows smooth multi-GPU training without latency issues and without modifying the model. As far as I know, it can be used in PyTorch Lightning from version 1.4. It is similar to DeepSpeed but will probably be more compatible with PyTorch.

jwlee-ml commented 3 years ago

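Regularization is all you Need: Simple Neural Nets can Excel on Tabular Data

This paper takes a somewhat different view from "Tabular Data: Deep Learning is Not All You Need" above, and it also looks interesting.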

jungwoo-ha commented 3 years ago

AutoGluon: https://auto.gluon.ai/stable/index.html

veritas9872 commented 3 years ago

AutoGluon Tabular paper: https://arxiv.org/abs/2003.06505

nick-jhlee commented 3 years ago

The Benchmark Lottery

Google Brain, Google Research, DeepMind
"The benchmark lottery postulates that many factors, other than fundamental algorithmic superiority, may lead to a method being perceived as superior"
Evaluating performance with benchmarks has many limitations...
The paper also proposes a benchmark checklist for reviewers/ACs.

Clyde21c commented 3 years ago

Conservative Objective Models for Effective Offline Model-Based Optimization (ICML 21)

https://sites.google.com/berkeley.edu/coms
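When an optimization problem is solved by training a regression model on an offline dataset, a penalty on adversarial data is added to the loss so that the regression model does not take high values outside the data distribution
An application of the offline reinforcement learning method CQL (https://arxiv.org/pdf/2006.04779.pdf)
Validated on protein and DNA sequence optimization problems -> 16% better than the second-best method (though overall it does not feel clearly better..)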
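
A toy sketch of the penalty idea, under strong simplifying assumptions: the regression model is linear, `f(x) = w . x`, so gradient ascent on the input just moves each point along `w`. The weight `alpha`, the step settings, and the function name are made up for illustration. The loss is the usual regression error plus a term that pushes predictions down on the "adversarial" inputs relative to the dataset inputs.

```python
import numpy as np

def conservative_loss(w, X, y, alpha=0.1, ascent_steps=5, ascent_lr=0.1):
    """MSE + alpha * (mean f(x_adv) - mean f(x_data)) for a linear f(x) = w . x."""
    pred = X @ w
    mse = float(np.mean((pred - y) ** 2))
    # Find inputs the model rates highly: gradient ascent on f starting from the data.
    X_adv = X.copy()
    for _ in range(ascent_steps):
        X_adv = X_adv + ascent_lr * w     # grad of (w . x) with respect to x is w
    # Penalize the model for scoring these inputs higher than the real data.
    penalty = float(np.mean(X_adv @ w) - np.mean(pred))
    return mse + alpha * penalty, mse, penalty

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 2))
w = np.array([1.0, 2.0])
y = X @ w                                  # noiseless targets, so the MSE term is 0
total, mse, penalty = conservative_loss(w, X, y)
```

In this linear case the penalty reduces to `ascent_steps * ascent_lr * ||w||^2`, so minimizing it shrinks how fast the model's output can grow away from the data.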

veritas9872 commented 3 years ago

Sharing a link to LabML AI. https://papers.labml.ai/papers/daily It is similar to Arxiv Sanity, but the website is much more stable.