[20210905] Weekly AI ArXiv Talk

jungwoo-ha commented 3 years ago

News
ArXiv
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
  - Image-text multimodal learning (from Google Brain, UW)
  - Image patches + text tokens (prefix) --> output: autoregressive seq2seq
  - Uses Conv (the first 3 blocks of ResNet-101/152, excluding the stem) + ViT as the image encoder
  - Uses the 1.8B noisy dataset from ALIGN; text-only data: C4 (800GB)
  - Downstream: VQA v2, SNLI-VE (entailment), NLVR2 (reasoning), COCO-caption, etc.
- SHIFT15M: Multiobjective Large-Scale Fashion Dataset with Distributional Shifts
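  - 15M fashion images from ZOZO Research, Japan (fashion e-commerce: IQON, Yahoo Japan)
  - The dataset focuses on the distribution shift between training and test data (2013 to 2020)
  - Tasks include regression of like counts, total set price, and item price, plus category classification, set2set matching, etc.
  - Looks quite useful: very large-scale fashion data, multi-objective, several kinds of distribution shift, and varied metadata (user id, set id, like count)
  - https://github.com/st-tech/zozo-shift15m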
- A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP
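  - Compares Conv, Transformer, and MLP under one unified framework (proposes SPACH) (from MSRA)
  - Single-stage / multi-stage (downsampling); the mixing block is implemented with each of the three components
  - Data augmentation follows the DeiT setting.
  - Conclusion: the Conv+Transformer hybrid seems to work best? (consistently on ImageNet-1k...)
  - Looks good for comparing models in terms of parameter count, FLOPs, and throughput.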
- ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language Understanding
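  - A benchmark for NLU that stays robust under speech recognition conditions (from Tencent AI) --> an important issue for speakers and the speech recognition field
  - GLUE tasks with ASR errors included
  - https://drive.google.com/drive/folders/1slqI6pUiab470vCxQBZemQZN-a_ssv1Q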
- Whole Brain Vessel Graphs: A Dataset and Benchmark for Graph Learning and Neuroscience (VesselGraph)
  - Whole-brain 3D vessel graph data (for graph learning) (from TUM, Helmholtz, U of Zurich)
  - For researchers working on brain research or graph representation learning
  - https://github.com/jocpae/VesselGraph
- Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialog Systems
  - Applies Noisy Student-style self-training to task-oriented dialog pretraining (from Huawei, EMNLP 2021)
  - Pseudo-labeling-based semi-supervised learning + regularization by GradAug (data aug) --> an augmentation technique that uses label information so that semantics are not harmed
  - Few-shot performance is measured on four ToD downstream tasks using BERT and ToD-BERT
- AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
  - A paper that organizes research on Transformer-based pretrained LMs along various criteria
  - Likely useful for newcomers and for people teaching courses
  - https://mr-nlp.github.io/posts/2021/05/tptlms-list/
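
For the SimVLM item above, the "image patches + text tokens as prefix, autoregressive output" setup can be pictured as an attention mask. The following is a minimal sketch, assuming the usual prefix-LM masking (bidirectional attention inside the prefix, causal attention afterwards); the function name and the toy sizes are illustrative and are not taken from the paper's code.

```python
def prefix_lm_mask(prefix_len, total_len):
    """Return mask[i][j] == True when position i may attend to position j.

    Assumption: positions [0, prefix_len) hold image patches plus prefix text
    tokens, and the remaining positions are generated autoregressively.
    """
    mask = []
    for i in range(total_len):
        row = []
        for j in range(total_len):
            in_prefix = j < prefix_len  # every position can see the whole prefix
            causal = j <= i             # otherwise only itself and earlier positions
            row.append(in_prefix or causal)
        mask.append(row)
    return mask


# Toy example: 2 prefix positions followed by 2 generated tokens.
for row in prefix_lm_mask(prefix_len=2, total_len=4):
    print(["x" if allowed else "." for allowed in row])
```

Prefix rows never see the generated tokens, while each generated token sees the full prefix and everything generated before it.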

hollobit commented 3 years ago

“AIOps is the trend”: IDC forecasts that 75% of enterprises will adopt AIOps in 2023

https://www.itbiznews.com/news/articleView.html?idxno=48061

Facebook Apologizes After A.I. Puts ‘Primates’ Label on Video of Black Men

https://www.nytimes.com/2021/09/03/technology/facebook-ai-race-primates.html

Only Humans, Not AI Machines, Get a U.S. Patent, Judge Says

https://www.bloomberg.com/news/articles/2021-09-03/only-humans-not-ai-machines-can-get-a-u-s-patent-judge-rules

Federal judge says AI can’t be listed as inventor on patents. Case is first U.S. ruling in global dispute over AI inventions.

The term AI overpromises. Let's make machine learning work better for humans instead

https://www.weforum.org/agenda/2021/09/ai-machine-learning-intelligence/

How open-source software shapes AI policy

https://www.brookings.edu/research/how-open-source-software-shapes-ai-policy/

OSS SPEEDS AI ADOPTION

OSS HELPS REDUCE AI BIAS

OSS AI TOOLS ADVANCE SCIENCE

OSS AI HELPS AND HINDERS TECHNOLOGY SECTOR COMPETITION

OSS CREATES DEFAULT AI STANDARDS

AI Weekly: An outline for government regulation of AI

https://venturebeat.com/2021/09/03/ai-weekly-an-outline-for-government-regulation-of-ai/

WHY AND HOW GOVERNMENTS SHOULD MONITOR AI DEVELOPMENT - https://arxiv.org/pdf/2108.12427.pdf

Sanas aims to convert one accent to another in real time for smoother customer service calls

https://techcrunch.com/2021/08/31/sanas-aims-to-convert-one-accent-to-another-in-real-time-for-smoother-customer-service-calls/

Understanding, explaining, and utilizing medical artificial intelligence (Nature Human Behaviour)

https://www.nature.com/articles/s41562-021-01146-0

AIMe – A standard for artificial intelligence in biomedicine

An international study by several universities, including Maastricht University (UM), proposes a standardized registry for artificial intelligence (AI) work in biomedicine

https://www.nature.com/articles/s41592-021-01241-0

https://aime-registry.org/

ghlee3401 commented 3 years ago

Paper

Neural HMMs are all you need (for high-quality attention-free TTS)
- Sample page : https://shivammehta007.github.io/Neural-HMM/
- Task : Text to Speech (TTS)
- Motivation
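  - Neural seq2seq TTS is known to outperform HMM-based parametric methods, but it is not a theoretically interpretable probabilistic model, and its attention mechanism brings problems such as longer training time
  - Proposes combining HMMs with neural TTS to get the advantages of both
  - Replaces the attention in Tacotron 2 with a hidden Markov model defined by a neural network
- Contribution
  - Has fewer parameters without much loss in quality
  - Needs no attention and learns alignment quickly
  - First time an HMM-based system has reached the speech quality of neural TTS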

You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection
- Task : Audio segmentation and sound event detection
- Contribution
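  - Faster than the previous SOTA, CRNN, because it uses no RNN, while still performing well
  - Existing approaches to segmentation and event detection predict a class for each frame; this method predicts the time boundaries directly by regression, so no post-processing is needed
- Method
  - A modified MobileNet architecture, built from CNNs
  - Input: mel-spectrogram
  - Output is 6 x #frames: for each frame, it predicts whether speech or music is present (classification) and the start and end times (regression)
  - Audio is split into 8 s samples, and each sample is divided into 26 bins of 0.307 s to build the labels
  - For example, for music: start = 0.65 * 0.307 = 0.2 (s), end = 13 * 0.307 + 0.975 * 0.307 = 4.29 (s)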
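
The label arithmetic in the music example can be checked with a few lines of Python. This is a minimal sketch, assuming each boundary is stored as a bin index plus an offset expressed as a fraction of one bin; the function name `to_seconds` is hypothetical, and the bin indices 0 and 13 are read off the example rather than taken from the paper's code.

```python
SAMPLE_SECONDS = 8.0   # length of one sample
NUM_BINS = 26          # output time steps per sample
BIN_SECONDS = 0.307    # rounded value used in the notes; 8 / 26 is about 0.3077


def to_seconds(bin_index, offset):
    """Convert (bin index, offset as a fraction of one bin) to seconds."""
    return (bin_index + offset) * BIN_SECONDS


start = to_seconds(0, 0.65)    # 0.65 * 0.307
end = to_seconds(13, 0.975)    # 13 * 0.307 + 0.975 * 0.307
print(round(start, 2), round(end, 2))  # 0.2 4.29
```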

Kyung-Min commented 3 years ago

Paper
- Graph Attention Multi-Layer Perceptron
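  - An attempt at building a scalable GNN
  - When the model is scaled up, computing node features takes a long time: sampling at every hop, a growing receptive field, and so on. So the node features are all extracted in advance and then transformed with an MLP
  - The basic structure is similar to SimpleGCN, but it differs in using all of the features obtained across multiple hops
  - Node features for each hop are precomputed from the K-th powers of the adjacency matrix, then combined with attention
  - An MLP then applies the transformation
  - Tested on ogbn-products and ogbn-papers100M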
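- The direction in which AI is developing, from Foundation Models
  - Progress moves toward increasing emergence and homogenization
  - Expert systems (rule-based) -> machine learning -> deep learning -> foundation models
  - Expert systems have no emergence: the system behaves exactly according to the rules the programmer wrote
  - With machine learning, the "how" emerges: the relationship between inputs and outputs is learned (e.g., f=ma)
  - With deep learning, features emerge: train a CNN on ImageNet and it becomes able to extract image features
  - With foundation models, functionalities emerge: train GPT-3 and it writes poems, writes code, and writes marketing copy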
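
For the Graph Attention Multi-Layer Perceptron item above, the "precompute per-hop features, then combine them with attention" idea can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the paper's implementation: the helper names are hypothetical, and a single fixed score per hop stands in for the learned attention.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def hop_features(adj, x, num_hops):
    """Precompute [X, AX, A^2 X, ..., A^K X] once, before any training."""
    feats = [x]
    for _ in range(num_hops):
        feats.append(matmul(adj, feats[-1]))
    return feats


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_hops(feats, hop_scores):
    """Weighted sum of the per-hop features with weights softmax(hop_scores).

    Assumption: one scalar score per hop; a learned attention module would
    produce these scores instead.
    """
    weights = softmax(hop_scores)
    n, d = len(feats[0]), len(feats[0][0])
    return [[sum(w * f[i][j] for w, f in zip(weights, feats)) for j in range(d)]
            for i in range(n)]


# Toy 3-node path graph with self-loops, row-normalized.
adj = [[0.5, 0.5, 0.0],
       [1 / 3, 1 / 3, 1 / 3],
       [0.0, 0.5, 0.5]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feats = hop_features(adj, x, num_hops=2)
combined = combine_hops(feats, hop_scores=[0.0, 0.0, 0.0])
print(combined)
```

Because `hop_features` runs only once, the combined features can then be fed to an ordinary MLP in mini-batches without touching the graph again.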

jungwoo-ha / WeeklyArxivTalk

[20210905] Weekly AI ArXiv Talk #23
