[20210829] Weekly AI ArXiv Talk

News
- Google Health split up again: https://news.naver.com/main/read.naver?mode=LSD&mid=sec&sid1=105&oid=023&aid=0003635642
- EMNLP 2021 notification
- Interspeech 2021 begins! Aug 30 to Sep 3 (hybrid!!!)
- SKT ai.x : https://www.skt.ai/kr/ai_x/index.do
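- Prof. Kyoung Mu Lee (Seoul National University) appointed Editor-in-Chief of IEEE TPAMI: https://www.mk.co.kr/news/society/view/2021/08/816207/
- 2021 AI Festival by Daedeok Net (대덕넷): https://aifesta.co.kr/ (Sep 2 to 4), held on Gather Town (days 2 and 3 are sold out)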
Arxiv
- On the opportunities and risks of foundation models
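- A report from Stanford Univ. HAI, which set up the Center for Research on Foundation Models and compiled this comprehensive overview
- A whopping 160 pages (212 including references). Even just reading the intro looks worthwhile
- The reasons given for the name "foundation model" reflect the report's overall content well
- Key applications were chosen from the perspective of where foundation models should be heading: healthcare, education, and law
- Overall, much of it covers points that have mostly been discussed already. Somehow, though, the perspective also seems overly academia-centric.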
- Weakly Supervised Continual Learning
- CL research seems to be putting a lot of effort into finding practical setups
- A new continual learning setup: a stream that delivers a few labeled images and many unlabeled images
- Proposes Continual Interpolation Consistency (CIC) and Contrastive CIC (CCIC)
- CIC is based on consistency regularization via data augmentation and pseudo-labeling on the unlabeled samples. It can be applied task-free (not tested in this paper)
- CCIC uses task IDs for contrastive learning
- Learning From Long-Tailed Data With Noisy Labels
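- Settings keep moving closer to real-world problems: long-tailed data + noisy labels (from NAVER LABS Europe)
- The two pull in opposite directions. Long-tailed learning has to give extra weight to minority classes, but with noisy labels the class information can't be trusted. The small-loss trick (treating high-loss samples as noisy) also breaks down, because tail-class samples tend to have high loss too, and the two settings treat uncertainty differently.
- Self-supervised pretraining (SimCLR, BYOL, SimSiam, Barlow Twins) --> fine-tuning with two losses (logit adjustment for class imbalance, SuperLoss for curriculum learning)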
- Rethinking Why Intermediate-Task Fine-Tuning Works
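- Intermediate fine-tuning before the final fine-tuning sometimes pays off; this paper analyzes when it does (EMNLP 2021 Findings)
- Earlier work (ACL 2020) only reported that it helps (especially with RoBERTa), suggesting that intermediate tasks requiring complex reasoning, such as HellaSwag or CosmosQA, work well.
- Here, even a simple baseline of real-vs-fake classification on GPT-2-generated text, as well as HellaSwag-p (HellaSwag with the context removed), seems to help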
- A Survey on Automated Fact-Checking
- Fact-checking is a very important area for trustworthiness
- A survey of the datasets and research from the past five years, for those working in the area (from Univ. of Cambridge)
- Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
- Uses T5 for sentence embeddings (from Google Research)
- Sentence vectors are needed surprisingly often, but extracting them from T5's seq2seq setup was awkward.
- Reportedly outperforms existing Sentence-BERT and SimCSE-RoBERTa on STS and other benchmarks.
- LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision
- Self-supervised learning from images + captions + mouse-trace data, with contrastive + localization losses (ICCV 2021)
- Contrastive: image-text / Localization: aligns the text's attention map with an attention map rendered from the temporally cropped mouse trace
- Pretrained on Localized Narratives (ECCV 2020, about 800k samples)
- Learns effectively with far less pretraining data
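For the Weakly Supervised Continual Learning item, here is a minimal sketch of a generic interpolation-consistency term in the style of Interpolation Consistency Training: the prediction on a mixup of two unlabeled inputs should match the mix of the predictions on each input. This is an assumption-laden illustration, not the paper's exact CIC loss; the toy linear model, the Beta(0.5, 0.5) mixing coefficient, and the separate `teacher` argument are all hypothetical choices.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Sequence[float]], Vector]  # maps an input vector to class logits


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix(a: Sequence[float], b: Sequence[float], lam: float) -> Vector:
    """Mixup-style interpolation: lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def interpolation_consistency_loss(
    student: Model, teacher: Model, u1: Sequence[float], u2: Sequence[float], lam: float
) -> float:
    """MSE between the student's prediction on the mixed input and the mix of the
    teacher's pseudo-label distributions on the two inputs. No labels are used,
    so the term can be computed on the unlabeled part of the stream."""
    target = mix(softmax(teacher(u1)), softmax(teacher(u2)), lam)
    pred = softmax(student(mix(u1, u2, lam)))
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    # Hypothetical toy model: a fixed 2-class linear scorer on 2-D inputs.
    def toy_model(x: Sequence[float]) -> Vector:
        return [2.0 * x[0] - x[1], -x[0] + 0.5 * x[1]]

    rng = random.Random(0)
    lam = rng.betavariate(0.5, 0.5)  # assumed mixing distribution
    loss = interpolation_consistency_loss(toy_model, toy_model, [1.0, 0.0], [0.0, 1.0], lam)
    print(f"lam={lam:.3f} consistency loss={loss:.5f}")
```

In ICT-style training the teacher is usually a slowly updated copy of the student; passing the same function twice, as above, just keeps the sketch small.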
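For the long-tail + noisy labels item, the "logit adjust" loss can be sketched as cross-entropy on logits shifted by `tau * log(prior)`, following the logit adjustment loss of Menon et al. The exact way the paper combines it with SuperLoss is not shown here, and the priors and `tau=1.0` default are illustrative assumptions.

```python
import math
from typing import Sequence


def logit_adjusted_cross_entropy(
    logits: Sequence[float], label: int, class_priors: Sequence[float], tau: float = 1.0
) -> float:
    """Cross-entropy on logits shifted by tau * log(prior).

    Adding the log-prior during training makes frequent classes look "easier",
    so the model must push tail-class logits further to reduce the loss,
    counteracting the bias toward head classes in long-tailed data."""
    if not math.isclose(sum(class_priors), 1.0, rel_tol=1e-6):
        raise ValueError("class_priors must sum to 1")
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, class_priors)]
    m = max(adjusted)
    log_sum_exp = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_sum_exp - adjusted[label]


if __name__ == "__main__":
    priors = [0.9, 0.1]  # hypothetical head/tail class frequencies
    logits = [0.0, 0.0]
    print("head-class loss:", logit_adjusted_cross_entropy(logits, 0, priors))
    print("tail-class loss:", logit_adjusted_cross_entropy(logits, 1, priors))
```

With uniform priors the shift is a constant and the loss reduces to plain cross-entropy.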
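For the Sentence-T5 item: one of the strategies the paper explores is averaging the encoder's token outputs into a single sentence vector, then comparing sentences by cosine similarity as in STS. The sketch below assumes the token vectors are already available; the numbers are made up, and a real setup would take them from a T5 encoder.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(token_vectors: Sequence[Sequence[float]], mask: Sequence[int]) -> Vector:
    """Average the encoder output vectors of real (mask=1) tokens, skipping padding."""
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, the usual score for STS-style sentence comparison."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical 2-D encoder outputs; the last token of the second sentence is padding.
    s1 = mean_pool([[0.2, 0.9], [0.4, 0.7]], [1, 1])
    s2 = mean_pool([[0.3, 0.8], [0.1, 0.9], [5.0, -5.0]], [1, 1, 0])
    print(f"similarity = {cosine_similarity(s1, s2):.3f}")
```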
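For the LocTex item, the image-text contrastive part can be illustrated with a generic symmetric InfoNCE loss (CLIP-style): within a batch, image i and caption i are the positive pair and every other pairing is a negative. This is a sketch under that assumption, not LocTex's exact formulation; the temperature of 0.07 and the toy embeddings are illustrative, and the localization loss is not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def diagonal_cross_entropy(rows: Sequence[Sequence[float]], i: int) -> float:
    """-log softmax(rows[i])[i]: the positive pair sits on the diagonal."""
    row = rows[i]
    m = max(row)
    return m + math.log(sum(math.exp(s - m) for s in row)) - row[i]


def symmetric_info_nce(
    images: Sequence[Sequence[float]], texts: Sequence[Sequence[float]], temperature: float = 0.07
) -> float:
    """Average of the image-to-text and text-to-image contrastive losses."""
    imgs = [l2_normalize(v) for v in images]
    txts = [l2_normalize(v) for v in texts]
    # sims[i][j] = scaled similarity between image i and text j
    sims = [[sum(a * b for a, b in zip(im, tx)) / temperature for tx in txts] for im in imgs]
    sims_t = [list(col) for col in zip(*sims)]  # transpose: rows are texts
    n = len(sims)
    image_to_text = sum(diagonal_cross_entropy(sims, i) for i in range(n)) / n
    text_to_image = sum(diagonal_cross_entropy(sims_t, i) for i in range(n)) / n
    return (image_to_text + text_to_image) / 2


if __name__ == "__main__":
    # Hypothetical embeddings for a batch of two image-caption pairs.
    images = [[1.0, 0.1], [0.1, 1.0]]
    captions = [[0.9, 0.2], [0.0, 1.0]]
    print("matched pairs:", symmetric_info_nce(images, captions))
    print("shuffled pairs:", symmetric_info_nce(images, captions[::-1]))
```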

Paper

Self-Attention for Audio Super-Resolution

Task : Audio super-resolution
Method : Attention-based Feature-Wise Linear Modulation (AFiLM)
Abstract
1. Builds on Temporal FiLM: Capturing Long-Range Sequence Dependencies with Feature-Wise Modulations (TFiLM, NeurIPS 2019), replacing its RNN with a self-attention module to learn long-range dependencies
2. Trains faster than TFiLM while performing better
Related papers
1. FiLM: Visual Reasoning with a General Conditioning Layer, AAAI 2018
2. Temporal FiLM: Capturing Long-Range Sequence Dependencies with Feature-Wise Modulation, NeurIPS2019
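The block-wise modulation idea behind TFiLM/AFiLM can be sketched as: pool the sequence into blocks, let self-attention (in place of TFiLM's RNN) mix information across blocks, then apply FiLM (`gamma * x + beta` per channel) to each block. This is a simplified illustration, not the paper's layer: the attention here is parameter-free (Q = K = V), and the mapping `gamma = 1 + context`, `beta = 0` is a hypothetical stand-in for what the real model learns.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # rows = time steps, columns = feature channels


def film(x: Sequence[Sequence[float]], gamma: Sequence[float], beta: Sequence[float]) -> Matrix:
    """FiLM: per-channel affine modulation, y[t][c] = gamma[c] * x[t][c] + beta[c]."""
    return [[g * v + b for v, g, b in zip(row, gamma, beta)] for row in x]


def block_max_pool(x: Sequence[Sequence[float]], block: int) -> Matrix:
    """Summarize each block of `block` time steps by a per-channel max."""
    return [[max(col) for col in zip(*x[s:s + block])] for s in range(0, len(x), block)]


def self_attention(z: Sequence[Sequence[float]]) -> Matrix:
    """Parameter-free scaled dot-product self-attention over block summaries.
    Assumption: Q = K = V = z; AFiLM itself uses learned projections."""
    d = len(z[0])
    out = []
    for q in z:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in z]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, z)) / total for c in range(d)])
    return out


def blockwise_film(x: Sequence[Sequence[float]], block: int) -> Matrix:
    """Attention mixes information across blocks; each block is then FiLM-modulated.
    Hypothetical mapping: gamma = 1 + context, beta = 0."""
    context = self_attention(block_max_pool(x, block))
    out: Matrix = []
    for b, ctx in enumerate(context):
        gamma = [1.0 + c for c in ctx]
        beta = [0.0] * len(ctx)
        out.extend(film(x[b * block:(b + 1) * block], gamma, beta))
    return out


if __name__ == "__main__":
    signal = [[0.1, -0.2], [0.3, 0.0], [0.5, 0.4], [-0.1, 0.2]]  # 4 time steps, 2 channels
    print(blockwise_film(signal, block=2))
```

Because attention looks at all block summaries at once instead of stepping through them, it parallelizes better than an RNN, which fits the faster-training claim above.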

Scorpiano -- A System for Automatic Music Transcription for Monophonic Piano Music

Task : Automatic Music Transcription (AMT) of monophonic music
Method : onset detection, tempo estimation, beat detection, pitch detection, music score generator
Abstract
1. Neural-network-based AMT needs more processing power and a large training dataset
2. Scorpiano produces a score from audio using a digital signal processing approach
3. The paper uses monophonic piano music
4. It is computationally cheap, fast, needs no large training dataset, and still performs well compared with neural-network-based systems
Related site
1. https://www.lunaverus.com/ : AnthemScore 4, neural-network based; feed it a raw audio file, edit note pitches, beats, and so on, and produce a score.
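To make the DSP-style pitch-detection stage concrete, here is a minimal sketch for a monophonic frame: pick the autocorrelation peak within a plausible lag range, then round the frequency to the nearest MIDI note (A4 = 440 Hz = 69). The notes above don't say which pitch detector Scorpiano uses, so autocorrelation is an assumed, common choice; the 50 to 2000 Hz search range is also an assumption, and plain autocorrelation can make octave errors on real recordings.

```python
import math
from typing import Sequence

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def autocorrelation_pitch(
    samples: Sequence[float], sample_rate: int, fmin: float = 50.0, fmax: float = 2000.0
) -> float:
    """Estimate the fundamental frequency of a monophonic frame from the lag
    (within the fmin..fmax range) that maximizes the autocorrelation."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def frequency_to_midi(freq: float) -> int:
    """Nearest MIDI note number: 69 + 12 * log2(f / 440)."""
    return round(69 + 12 * math.log2(freq / 440.0))


def midi_to_name(note: int) -> str:
    """Scientific pitch name, e.g. 60 -> 'C4'."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 440.0 * n / sr) for n in range(2048)]  # synthetic A4
    f0 = autocorrelation_pitch(tone, sr)
    print(f"{f0:.1f} Hz -> {midi_to_name(frequency_to_midi(f0))}")
```

The integer lag limits frequency resolution (here 8000 / 18 is about 444 Hz), but rounding to the nearest semitone still recovers the note.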

Tesla AI day (8/19)

(Korean subtitles, part 1) https://www.youtube.com/watch?v=Ah-TMrKSvic (Korean subtitles, part 2) https://www.youtube.com/watch?v=7NCkxV_vMdY

Gartner's 2021 Hype Cycle for Emerging Technologies (8/24)

Related link: https://www.gartner.com/smarterwithgartner/3-themes-surface-in-the-2021-hype-cycle-for-emerging-technologies/

2021년도 emerging technology hype cycle

Cerebras Systems develops a system it says can run human-brain-sized AI models (8/25)

https://www.tomshardware.com/news/worlds-largest-chip-unlocks-brain-sized-ai-models-with-163-million-core-cluster?fbclid=IwAR2D6z9V6mlVLoVvyculceUaqEeUoyMza-1AKC-zKzskpbhrviiEhs1XbHc

https://cerebras.net/system/

Up to 192 systems of 850,000 cores each can be linked into a cluster of about 163 million AI cores (850,000 × 192 = 163.2 million)

Supports AI models of up to 120 trillion parameters

That is more than 500 times the scale of GPT-3, which uses 175 billion parameters (120 trillion / 175 billion is about 686x)

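A comparison based on the rumor that it could train GPT-3 in a single day: training GPT-3 takes about 3.114E23 FLOPs (floating-point operations; this is a total count, not a per-second rate), which in theory would take about 355 years on a single V100 GPU delivering 28 TFLOPS (tera floating-point operations per second)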

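Working backward from this, it is equivalent to running 129,575 V100 GPUs at once (355 years × 365 days = 129,575 GPU-days, so finishing in one day takes that many GPUs)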
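The arithmetic above can be rechecked in a few lines, assuming 100% sustained utilization, a 365-day year, and perfect scaling across GPUs (all idealizations). Dividing directly gives about 352.7 years rather than the quoted 355, and about 128,700 GPUs; multiplying the rounded 355 years by 365 reproduces the 129,575 in the text.

```python
import math

SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365

gpt3_training_flops = 3.114e23  # total floating-point operations (FLOPs), not a rate
v100_flops_per_second = 28e12   # 28 TFLOPS, assuming 100% sustained utilization

single_gpu_seconds = gpt3_training_flops / v100_flops_per_second
single_gpu_years = single_gpu_seconds / SECONDS_PER_DAY / DAYS_PER_YEAR

# GPUs needed to finish in one day = single-GPU days, assuming perfect scaling.
gpus_for_one_day = math.ceil(single_gpu_seconds / SECONDS_PER_DAY)

# The figure in the text comes from the rounded 355-year estimate.
gpus_from_quoted_years = 355 * DAYS_PER_YEAR

print(f"single V100: {single_gpu_years:.1f} years")
print(f"V100s for one day (direct): {gpus_for_one_day:,}")
print(f"V100s for one day (from 355 years): {gpus_from_quoted_years:,}")
```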

Some other estimates:

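A single CS-2 cabinet reportedly costs about $4-5 million, so a brain-sized configuration of 200 cabinets comes to $800 million to $1 billion, roughly 1 trillion won: the price of 5 or 6 F-35 fighter jets.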

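Power as well: at a peak of 25 kW per cabinet, 200 cabinets draw 5 MW, and since all of it ultimately ends up as heat, you would practically need to build a district heating system next door.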

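Pricing a V100 at 8 million won, buying 130,000 of them also comes to about 1 trillion won; at 10 million won each, it is about 1.3 trillion. Linking 150,000 of the somewhat slower RTX 3090 Ti instead would bring the price down to about 375 billion won.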

The catch: stacking and running 130,000 V100s would require a structure 12.5 km high × 7.5 km long × 2.86 km wide, plus 71.5 MW of power.
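The cost and power figures above can be rechecked the same way. The RTX 3090 Ti unit price of 2.5 million won and the roughly 550 W per V100 are not stated in the text; they are simply what the quoted totals imply.

```python
CABINETS = 200
cabinet_price_usd = (4_000_000, 5_000_000)  # quoted CS-2 price range per cabinet
cabinet_peak_power_kw = 25

cluster_price_usd = tuple(p * CABINETS for p in cabinet_price_usd)
cluster_peak_power_mw = cabinet_peak_power_kw * CABINETS / 1000

v100_count = 130_000
v100_total_krw = {unit: unit * v100_count for unit in (8_000_000, 10_000_000)}

# Unit price implied by "150,000 RTX 3090 Ti for about 375 billion won".
rtx_implied_unit_krw = 375_000_000_000 // 150_000

# Per-GPU power implied by "71.5 MW for 130,000 V100s".
watts_per_v100 = 71.5e6 / v100_count

print("CS-2 cluster (USD):", cluster_price_usd)
print("CS-2 cluster peak power (MW):", cluster_peak_power_mw)
print("V100 totals (KRW):", v100_total_krw)
print("implied RTX 3090 Ti unit price (KRW):", rtx_implied_unit_krw)
print("implied watts per V100:", watts_per_v100)
```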

North Korea predicts gold prices with AI

https://www.nkeconomy.com/news/articleView.html?idxno=4571

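Issue 4 (2020) of the journal Economic Research (경제연구) includes "A gold price prediction method combining the wavelet transform and an LSTM neural network"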
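The wavelet half of such a method typically splits a price series into a smooth trend and local fluctuations before forecasting. The paper's wavelet choice and architecture are not given here, so this sketch assumes the simplest option, one level of the orthonormal Haar transform; the prices are made up and the LSTM step is not shown.

```python
import math
from typing import List, Sequence, Tuple

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_dwt(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar wavelet transform.
    Returns (approximation, detail): the smoothed trend and the local fluctuations."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) * INV_SQRT2 for a, b in pairs]
    detail = [(a - b) * INV_SQRT2 for a, b in pairs]
    return approx, detail


def haar_idwt(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Inverse of haar_dwt: rebuild the original series from both components."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * INV_SQRT2, (a - d) * INV_SQRT2])
    return out


if __name__ == "__main__":
    prices = [1800.0, 1810.0, 1795.0, 1805.0, 1820.0, 1815.0, 1830.0, 1825.0]  # made-up
    approx, detail = haar_dwt(prices)
    print("trend:", [round(x, 2) for x in approx])
    print("detail:", [round(x, 2) for x in detail])
    # Each component could be forecast separately (e.g. by an LSTM) and the
    # forecasts recombined with haar_idwt.
```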

[Column by Haksoo Ko] The difficulty of "fair AI"

http://www.aitimes.com/news/articleView.html?idxno=140243

Very different conclusions can be reached depending on which fairness criterion is used to evaluate an AI algorithm

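Equal error rates (equalized odds) vs. predictive parity (equal calibration): when base rates differ between groups, an imperfect classifier generally cannot satisfy both at once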
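To see what each criterion compares, here is a minimal per-group computation: equalized odds looks at the true- and false-positive rates (TPR, FPR) across groups, while predictive parity looks at the positive predictive value (PPV). Calibration for risk scores is usually checked per score bin; comparing PPV of thresholded predictions is a simplifying assumption here, and the two groups are hypothetical.

```python
from typing import Dict, Sequence


def group_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix rates for one group (binary labels and predictions).
    Equalized odds compares tpr and fpr across groups; predictive parity compares ppv."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "base_rate": (tp + fn) / len(y_true),
        "tpr": tp / (tp + fn) if tp + fn else float("nan"),
        "fpr": fp / (fp + tn) if fp + tn else float("nan"),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical groups with different base rates.
    group_a = group_rates([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
    group_b = group_rates([1, 1, 1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 0, 1, 0, 0])
    for name, rates in (("A", group_a), ("B", group_b)):
        print(name, {k: round(v, 3) for k, v in rates.items()})
```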

20 QUIRKY AND INTERESTING MACHINE LEARNING INTERVIEW QUESTIONS

https://www.analyticsinsight.net/20-quirky-and-interesting-machine-learning-interview-questions/

What is the similarity between Hadoop and K?

If a linear regression model shows a 90% confidence interval, what does that mean?

A single-layer perceptron or a 2-layer decision tree, which one is superior in terms of expressiveness?

How can a neural network be used for dimensionality reduction?

Name two uses of the intercept term in linear regression.

Why do a majority of machine learning algorithms involve some kind of matrix manipulation?

Is time series really a simple linear regression problem with one response variable predictor?

Can it be mathematically proven that finding the optimal decision tree for a classification problem among all decision trees is hard?

Which is easier, a deep neural network or a decision tree model?

Apart from back-propagation, what are some of the other alternative techniques to train a neural network?

How can one tackle the impact of correlation among predictors on principal component analysis?

Is there a way to work beyond the 99% accuracy mark on a classification model?

How can one capture the correlation between continuous and categorical variables?

Does k-fold cross-validation work well with time-series model?

Why can’t simple random sampling of training data set and validation set work for a classification problem?

What should be a priority, a model accuracy or model performance?

With multiple CPU cores, which approach do you prefer: a boosted tree algorithm or a random forest?

Which algorithm works best with tiny storage: logistic regression or k-nearest neighbors?

What are the criteria to choose the right ML algorithm?

Why can’t logistic regression use more than 2 classes?

How to avoid machine learning pitfalls: a guide for academic researchers

https://arxiv.org/abs/2108.02497

Dos and don'ts of machine learning research (Lones)

Lessons learned while doing ML research in academia and while supervising students doing ML research

2. Before you start to build models
3. How to reliably build models
4. How to robustly evaluate models
5. How to compare models fairly
6. How to report your results
7. Final thoughts

jungwoo-ha / WeeklyArxivTalk

[20210829] Weekly AI ArXiv Talk #22
