[20210926] Weekly AI ArXiv Talk

jungwoo-ha commented 3 years ago

News
- UK National AI Strategy
- Accepted paper list in EMNLP 2021
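- ICLR 2022 abstract deadline (Sep. 30, 9 AM)
- Conferences & webinars
- Co-hosted by the AI Future Forum and the National Academy of Engineering of Korea: "AI talent: let's build a national portfolio strategy."
- International Symposium on National Statistical Methodology
- Smart Cloud Show 2021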
ArXiv
- Primer: Searching for Efficient Transformers for Language Modeling
- Evolutionary search, built on TensorFlow, for speedup (from Google's Quoc Le group)
- Program -> subprogram -> instruction -> TF code <- TF primitive vocabulary
- The resulting Primer: the key changes are 1) a 3x1 spatial depthwise conv (dwconv) after Q, K, V and 2) squared ReLU
- Speedup factor: the ratio of speed improvement in the time needed to reach the vanilla Transformer's final performance (roughly 2x)
- Language Models are Few-shot Multilingual Learners
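- Very large LMs also do well on multilingual NLU in an in-context few-shot setting (the corpus is multilingual) (from HKUST)
- Both the prompts and the shots are multilingual. Monolingual and cross-lingual experiments across several languages
- Experiments with various versions of T5, GPT2, and GPT-Neo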
- Recursively Summarizing Books with Human Feedback
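- A method for book-length document summarization using a pretrained LM plus human-in-the-loop (from OpenAI)
- Recursively split the whole long book into smaller units --> have humans summarize some of them --> train on those summaries --> build up from there
- RL vs. behavior cloning (supervised), full tree vs. first subtree, compared on qualitative evaluation (Likert) and quantitative evaluation (ROUGE, BLEU)
- First-subtree RL performs well (when the model is large); RL > BC
- Evaluated on both BookSum summarization and NarrativeQA zero-shot QA
- Looks useful from the standpoint of overall problem definition with HITL, experiment design, and evaluation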
- Recent Advances of Continual Learning in Computer Vision: An Overview
- A comprehensive survey of continual learning research from 2016 through ICCV 2021
- Detailed, categorized explanations in terms of problem definition, metrics, and methods
- What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers
- The paper on HyperCLOVA, NAVER CLOVA's 82B-scale Korean GPT-3 (EMNLP 2021)
- Results for model sizes between 13B and 175B, a Korean-aware tokenizer, the effect of p-tuning, HyperCLOVA Studio (No Code AI)
- Also covers how it can be used in actual in-house applications
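
As a rough illustration of the two Primer modifications noted in the list above (a depthwise convolution along the sequence after the Q, K, V projections, and squared ReLU), here is a minimal pure-Python sketch. The causal left-padding, the function names, and the toy kernels are assumptions made for illustration and are not taken from the paper's code; `speedup_factor` only encodes the ratio described in the speedup bullet.

```python
from typing import List


def squared_relu(x: float) -> float:
    """Squared ReLU: max(x, 0) squared."""
    return max(x, 0.0) ** 2


def causal_depthwise_conv(seq: List[List[float]],
                          kernels: List[List[float]]) -> List[List[float]]:
    """Depthwise convolution along the sequence axis.

    seq:     [seq_len][channels], e.g. the projected Q, K, or V
    kernels: [channels][width]; one kernel per channel, no channel mixing
    Assumption: zero left-padding, so position t only sees positions <= t.
    """
    seq_len, channels = len(seq), len(seq[0])
    width = len(kernels[0])
    out = [[0.0] * channels for _ in range(seq_len)]
    for t in range(seq_len):
        for c in range(channels):
            acc = 0.0
            for k in range(width):
                src = t - (width - 1) + k  # k = width - 1 is the current position
                if src >= 0:
                    acc += kernels[c][k] * seq[src][c]
            out[t][c] = acc
    return out


def speedup_factor(baseline_cost: float, candidate_cost: float) -> float:
    """Ratio of training cost needed to reach the baseline's final quality."""
    return baseline_cost / candidate_cost


# Toy example: 3 positions, 2 channels, kernel width 3.
q = [[1.0, -1.0], [2.0, 0.5], [3.0, 2.0]]
kernels = [[0.0, 0.0, 1.0],   # channel 0: identity
           [1.0, 1.0, 1.0]]   # channel 1: sum over the last 3 positions
q_conv = causal_depthwise_conv(q, kernels)
print(q_conv)
print(squared_relu(-2.0), squared_relu(3.0))
print(speedup_factor(100.0, 50.0))
```

Each channel is filtered independently, which is what distinguishes a depthwise convolution from a regular one.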

ghlee3401 commented 3 years ago

TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method
- Microsoft Research Asia
- Sample page : https://ai-muzic.github.io/telemelody/
- Code : https://github.com/microsoft/muzic/tree/main/telemelody
- Method
  - Consists of a lyric-to-template module (music template: tonality, chord, rhythm, cadence) and a template-to-melody module
  - The lyric-to-template module is trained with supervised learning
  - The template-to-melody module is trained with self-supervised learning
- Problem & Contribution
  - Existing methods need a lot of training data -> template-to-melody is trained with self-supervised learning
  - Hard for users to control -> new melodies can be created by changing musical elements (tonality, chord)

On-device neural speech synthesis
- Apple
- Hybrid unit-selection : https://machinelearning.apple.com/research/siri-voices
- Built from Tacotron and WaveRNN
- Tacotron uses location-sensitive monotonic attention (location-sensitive attention + monotonic attention)
- WaveRNN uses 16-bit -> 8-bit $\mu$-law quantization and reduces 896 units to 512
- Appears to generate incrementally for fast inference, and a lot of effort seems to have gone into optimizing for server, on-device, and other settings
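
The 16-bit -> 8-bit $\mu$-law step mentioned above can be sketched as follows. This is the standard $\mu$-law companding formula with $\mu = 255$, not Apple's implementation; the function names and the rounding choice are assumptions for illustration.

```python
import math

MU = 255  # 8-bit mu-law: 256 quantization levels


def mu_law_encode(x: float, mu: int = MU) -> int:
    """Map a sample in [-1, 1] to an integer level in [0, mu]."""
    x = max(-1.0, min(1.0, x))
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(math.floor((compressed + 1.0) / 2.0 * mu + 0.5))


def mu_law_decode(level: int, mu: int = MU) -> float:
    """Inverse mapping from an integer level back to a sample in [-1, 1]."""
    compressed = 2.0 * level / mu - 1.0
    return math.copysign(((1.0 + mu) ** abs(compressed) - 1.0) / mu, compressed)


def encode_pcm16(sample: int) -> int:
    """Quantize a signed 16-bit PCM sample to an 8-bit mu-law level."""
    return mu_law_encode(sample / 32768.0)


level = encode_pcm16(16384)  # half of full scale
print(level, mu_law_decode(level))
```

The logarithmic compression spends more of the 256 levels on quiet samples, so the output layer only has to predict one of 256 classes instead of 65,536.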

hollobit commented 3 years ago

FDA publishes its list of Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices

https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices

A total of 343 entries compiled from various lists. It is less a list of pure AI/ML medical devices than a list of AI/ML-enabled devices across the medical field, based on publicly available information

IEEE Spectrum feature article on "Deep Learning's Diminishing Returns"

The cost of improving ML is becoming unsustainable. Winter is coming?

https://spectrum.ieee.org/deep-learning-computational-cost

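The model by which deep learning has advanced so far: use more computing power to build bigger models and train them on more data, and performance improves. But how expensive is this computational burden? Hasn't the opportunity cost already become high enough?

According to the study, halving the error rate can be expected to require more than 500 times the computational resources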
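
One way to read the 500x figure is as a power law. The sketch below assumes error ∝ compute^(-alpha), which is an assumption for illustration (the text above only gives the 500x number, and as a lower bound), then derives the implied exponent and the compute needed for other error reductions.

```python
import math

COMPUTE_MULTIPLIER_FOR_HALVING = 500.0  # figure quoted above (a lower bound)

# Assumed model: error = k * compute ** (-alpha)
# Halving the error: 1/2 = 500 ** (-alpha)  =>  alpha = ln 2 / ln 500
alpha = math.log(2.0) / math.log(COMPUTE_MULTIPLIER_FOR_HALVING)


def compute_multiplier(error_reduction_factor: float) -> float:
    """Compute multiplier needed to divide the error by `error_reduction_factor`."""
    return error_reduction_factor ** (1.0 / alpha)


print(f"alpha = {alpha:.4f}")
print(f"halve the error:   {compute_multiplier(2.0):.0f}x compute")
print(f"quarter the error: {compute_multiplier(4.0):.0f}x compute")
```

Under this assumed model the cost compounds: each further halving multiplies the required compute by another factor of 500.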

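Extrapolating the gains of recent years, the error level of the best deep learning systems designed to recognize objects in the ImageNet dataset should fall to 5% by 2025, but the computing resources and energy needed to train such a future system would emit as much carbon dioxide as New York City generates in one month. Source: N. C. Thompson, K. Greenewald, K. Lee, G. F. Manso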

Energy and Policy Considerations for Deep Learning in NLP - https://arxiv.org/abs/1906.02243

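Faced with rising economic and environmental costs, the deep learning community needs to find ways to improve performance without letting computing requirements grow out of control...

[Reference] Carbon neutrality written into law, 14th country in the world to do so; Framework Act on Carbon Neutrality passes the National Assembly - https://www.korea.kr/news/policyNewsView.do?newsId=148892495

[Reference] Greenhouse gas reduction of "35% or more" by 2030; expanded governance and a just transition taken into account

Carbon-neutral NLP, carbon-neutral ML/DL; will a carbon footprint requirement show up soon?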

Google's new deep learning system: identifying abnormal CXR images

Deep learning for distinguishing normal versus abnormal chest radiographs and generalization to two unseen diseases tuberculosis and COVID-19

https://www.nature.com/articles/s41598-021-93967-2

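About 837 million CXRs are acquired worldwide every year. As a result, the review burden on radiologists and other medical professionals is heavy

Many algorithms are being developed to detect specific diseases such as pneumonia, pleural effusion, and fractures at a level comparable to or better than radiologists, but because they are built for specific diseases, they fail to recognize diseases they were not trained on

The model used for X-ray abnormality detection in the study, B7, is the largest in the EfficientNet family, with 813 layers and 66 million parameters

The deep learning model was trained on more than 250,000 X-ray scans from five hospitals in India

Interestingly, the model was trained on 10 Tesla V100 GPUs rather than Google's TPUs

Turnaround time for abnormal cases could be reduced by 7-28%, confirming its potential as a powerful first-pass prioritization tool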

Data availability - https://www.nature.com/articles/s41598-021-93967-2#data-availability

Code availability - https://www.nature.com/articles/s41598-021-93967-2#code-availability

nick-jhlee commented 3 years ago

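AISTATS 2022 abstract due! (next Friday, 9 PM)
Piping hot: another plagiarism! (~~it's been a month, so it's about time~~)
- Genre: toned-down morning soap opera
- The authors of A posted their paper (currently under review) on arXiv: https://arxiv.org/abs/2108.10520
- Then a paper B appeared on arXiv that was far too similar: https://arxiv.org/abs/2109.07843 (the PDF has currently been pulled from arXiv)
- It looked like B had copied A, but it turned out that B had copied C! (C is still unpublished) <- feat. Reddit
- The authors of A had no idea that C even existed
- On Reddit, the authors of A listed the differences between A and C, said they genuinely had not known, and offered to cite C as concurrent work if needed
- Conclusion (for now): B crossed the line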

veritas9872 commented 3 years ago

torch.manual_seed(3407) is all you need: On the influence of random seeds in deep learning architectures for computer vision

Arxiv: https://arxiv.org/pdf/2109.08203.pdf

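A rather attention-baiting(?) paper that recently drew a lot of notice on Facebook and Twitter.

It is already well known that results vary because of training randomness such as the random seed. In RL in particular, I understand reproducibility is even harder.

However, unlike fields such as biology, the deep learning community usually runs an experiment once and reports that result. This paper aims to quantify how much results actually differ across seed values. In particular, it looks at the frequency of "black swans," seeds for which results come out unusually well.

It is interesting because it shows experimentally that results vary considerably with random initialization, by training a small ResNet9 model 10,000 times and a pre-trained ResNet50 model 50 times.

A minor downside is that the write-up reads more like a blog post than a paper.

For reference, in fields like biology too, experiments that cannot be reproduced or that underperform earlier reports are so common that Nature maintains a dedicated website. Nature collection on "Statistics for Biologists": https://www.nature.com/collections/qghhqm/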

nick-jhlee commented 3 years ago

Torch.manual_seed(3407) is all you need: On the influence of random seeds in deep learning architectures for computer vision
- Comparing algorithms A and B has to be done with great statistical care... ==> cherry-picking seeds, simple effect of randomness, etc.
- Of course, most deep learning algorithms (especially in CV and NLP) represent clear advancement. No denying that
- But: "However, in the light of this short study, I am inclined to believe that many results are overstated due to implicit seed selection - be it from common experimental practice of trial and error or of the “evolutionary pressure” that peer review exerts on them."
- What is the distribution of scores with respect to the choice of seed?
- The distribution of accuracy when varying seeds is relatively pointy, which means that results are fairly concentrated around the mean. Once the model converged, this distribution is relatively stable which means that some seed are intrinsically better than others.
- Are there black swans, i.e., seeds that produce radically different results?
- Yes. On a scanning of 10^4 seeds, we obtained a difference between the maximum and minimum accuracy close to 2% which is above the threshold commonly used by the computer vision community of what is considered significant.
- Does pretraining on larger datasets mitigate variability induced by the choice of seed?
- It certainly reduces the variations due to using different seeds, but it does not mitigate it. On Imagenet, we found a difference between the maximum and the minimum accuracy of around 0.5%, which is commonly accepted as significant by the community for this dataset.
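- The study's limitations are also described in great detail in Section 3 (<- impressive..)
- Conclusion: things are still fine for now... but there are signs this could turn into a real problem....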
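
The questions quoted above (how scores are distributed across seeds, and how large the max-min gap is) can be explored on a toy problem. The sketch below is not the paper's setup: it uses a tiny perceptron on synthetic 2-D data, with `random.Random(seed)` standing in for `torch.manual_seed(seed)`. The dataset, model, and hyperparameters are all hypothetical choices for illustration.

```python
import random
import statistics
from typing import List, Tuple

Example = Tuple[List[float], int]


def make_dataset(n: int, rng: random.Random) -> List[Example]:
    """Noisy 2-D toy data (hypothetical stand-in for a real benchmark)."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        label = 1 if x[0] + x[1] + rng.gauss(0, 0.3) > 0 else 0
        data.append((x, label))
    return data


def predict(w: List[float], b: float, x: List[float]) -> int:
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_and_eval(seed: int, train: List[Example], test: List[Example],
                   epochs: int = 3, lr: float = 0.1) -> float:
    """Train a perceptron; the seed controls only weight init and shuffling."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1), rng.gauss(0, 1)]
    b = rng.gauss(0, 1)
    order = list(range(len(train)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = train[i]
            err = y - predict(w, b, x)
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    correct = sum(predict(w, b, x) == y for x, y in test)
    return correct / len(test)


data_rng = random.Random(0)  # data is fixed, so only the training seed varies
train_set = make_dataset(200, data_rng)
test_set = make_dataset(200, data_rng)

scores = [train_and_eval(seed, train_set, test_set) for seed in range(50)]
print(f"mean={statistics.mean(scores):.3f} std={statistics.pstdev(scores):.3f}")
print(f"min={min(scores):.3f} max={max(scores):.3f} "
      f"spread={max(scores) - min(scores):.3f}")
```

Reporting the mean, standard deviation, and spread over many seeds, rather than a single run, is the practice the paper argues for.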

nick-jhlee commented 3 years ago

Oops, we overlapped, haha

veritas9872 commented 3 years ago

Oops, lol

nick-jhlee commented 3 years ago

Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? An Extensive Empirical Study on Language Tasks

nick-jhlee commented 3 years ago

Inconsistency in Conference Peer Review: Revisiting the 2014 NeurIPS Experiment

twitter: https://twitter.com/lawrennd/status/1440560979260051466?s=21

jwlee-ml commented 3 years ago

Pix2seq: A Language Modeling Framework for Object Detection, by Google Research, Brain Team w/ G. Hinton. Let's tackle object detection with a language model too
Furiosa outperformed Nvidia on MLPerf https://www.etnews.com/20210923000051

MLPerf v1.1 Result
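
For the Pix2seq item above: treating detection as language modeling means each object has to become a short token sequence. The sketch below quantizes box coordinates into bins and appends a class token. The token order, the bin count, the shared-vocabulary offset, and the function names are assumptions for illustration and are not taken from the thread or checked against the paper's code.

```python
from typing import List, Tuple


def box_to_tokens(box: Tuple[float, float, float, float], class_id: int,
                  image_size: Tuple[int, int], num_bins: int = 1000) -> List[int]:
    """Turn one box (ymin, xmin, ymax, xmax in pixels) into 5 discrete tokens."""
    ymin, xmin, ymax, xmax = box
    height, width = image_size

    def quantize(value: float, extent: float) -> int:
        frac = min(max(value / extent, 0.0), 1.0)
        return int(frac * (num_bins - 1) + 0.5)

    coords = [quantize(ymin, height), quantize(xmin, width),
              quantize(ymax, height), quantize(xmax, width)]
    # Assumption: class tokens share the vocabulary, placed after the bins.
    return coords + [num_bins + class_id]


def tokens_to_box(tokens: List[int], image_size: Tuple[int, int],
                  num_bins: int = 1000) -> Tuple[List[float], int]:
    """Inverse mapping: 5 tokens back to pixel coordinates and a class id."""
    height, width = image_size
    extents = [height, width, height, width]
    coords = [t / (num_bins - 1) * e for t, e in zip(tokens[:4], extents)]
    return coords, tokens[4] - num_bins


tokens = box_to_tokens((0.0, 0.0, 480.0, 640.0), class_id=3, image_size=(480, 640))
print(tokens)
print(tokens_to_box(tokens, image_size=(480, 640)))
```

Once every object is a fixed-length token group, a standard autoregressive decoder can emit detections one token at a time, which is the sense in which detection is handled like language modeling.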

sooyong-shin commented 3 years ago

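[2111261] Bill on Fostering Artificial Intelligence and Establishing a Basis for Trust (Rep. Jeong Pil-mo and others, 23 members in total) (proposed July 1, 2021) https://likms.assembly.go.kr/bill/billDetail.do?billId=PRC_Y2B1M0R6G2I2P1B0V2X9H4Z0X3M3J2

A bill like this has been proposed.. There are several ambiguous points, but one of the biggest problems is.. the definition of artificial intelligence. Looking at the bill:

"Article 2 (Definitions) The terms used in this Act are defined as follows.

“Artificial intelligence” means something intended to implement, by electronic means, human intellectual abilities such as learning, reasoning, perception, judgment, and the understanding of natural language."

It seems necessary to find a more precise academic definition.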

hollobit commented 3 years ago

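This already looks like a term agreed on across government ministries since 2017 ^^

Definition in the Ministry of Food and Drug Safety's "Guideline on the Approval and Review of Medical Devices Using Big Data and Artificial Intelligence (AI) Technology (Guide for Applicants) (2017, 2019)"

B. Artificial Intelligence: a technology that implements part or all of human intellectual ability (intelligence), such as cognition and learning, using a computer through machine learning and similar means

Definition in the Ministry of Science and ICT's "Artificial Intelligence (AI) R&D Strategy for Realizing I-Korea 4.0 (2018)"

Artificial intelligence means "intelligence implemented using a computer" that covers part or all of human intellectual ability (intelligence), such as cognition and learning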

jungwoo-ha / WeeklyArxivTalk