[20210725] Weekly AI ArXiv Talk

AI News
- ICCV notifications are out: congratulations to everyone who received an acceptance letter!
- Where IBM Watson stands today
  - NYT: https://www.nytimes.com/2021/07/16/technology/what-happened-ibm-watson.html
  - Korean coverage: https://news.naver.com/main/read.naver?mode=LSD&mid=sec&sid1=105&oid=023&aid=0003627854
- AlphaFold protein DB released: https://alphafold.ebi.ac.uk/
- TensorRT 8: https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html

AI Arxiv
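- All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text
  - Outstanding paper at ACL 2021
  - When evaluating high-performing NLG models such as GPT-3, the evaluators need to be trained beforehand; otherwise the results are hard to trust
  - Example: evaluators assume that NLG models will get things like grammar wrong, but that assumption is itself wrong, so they misjudge a lot
  - Conclusion: example-based training, where evaluators are first shown a few examples of NLG-written and human-written text, does give a statistically significant improvement in accuracy
  - Caveat: for GPT-3 175B, accuracy is 50% (random guessing) without training and only 57% even with training (yikes)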
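- Beyond Goldfish Memory: Long-Term Open-Domain Conversation
  - Key paper 1 behind BlenderBot 2.0: escaping goldfish memory
  - Multi-Session Chat data: crowdsourced chats situated across time jumps between sessions
  - Design choices: how many tokens of context to keep (to make use of previous conversations), and whether to use summaries (gold vs. predicted)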
- Internet-Augmented Dialogue Generation
  - Key paper 2 behind BlenderBot 2.0: dialogue generation that uses the internet
  - Goes beyond retrieval-augmented generation to internet-augmented generation
  - Can address two problems: keeping knowledge up to date, and hallucination
  - Wizard of the Internet dataset
  - A real internet search returns document-level information, so summarizing it well remains the open challenge
- TumorCP: A Simple but Effective Object-Level Data Augmentation for Tumor Segmentation
  - A simple copy-paste-based augmentation for medical data
  - Pick a source-target pair, cut out the object, apply spatial, gamma, and blurring transforms (each with some probability), then apply image-level augmentation
  - IntraCP and InterCP (within a patient, across patients)
  - Applied to kidney and tumor data; the effect on tumors looks quite good, possibly because of the amount of data
  - https://github.com/YaoZhang93/TumorCP (only a skeleton for now)
- An overview of mixing augmentation methods and augmentation strategies
  - A comprehensive summary of data augmentation for training image classification backbones
  - Covers everything up to recently proposed DA techniques, on CIFAR-10, CIFAR-100, and ImageNet
  - The pixel-wise approaches (e.g. Mixup) work better with noise (corrupted images or incorrect labels), while the patch-wise ones (e.g. CutMix) are better suited to partial occlusion or weakly supervised object localization.
  - The takeaway is basically "mix them well". Overall, the results confirm that CutMix is simple yet very effective
- Ready for Emerging Threats to Recommender Systems? A Graph Convolution-based Generative Shilling Attack
  - Research on attacking the robustness of recommender systems
  - Shilling attack: injecting a large number of fake user profiles
  - Earlier attacks were reportedly either too simple to be effective, or effective but too costly to be practical
  - Uses a GCN to model correlations among co-rated items, plus a GAN that smooths the fake ratings, to learn the real rating distribution
- Triplet is All You Need with Random Mappings for Unsupervised Visual Representation Learning
  - SSL that uses only a single negative sample, trained with triplet loss + CE loss
  - No need for a large mini-batch. Can be used on top of SimCLR or SimSiam
  - Uses a random matrix M in the similarity function, which has the effect of also learning in the spaces obtained by decomposing M
  - It's a pity that the reported experiments use small datasets; I'd like to check it at ImageNet-1k scale
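
For the mixing-augmentation overview above, the pixel-wise vs. patch-wise distinction can be sketched roughly as follows. This is a minimal NumPy sketch, not code from the paper: the function names `mixup` and `cutmix`, the height-width-channel image layout, and the one-hot label vectors are all assumptions for illustration.

```python
import numpy as np


def mixup(x1, y1, x2, y2, lam):
    """Pixel-wise mixing: blend every pixel and both label vectors with weight lam."""
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2


def cutmix(x1, y1, x2, y2, lam, rng):
    """Patch-wise mixing: paste a random box from x2 into x1; labels mix by pasted area."""
    h, w = x1.shape[:2]
    cut = np.sqrt(1.0 - lam)             # box side ratio so box area is roughly (1 - lam)
    bh, bw = int(h * cut), int(w * cut)
    cy, cx = rng.integers(h), rng.integers(w)   # random box center
    top, bottom = max(cy - bh // 2, 0), min(cy + bh // 2, h)
    left, right = max(cx - bw // 2, 0), min(cx + bw // 2, w)
    mixed = x1.copy()
    mixed[top:bottom, left:right] = x2[top:bottom, left:right]
    # Recompute the label weight from the area actually pasted (the box may be clipped).
    pasted = (bottom - top) * (right - left) / (h * w)
    return mixed, (1.0 - pasted) * y1 + pasted * y2


rng = np.random.default_rng(0)
img_a, img_b = np.zeros((32, 32, 3)), np.ones((32, 32, 3))
lab_a, lab_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
lam = rng.beta(1.0, 1.0)                 # assumed Beta(1, 1) mixing weight
mixup_img, mixup_lab = mixup(img_a, lab_a, img_b, lab_b, lam)
cutmix_img, cutmix_lab = cutmix(img_a, lab_a, img_b, lab_b, lam, rng)
```

Mixup changes every pixel a little, while CutMix leaves most pixels untouched and replaces one region outright, which fits the overview's remark that the two families suit different kinds of corruption.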

Paper

Sequence-to-Sequence Piano Transcription with Transformers
- Code : https://goo.gl/magenta/seq2seq-piano-transcription-code (not available yet)
- Problem : prior work on Automatic Music Transcription (AMT) required extensive domain-specific design of model architectures, input/output representations, and so on
- Contribution : proposes a method that automatically converts piano audio into a MIDI sequence without domain-specific adaptation
- Method
  - autoregressive encoder-decoder Transformer architecture
  - the input is a mel-spectrogram; the output is a softmax at each step, and the targets come from a vocabulary extracted from MIDI
    - Note : [128 values] (128 MIDI pitches; the experiments use only the 88 pitches that correspond to actual piano keys)
    - Velocity : [128 values] the velocity applied to the note events that follow; 0 means note-off
    - Time : [6,000 values] absolute time position within the segment at which a note occurs, quantized into 10 ms bins
    - EOS : [1 value] marks the end of the sequence
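
The vocabulary above can be made concrete with a small tokenizer sketch. This is only an illustration under stated assumptions: the id layout (note, then velocity, then time, then EOS), the helper names `time_token` and `encode_segment`, and the choice to emit a velocity token before every note are all hypothetical and not taken from the paper's (not yet released) code.

```python
# Sizes of each event type, as listed in the notes above.
NUM_NOTE, NUM_VELOCITY, NUM_TIME, NUM_EOS = 128, 128, 6000, 1
TIME_BIN_SECONDS = 0.010  # 10 ms bins

# Assumed id layout: [note | velocity | time | EOS]. The real ordering may differ.
NOTE_OFFSET = 0
VELOCITY_OFFSET = NOTE_OFFSET + NUM_NOTE
TIME_OFFSET = VELOCITY_OFFSET + NUM_VELOCITY
EOS_ID = TIME_OFFSET + NUM_TIME
VOCAB_SIZE = EOS_ID + NUM_EOS


def time_token(seconds):
    """Quantize an absolute time within the segment into a 10 ms bin id."""
    bin_index = round(seconds / TIME_BIN_SECONDS)
    if not 0 <= bin_index < NUM_TIME:
        raise ValueError("time falls outside the segment's 6,000 bins")
    return TIME_OFFSET + bin_index


def encode_segment(notes):
    """notes: iterable of (onset_s, offset_s, midi_pitch, velocity) tuples."""
    events = []
    for onset, offset, pitch, velocity in notes:
        events.append((onset, velocity, pitch))  # note-on
        events.append((offset, 0, pitch))        # note-off is velocity 0
    events.sort()  # by time; at equal times, note-offs (velocity 0) come first
    tokens = []
    for seconds, velocity, pitch in events:
        tokens += [time_token(seconds), VELOCITY_OFFSET + velocity, NOTE_OFFSET + pitch]
    tokens.append(EOS_ID)
    return tokens


# One middle-C note (pitch 60) held for half a second at velocity 80.
example_tokens = encode_segment([(0.0, 0.5, 60, 80)])
```

With 6,000 time bins of 10 ms each, one segment can span up to 60 seconds, and the whole vocabulary stays small enough for an ordinary softmax output layer.
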
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
- Sample : https://starganv2-vc.github.io/
- Conference : INTERSPEECH 2021
- Problem : existing non-parallel voice conversion approaches (auto-encoder-based / TTS-based / GAN-based) either need carefully designed constraints to remove speaker information, suffer from low speaker similarity, or require text labels
- Contribution
  - Applies StarGAN v2 to voice conversion, converting plain speech into speech in a variety of styles
  - A novel adversarial source classifier loss increases the similarity between the converted speech and the target speech
  - First use in VC of a perceptual loss that relies on both an ASR network and an F0 extraction network
Digital Einstein Experience: Fast Text-to-Speech for Conversational AI
- Homepage : https://einstein.digitalhumans.com/
- Blog : https://www.aflorithmic.ai/post/creating-einsteins-voice
- Conference : INTERSPEECH 2021
- Contribution
  - Digital Einstein chatbot: generates Einstein's voice and facial expressions in real time
- Method
  - Uses FastSpeech2 to predict log-scaled mel-spectrograms
  - Uses ParallelWaveGAN as the vocoder

Paper

User-specific Adaptive Fine-tuning for Cross-domain Recommendations
- A fine-tuning method for cross-domain transfer in recommender systems
- When the target dataset is small, fine-tuning an entire model with a very large number of parameters easily leads to overfitting
- The authors claim that tuning different layers for each user reduces the fine-tuning cost and also improves performance
- Policy networks decide which layers to tune for a given input sequence
- Policy variants: hard (Gumbel-softmax) / soft (gating mechanism) / RL (REINFORCE-style)
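
A minimal sketch of the "hard" policy variant above, assuming each layer has a frozen pretrained copy and a fine-tuned copy and that the policy network emits a `(num_layers, 2)` array of logits per user. The names `gumbel_softmax` and `route`, the toy layers, and the logits are hypothetical; the paper's actual architecture may differ. NumPy has no autodiff, so only the forward sampling is shown (training would need a relaxed or straight-through estimator).

```python
import numpy as np


def gumbel_softmax(logits, tau, rng, hard=True):
    """Sample a (relaxed) one-hot decision per row using Gumbel noise."""
    uniform = rng.uniform(1e-9, 1.0, size=logits.shape)
    gumbel = -np.log(-np.log(uniform))
    z = (logits + gumbel) / tau
    z = z - z.max(axis=-1, keepdims=True)          # numerical stability
    soft = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    if not hard:
        return soft
    one_hot = np.zeros_like(soft)
    index = np.expand_dims(soft.argmax(axis=-1), -1)
    np.put_along_axis(one_hot, index, 1.0, axis=-1)
    return one_hot


def route(x, frozen_layers, tuned_layers, decisions):
    """Per layer, use the frozen copy (column 0) or the fine-tuned copy (column 1)."""
    for frozen, tuned, d in zip(frozen_layers, tuned_layers, decisions):
        x = d[0] * frozen(x) + d[1] * tuned(x)
    return x


rng = np.random.default_rng(0)
# Hypothetical policy output for one user: 3 layers, logits for [freeze, tune].
policy_logits = np.array([[2.0, -1.0], [0.0, 0.0], [-1.5, 1.0]])
decisions = gumbel_softmax(policy_logits, tau=1.0, rng=rng)
frozen_layers = [lambda v: v + 1.0] * 3            # stand-ins for pretrained layers
tuned_layers = [lambda v: v * 2.0] * 3             # stand-ins for fine-tuned layers
user_output = route(1.0, frozen_layers, tuned_layers, decisions)
```

The soft variant would correspond to calling `gumbel_softmax(..., hard=False)`-style weights (or a plain gate) so that both copies contribute, at the cost of running both branches.
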
End-to-End Learning of Coherent Probabilistic Forecasts for Hierarchical Time Series (ICML 21)
- A model that learns hierarchical time-series forecasting end to end
- Coherence constraints: shop-level order count = sum of item-level order counts
- Each forecast has to be sampled from a probability distribution (probabilistic forecasting)
- Sampling itself cannot be trained end to end, so the reparameterization trick is used as in a VAE: y = mean + sigma * z (z is sampled from a standard normal distribution)
- To enforce coherence, a matrix that matches the hierarchy is defined
- The transformed samples should lie in the coherent subspace
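
The two steps above (reparameterized sampling, then mapping into the coherent subspace) can be sketched for a toy hierarchy of one shop with two items. This is an illustration under assumptions, not the paper's code: the constraint matrix `A`, the orthogonal projection `M = I - A^T (A A^T)^{-1} A`, and the function name `sample_coherent` are choices made here, and the paper's matrix may be defined differently.

```python
import numpy as np

# Series order: [shop, item_a, item_b]. Coherence means shop - item_a - item_b = 0.
A = np.array([[1.0, -1.0, -1.0]])
# Orthogonal projection onto {y : A y = 0}, i.e. the coherent subspace (assumed choice).
M = np.eye(3) - A.T @ np.linalg.inv(A @ A.T) @ A


def sample_coherent(mean, sigma, num_samples, rng):
    """Draw reparameterized Gaussian samples, then project them to be coherent."""
    z = rng.standard_normal((num_samples, mean.shape[0]))
    y = mean + sigma * z          # reparameterization: randomness lives only in z
    return y @ M.T                # each row now satisfies shop = item_a + item_b


rng = np.random.default_rng(0)
mean = np.array([10.0, 6.0, 3.0])   # deliberately incoherent base forecast (6 + 3 != 10)
sigma = np.array([1.0, 0.5, 0.5])
samples = sample_coherent(mean, sigma, num_samples=4, rng=rng)
```

Because both the reparameterization and the projection are plain linear operations on `mean` and `sigma`, gradients can flow from a loss on the coherent samples back to the network that predicts the distribution parameters.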

News

https://www.wsj.com/video/series/inside-tiktoks-highly-secretive-algorithm/investigation-how-tiktok-algorithm-figures-out-your-deepest-desires/6C0C2040-FF25-4827-8528-2BD6612E3796?mc_cid=3453acec1b&mc_eid=b3e8464478
- Analysis of TikTok's algorithm
- Top popular content -> niche content that makes you stay longer on the platform (within a few hours)
- View-time optimization (watching something for longer doesn't mean you really like it; it may instead be your vulnerability)

Neuroprosthesis for Decoding Speech in a Paralyzed Person with Anarthria

https://www.nejm.org/doi/full/10.1056/NEJMoa2027540

Supplementary Material: https://www.gwern.net/docs/ai/2021-moses-supplement.pdf

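AI has finally pulled off mind reading! (wow) A paper in the New England Journal of Medicine (NEJM) succeeded in predicting words and sentences from the brain signals of a paralyzed person. It is still behind a paywall, but I expect it to become big news soon. A few years from now, it may turn out to be the forerunner of technology for communicating through in-brain AI, haha.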

Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data

https://arxiv.org/abs/2107.10833

This paper shows that training an ESRGAN model on synthetic data gives results as good as training on real data. I used ESRGAN a lot in the past, and this follow-up project also looks like it will help a lot with image reconstruction tasks.

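Not a paper introduction, but... I thought it would be useful to share this with those of you who use Chrome.

https://chrome.google.com/webstore/detail/arxive/hkoblclipggkhhbllgefhnbjdcajmelh?hl=ko

I made a Chrome extension called 'Arxive' and published it. You can also find it by searching the Chrome Web Store directly.

It is very simple: when you download a paper from the arXiv site, it lets you save the file with a name of the form paper title + authors + year. When you open the arXiv site, a "Direct download" link appears on the page below the usual PDF download link, and you just click it. In the settings you can choose whether to include the authors and the year, and whether to save straight to the downloads folder or use "Save as". If anyone needs it, feel free to use it. (I may add other sites later when I have time.)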

jungwoo-ha / WeeklyArxivTalk
