jungwoo-ha / WeeklyArxivTalk

[Zoom & Facebook Live] Weekly AI Arxiv 시즌2
973 stars 41 forks

[20230305] Weekly AI ArXiv 만담 시즌2 - 8회차 #74

Open scene-the-ella opened 1 year ago

jungwoo-ha commented 1 year ago

News

2022 2️⃣ ColabFold: making protein folding accessible to all -> (From multiple institutions, 1162 citations) An open-source and efficient protein folding model.

3️⃣ Hierarchical Text-Conditional Image Generation with CLIP Latents -> (From OpenAI, 718 citations) DALL·E 2, complex prompted image generation that left most in awe.

4️⃣ A ConvNet for the 2020s -> (From Meta and UC Berkeley, 690 citations) A successful modernization of CNNs at a time of boom for Transformers in Computer Vision.

5️⃣ PaLM: Scaling Language Modeling with Pathways -> (From Google, 452 citations) Google's mammoth 540B Large Language Model, a new MLOps infrastructure, and how it performs.

2021 1️⃣ Highly accurate protein structure prediction with AlphaFold -> (From DeepMind, 8965 citations) AlphaFold, a breakthrough in protein structure prediction using Deep Learning.

2️⃣ Swin Transformer: Hierarchical Vision Transformer using Shifted Windows -> (From Microsoft, 4810 citations) A robust variant of Transformers for Vision.

3️⃣ Learning Transferable Visual Models From Natural Language Supervision -> (From OpenAI, 3204 citations) CLIP, using image-text pairs at scale to learn joint image-text representations in a self-supervised fashion (a minimal sketch of its contrastive objective follows this list).

4️⃣ On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? -> (From U. Washington, Black in AI, The Aether, 1266 citations) Famous position paper very critical of the trend of ever-growing language models, highlighting their limitations and dangers.

5️⃣ Emerging Properties in Self-Supervised Vision Transformers -> (From Meta, 1219 citations) DINO, showing how self-supervision on images led to the emergence of some sort of proto-object segmentation in Transformers.

2020 1️⃣ An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale -> (From Google, 11914 citations) The first work showing how a plain Transformer could do great in Computer Vision.

2️⃣ Language Models are Few-Shot Learners -> (From OpenAI, 8070 citations) GPT-3; this paper needs no further explanation at this stage.

3️⃣ YOLOv4: Optimal Speed and Accuracy of Object Detection -> (From Academia Sinica, Taiwan, 8014 citations) Robust and fast object detection sells like hotcakes.

4️⃣ Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer -> (From Google, 5906 citations) A rigorous study of transfer learning with Transformers, resulting in the famous T5.

5️⃣ Bootstrap your own latent: A new approach to self-supervised Learning -> (From DeepMind and Imperial College, 2873 citations) Showing that negatives are not even necessary for representation learning.
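
As referenced in the CLIP entry above, the core of that paper is a symmetric contrastive (InfoNCE) objective over matched image-text pairs. Below is a minimal sketch of that loss in PyTorch; the encoders are omitted and random tensors stand in for their outputs, and the fixed temperature is a simplification (CLIP learns it as a parameter).

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of matched image-text pairs.

    image_emb, text_emb: (batch, dim) embeddings from the two encoders
    (the encoders themselves are omitted in this sketch).
    """
    # L2-normalize so the dot product is a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are the true pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image -> text and text -> image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random embeddings standing in for encoder outputs.
img = torch.randn(8, 512)
txt = torch.randn(8, 512)
print(clip_contrastive_loss(img, txt))
```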

ArXiv

gyunggyung commented 1 year ago

LLaMA news.

Here are some more recent updates on LLaMA, which Facebook shared recently.

  1. LLaMA-7B: the checkpoint has been released. I still don't know how it was done.
  2. llama-up-data: hunkim built a chatbot on top of LLaMA. Since the model is small, though, the quality is so-so....
  3. llama-int8: thanks to careful quantization, it can now run even on a 3090 or 4090 (see the toy quantization sketch after this list). LLaMA INT8 Inference guide
  4. LLaMA question: I asked something I was curious about, but nobody has answered yet....
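
As a rough illustration of why the INT8 quantization in item 3 lets LLaMA fit on a 3090/4090, here is a toy absmax weight-quantization sketch in PyTorch. This is my own illustration of the memory saving, not the scheme llama-int8 actually implements.

```python
import torch

def quantize_int8(weight: torch.Tensor):
    """Per-row absmax quantization of a weight matrix to int8.

    Returns the int8 tensor plus the per-row scales needed to dequantize.
    A toy illustration of the ~4x memory saving vs. fp32 (2x vs. fp16).
    """
    scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp((weight / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an approximate floating-point weight for use in matmuls.
    return q.float() * scale

w = torch.randn(4096, 4096)            # one fp32 weight matrix: ~64 MB
q, scale = quantize_int8(w)            # int8 weights: ~16 MB (+ scales)
err = (w - dequantize_int8(q, scale)).abs().max()
print(q.dtype, err)                    # torch.int8, small rounding error
```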

Image-related models.

  1. Beating OpenAI CLIP with 100x less data and compute: it shows strong performance with 100x less data. It seems general-purpose enough that I expect it to see wide use later on. It even handles Korean well. If you have questions about it, I can pass them along on your behalf.
  2. AI Generated Images Are Getting Too Real | Asmongold Reacts: image generation now looks genuinely natural. LoRA in particular is something I need to study (see the sketch after this list). Sharing a few related results.
  3. AI Art is getting too good! Can YOU Tell the Difference?: try to spot which ones were drawn by AI!

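Since item 2 mentions LoRA, here is a minimal sketch of the core idea: a frozen linear layer plus a trainable low-rank update. The class name and hyperparameters are illustrative only, not taken from any particular Stable Diffusion implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen nn.Linear plus a trainable low-rank update (the LoRA idea).

    Only A and B are trained; the pretrained weight stays fixed, so the
    adapter adds roughly 2 * rank * dim parameters instead of dim**2.
    """
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # freeze pretrained weights
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank            # standard LoRA scaling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path + low-rank path; B starts at zero, so the adapted
        # layer initially behaves exactly like the original one.
        return self.base(x) + (x @ self.lora_a.t() @ self.lora_b.t()) * self.scaling

layer = LoRALinear(nn.Linear(768, 768), rank=8)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 2 * 8 * 768
```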
veritas9872 commented 1 year ago

High-resolution image reconstruction with latent diffusion models from human brain activity bioRxiv: https://www.biorxiv.org/content/10.1101/2022.11.18.517004 Website: https://sites.google.com/view/stablediffusion-with-brain/


Sharing a paper that has been a big topic on Twitter since yesterday. The study shows that when an L2-regularized linear model (???) is trained on fMRI brain signals to match Stable Diffusion's image and text latent encodings, images similar to the ones shown to the subject can be reconstructed.

Each subject needs thousands of images, and a given model will presumably only work for one subject, and probably only for one scanner. Still, showing that reconstruction is possible from brain signals without any deep-learning training, using only a pretrained model plus a fitted linear model, should be very influential. That said, I would only trust it once the results have been reproduced.
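
Below is a minimal sketch of the mapping described above: an L2-regularized (ridge) linear regression from flattened fMRI features to diffusion-model latents, with random arrays standing in for the real data. Shapes and variable names are my own placeholders (much smaller than the real ones), and only one of the paper's mappings (image latent or text latent) is sketched.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy stand-ins for the real data: flattened fMRI voxel responses per
# trial and the corresponding Stable Diffusion latent vectors.  Real
# dimensions are far larger; these are shrunk so the sketch runs quickly.
rng = np.random.default_rng(0)
n_trials, n_voxels, latent_dim = 1000, 2000, 512
fmri = rng.standard_normal((n_trials, n_voxels)).astype(np.float32)
latents = rng.standard_normal((n_trials, latent_dim)).astype(np.float32)

# The "L2-regularized linear model" from the description: one ridge
# regression mapping brain activity to the latent space.
model = Ridge(alpha=100.0)
model.fit(fmri[:800], latents[:800])

# Predicted latents for held-out trials would then be passed through the
# frozen, pretrained Stable Diffusion decoder to reconstruct images.
pred = model.predict(fmri[800:])
print(pred.shape)  # (200, 512)
```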

Dropout Reduces Underfitting ArXiv: https://arxiv.org/abs/2303.01500 GitHub: https://github.com/facebookresearch/dropout


Consistency Models ArXiv: https://arxiv.org/abs/2303.01469


Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages ArXiv: https://arxiv.org/abs/2303.01037

Full Stack Optimization of Transformer Inference: a Survey ArXiv: https://arxiv.org/abs/2302.14017

Sharing a well-organized survey paper on the hardware and software optimizations, and the open issues, involved in Transformer inference.

ghlee3401 commented 1 year ago

Arxiv

jwlee-neubla commented 1 year ago