Title: Developing a Safe and Ethical AI Content Generation System: Filtering and Evaluation Criteria for Textual Outputs

Abstract:

The rapid development of AI language models, such as GPT-4, has brought forth increasingly sophisticated content generation capabilities. However, this progress has raised concerns regarding the ethical implications and potential harm of generated content. This paper proposes a system for filtering and evaluating AI-generated text based on a set of predefined criteria. The primary goal is to ensure safer and more ethical AI content generation, while maintaining the model's creativity and utility. The proposed system utilizes fine-tuning, external knowledge integration, and evaluation metrics to improve the AI model's ability to generate appropriate content. Additionally, the paper explores the integration of AGI techniques to further refine the system's performance.

News

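LinkedIn and DeepMind co-founders launch "Pi", a ChatGPT rival with higher emotional intelligence: Unlike other chatbots, Pi is designed to be personal and can give fact-based answers, but it avoids fostering parasocial relationships. Inflection plans to enhance Pi with real-time content, link sharing, and integration of users' calendars, email, and other documents to help them manage their time more efficiently. The company has raised $225 million to date. Try Pi for yourself.
IBM is pausing hiring for jobs that could be replaced by AI: IBM CEO Arvind Krishna announced plans to pause hiring for back-office functions, including roles that could be replaced by AI and automation over the next five years. This could eliminate roughly 7,800 jobs. Routine tasks such as employment verification and moving employees between departments are likely to be fully automated, while workforce composition and productivity evaluation may not be replaced over the next decade.
Segment Anything for video? Track-Anything is a flexible, interactive tool for video object tracking and segmentation: Track-Anything applies SAM, XMem, and E2FGVI to video, offering video and multi-object tracking with segmentation masks as well as video inpainting.
Latest NVIDIA graphics research, the next frontier of AI: NVIDIA will present around 20 research papers at SIGGRAPH 2023. The research covers generative AI and neural graphics, including generative AI models that turn text into personalized images, inverse rendering tools that turn still images into 3D objects, neural physics models that simulate complex 3D elements, and neural rendering models that generate AI-powered visual detail in real time.
Runway launches Gen-1 iPhone app: AI startup Runway has released its first mobile app for iOS, featuring its video-to-video generative AI model Gen-1. Users can record a video on their phone and turn it into an AI video within minutes, or alter an existing video using a text prompt, an image, or a style preset.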

Stanford researchers have shown that so-called "emergent abilities" in AI models, where a large model suddenly displays an ability it was not designed to possess, are really a "mirage" produced by researchers.

Many researchers and industry leaders, such as Google CEO Sundar Pichai, have claimed large language models like GPT-4 and Google's Bard can suddenly display knowledge that they weren’t programmed to know, something considered human-like intelligence.

A 60 Minutes segment from April 16 claimed AI models are "teaching themselves skills that they weren't expected to have," because they weren’t trained to have those skills. For instance, Google’s Bard was able to translate Bengali even though it was not trained to do so.

Microsoft researchers claimed OpenAI's GPT-4 language model showed “sparks of artificial general intelligence,” saying it could “solve novel and difficult tasks…without needing any special prompting.” Such exaggerated claims, i.e., hype, “stoke fears of losing control of an AI that suddenly eclipses human intelligence.”

Stanford researchers present an explanation for emergent abilities. They write that “for a particular task and model family, when analyzing fixed model outputs, one can choose a metric which leads to the inference of an emergent ability or another metric which does not.”

A person’s choice of a "non-linear" or "discontinuous" measurement can result in what appear to be sharp and unpredictable changes that are then falsely labeled as emergent abilities when in reality the performance curve is increasing smoothly.

A discontinuous metric is something like a “Multiple Choice Grade,” which is the metric that produced the most supposed emergent abilities. Linear metrics, on the other hand, include things like “Token Edit Distance,” which measures the similarity between two tokens, and “Brier Score,” which measures the accuracy of a forecasted probability. What the researchers found was that when they changed the measurement of their outputs from a nonlinear to a linear metric, the model's progress appeared predictable and smooth, nixing the supposed "emergent" property of its abilities.

“Imagine evaluating baseball players based on their ability to hit a baseball a certain distance,” the researchers said. “If we use a metric like ‘average distance,’ the distribution of players' scores will likely appear smooth and continuous. However, if we use a discontinuous metric like ‘whether a player's average distance exceeds 325 feet,’ then many players will score 0, while only the best players will score 1. Both metrics are valid, but we shouldn’t be surprised when the latter metric yields a discontinuous outcome.”
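
The baseball analogy can be made concrete with a few lines of Python. This is only an illustrative sketch: the ten player distances are made-up numbers, and `continuous_score` and `threshold_score` are hypothetical helper names, not anything from the paper. It scores the same players with both metrics so the smooth and the step-like views can be compared side by side.

```python
# Hypothetical average hit distances (feet) for ten players, evenly spaced.
distances = [250 + 10 * i for i in range(10)]  # 250, 260, ..., 340


def continuous_score(distance_ft: float) -> float:
    """Continuous metric: the average distance itself."""
    return distance_ft


def threshold_score(distance_ft: float, cutoff_ft: float = 325.0) -> int:
    """Discontinuous metric: 1 if the average exceeds the cutoff, else 0."""
    return 1 if distance_ft > cutoff_ft else 0


continuous = [continuous_score(d) for d in distances]
thresholded = [threshold_score(d) for d in distances]

# The underlying ability rises by the same step for every player,
# but the thresholded score stays at 0 and then flips to 1.
for d, t in zip(continuous, thresholded):
    print(f"{d:5.0f} ft -> threshold score {t}")
```

The underlying quantity grows by the same 10 feet from player to player, yet the thresholded column shows nothing until the cutoff is crossed, which is the effect the researchers describe.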

AGI

🦎 Chameleon: Plug-and-Play Compositional Reasoning with GPT-4

Large language models (LLMs) have achieved remarkable progress in various natural language processing tasks with emergent abilities. However, they face inherent limitations, such as an inability to access up-to-date information, utilize external tools, or perform precise mathematical reasoning. In this paper, we introduce Chameleon, a plug-and-play compositional reasoning framework that augments LLMs to help address these challenges. Chameleon synthesizes programs to compose various tools, including LLM models, off-the-shelf vision models, web search engines, Python functions, and rule-based modules tailored to user interests. Built on top of an LLM as a natural language planner, Chameleon infers the appropriate sequence of tools to compose and execute in order to generate a final response. We showcase the adaptability and effectiveness of Chameleon on two tasks: ScienceQA and TabMWP. Notably, Chameleon with GPT-4 achieves an 86.54% accuracy on ScienceQA, significantly improving upon the best published few-shot model by 11.37%; using GPT-4 as the underlying LLM, Chameleon achieves a 17.8% increase over the state-of-the-art model, leading to a 98.78% overall accuracy on TabMWP. Further studies suggest that using GPT-4 as a planner exhibits more consistent and rational tool selection and is able to infer potential constraints given the instructions, compared to other LLMs like ChatGPT.
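
The plan-then-execute pattern described in the abstract can be sketched roughly as follows. Everything here is an assumption made for illustration: the tool names, the `plan` function (a fixed list standing in for the LLM planner), and the shared `state` dictionary are hypothetical and are not taken from the Chameleon codebase.

```python
from typing import Callable, Dict, List

# A tool reads the shared state and returns an updated state.
Tool = Callable[[dict], dict]


def search_tool(state: dict) -> dict:
    """Hypothetical retrieval tool: attaches evidence for the question."""
    state["evidence"] = f"facts about {state['question']}"
    return state


def solver_tool(state: dict) -> dict:
    """Hypothetical answer generator: uses whatever evidence is present."""
    state["answer"] = f"answer derived from {state.get('evidence', 'no evidence')}"
    return state


TOOLS: Dict[str, Tool] = {"search": search_tool, "solver": solver_tool}


def plan(question: str) -> List[str]:
    """Stand-in for the LLM planner: returns tool names in execution order."""
    return ["search", "solver"]


def run(question: str) -> dict:
    """Execute the planned tools one after another on a shared state."""
    state = {"question": question}
    for name in plan(question):
        state = TOOLS[name](state)
    return state


final = run("photosynthesis")
print(final["answer"])
```

In a real system the `plan` step would be a prompt to the language model, and the point of the design is that new tools can be added to the registry without retraining anything.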
Generative Agents: Interactive Simulacra of Human Behavior

Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.
Reflexion: an autonomous agent with dynamic memory and self-reflection

Recent advancements in decision-making large language model (LLM) agents have demonstrated impressive performance across various benchmarks. However, these state-of-the-art approaches typically necessitate internal model fine-tuning, external model fine-tuning, or policy optimization over a defined state space. Implementing these methods can prove challenging due to the scarcity of high-quality training data or the lack of well-defined state space. Moreover, these agents do not possess certain qualities inherent to human decision-making processes, specifically the ability to learn from mistakes. Self-reflection allows humans to efficiently solve novel problems through a process of trial and error. Building on recent research, we propose Reflexion, an approach that endows an agent with dynamic memory and self-reflection capabilities to enhance its existing reasoning trace and task-specific action choice abilities. To achieve full automation, we introduce a straightforward yet effective heuristic that enables the agent to pinpoint hallucination instances, avoid repetition in action sequences, and, in some environments, construct an internal memory map of the given environment. To assess our approach, we evaluate the agent's ability to complete decision-making tasks in AlfWorld environments and knowledge-intensive, search-based question-and-answer tasks in HotPotQA environments. We observe success rates of 97% and 51%, respectively, and provide a discussion on the emergent property of self-reflection.
Self-Refine: Iterative Refinement with Self-Feedback

Like people, LLMs do not always generate the best text for a given generation problem on their first try (e.g., summaries, answers, explanations). Just as people then refine their text, we introduce SELF-REFINE, a framework for similarly improving initial outputs from LLMs through iterative feedback and refinement. The main idea is to generate an output using an LLM, then allow the same model to provide multi-aspect feedback for its own output; finally, the same model refines its previously generated output given its own feedback. Unlike earlier work, our iterative refinement framework does not require supervised training data or reinforcement learning, and works with a single LLM. We experiment with 7 diverse tasks, ranging from review rewriting to math reasoning, demonstrating that our approach outperforms direct generation. In all tasks, outputs generated with SELF-REFINE are preferred by humans and by automated metrics over those generated directly with GPT-3.5 and GPT-4, improving on average by absolute 20% across tasks.
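
The generate, feedback, refine cycle from the abstract can be sketched as a short loop. This is a minimal sketch under stated assumptions: the three callables stand in for prompts to the same LLM, and the `max_rounds` limit and the `"LGTM"` stop marker are hypothetical choices made here, not details from the paper.

```python
from typing import Callable


def self_refine(
    task: str,
    generate: Callable[[str], str],
    feedback: Callable[[str, str], str],
    refine: Callable[[str, str, str], str],
    max_rounds: int = 3,
    stop_token: str = "LGTM",
) -> str:
    """Generate an output, then alternate feedback and refinement."""
    output = generate(task)
    for _ in range(max_rounds):
        critique = feedback(task, output)
        if stop_token in critique:  # the model is satisfied with its output
            break
        output = refine(task, output, critique)
    return output


# Toy stand-ins for the three LLM calls (hypothetical, for illustration only).
def toy_generate(task: str) -> str:
    return "draft"


def toy_feedback(task: str, output: str) -> str:
    return "LGTM" if output.endswith("!!") else "add emphasis"


def toy_refine(task: str, output: str, critique: str) -> str:
    return output + "!"


result = self_refine("write a greeting", toy_generate, toy_feedback, toy_refine)
print(result)
```

No training step appears anywhere in the loop, which matches the abstract's claim that the approach needs neither supervised data nor reinforcement learning.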
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace

Solving complicated AI tasks with different domains and modalities is a key step toward advanced artificial intelligence. While there are abundant AI models available for different domains and modalities, they cannot handle complicated AI tasks. Considering large language models (LLMs) have exhibited exceptional ability in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks and language could be a generic interface to empower this. Based on this philosophy, we present HuggingGPT, a framework that leverages LLMs (e.g., ChatGPT) to connect various AI models in machine learning communities (e.g., Hugging Face) to solve AI tasks. Specifically, we use ChatGPT to conduct task planning when receiving a user request, select models according to their function descriptions available in Hugging Face, execute each subtask with the selected AI model, and summarize the response according to the execution results. By leveraging the strong language capability of ChatGPT and abundant AI models in Hugging Face, HuggingGPT is able to cover numerous sophisticated AI tasks in different modalities and domains and achieve impressive results in language, vision, speech, and other challenging tasks, which paves a new way towards advanced artificial intelligence.
Auto-GPT: An Autonomous GPT-4 Experiment

Auto-GPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model. This program, driven by GPT-4, chains together LLM "thoughts", to autonomously achieve whatever goal you set. As one of the first examples of GPT-4 running fully autonomously, Auto-GPT pushes the boundaries of what is possible with AI.
LiOnConnect

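Internal information such as product data, logistics data, HR policies, and accounting standards must stay inside the company, and questions and answers about it must remain confidential as well. With language models served from external clouds, there is no technical way to control the risk of internal information leaking, so installing a language model on-premises is the only option. LiOn is a lightweight large language model that can be installed on-premises, offering an alternative that lets employees use it safely while keeping internal information secure. Below is one example, in which LiOn counsels an employee about a conflict with coworkers. Beyond this, LiOn can offer a range of solutions for the many situations that arise at work, helping employees around the clock.

I plan to present only part of this, depending on the mood on the day.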

Research

Personal

The paper I submitted to the Machine Learning Reproducibility Challenge (MLRC) 2022, which I took part in a few months ago (Reproducibility and Study of Behavior Transformers), has been accepted as an Outstanding Paper Honorable Mention (Top 5)!

Ethics

Using the Veil of Ignorance to align AI systems with principles of justice

Paper: https://www.pnas.org/doi/10.1073/pnas.2213709120 Blog: https://www.deepmind.com/blog/how-can-we-build-human-values-into-ai

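DeepMind has published a paper whose philosophical grounding in AI ethics runs much deeper than most AI safety papers to date, so I am sharing it. Most AI safety work has applied heuristic rules, such as not breaking the law, and the biggest problem has been the lack of agreement on how to aggregate the differing values of different people. This paper proposes a new paradigm based on a methodology that applies John Rawls's Veil of Ignorance.

It is a pity that the paper does not offer more concrete cases where the Veil of Ignorance could be applied, but its experiments show empirically that fairness is weighted more heavily than factors such as political orientation and risk preference, so I think the approach could be applied more generally to AI ethics in the future.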

Theoretical

Hyperbolic Image-Text Representations

ArXiv: https://arxiv.org/abs/2304.09172

Deep learning models have mainly represented feature embeddings in Euclidean space, but this paper proposes building representations in hyperbolic space to capture hierarchical embeddings more accurately. A paper at this year's ICLR showing that reinforcement learning performs better in hyperbolic space drew attention. Hyperbolic space can be thought of as a tree structure expressed in a continuous space, and given that most objects do not merely sit in a sparse space but also carry membership information, I think this will be a good research direction.
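
To give a feel for why hyperbolic space suits tree-like data, here is a small sketch of the distance function on the Poincaré ball, one standard model of hyperbolic space. It is an illustration only: the paper itself may use a different model of hyperbolic space, and the function name and sample points are assumptions chosen here.

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Hyperbolic distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(c * c for c in u)
    sq_norm_v = sum(c * c for c in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1 + 2 * sq_diff / ((1 - sq_norm_u) * (1 - sq_norm_v)))


origin = [0.0, 0.0]
near_root = poincare_distance(origin, [0.5, 0.0])
between_leaves = poincare_distance([0.9, 0.0], [-0.9, 0.0])

print(f"origin to (0.5, 0):     {near_root:.4f}")
print(f"(0.9, 0) to (-0.9, 0):  {between_leaves:.4f}")
```

Points near the boundary are far apart in hyperbolic terms even when their Euclidean distance is modest, so the amount of room grows rapidly toward the edge, much as the number of nodes grows with depth in a tree.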

Are Emergent Abilities of Large Language Models a Mirage?

ArXiv: https://arxiv.org/abs/2304.15004

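Earlier work has discussed emergent abilities at length: capabilities that appear at a certain model size and are not observed in smaller models. This study argues that the phenomenon is caused by discontinuous evaluation metrics and no longer appears once the metric is changed. It also shows that the same emergent-ability effect appears in vision models when a discontinuous measure such as top-k matching is used in place of continuous measures such as the accuracy commonly used in vision tasks.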
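
One way to see how the choice of metric alone can produce an apparent jump is to compare a per-token measure with an all-or-nothing one. The sketch below is illustrative, not a reproduction of the paper's experiments: the per-token accuracies are made-up values, and it assumes tokens are correct independently of one another, so that an exact match on an answer of length L succeeds with probability p^L.

```python
def exact_match_rate(per_token_accuracy: float, answer_length: int) -> float:
    """Probability that every token is right, assuming independent tokens."""
    return per_token_accuracy ** answer_length


# Hypothetical per-token accuracies for successively larger models.
per_token = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
ANSWER_LENGTH = 10  # assumed number of tokens in the target answer

exact = [exact_match_rate(p, ANSWER_LENGTH) for p in per_token]

# Per-token accuracy improves steadily; exact match stays near zero
# for the smaller models and then climbs steeply.
for p, e in zip(per_token, exact):
    print(f"per-token accuracy {p:.2f} -> exact match {e:.4f}")
```

Under these assumptions the model's per-token skill improves gradually throughout, while the exact-match score looks as if an ability switched on only at the largest sizes.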

Do SSL Models Have Déjà Vu? A Case of Unintended Memorization in Self-supervised Learning

ArXiv: https://arxiv.org/abs/2304.13850 GitHub: https://github.com/facebookresearch/DejaVu

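This study shows that, by applying a diffusion model to SSL methods, a substantial part of the training data can be reconstructed from the embeddings, which demonstrates memorization of the training data rather than mere learning of correlations. With SSL models such as the Segment Anything Model (SAM) becoming widespread and vision SSL growing in importance, I think this is something to check in publicly released models.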

Practical

CCpdf: Building a High-Quality Corpus for Visually Rich Documents from Web Crawl Data

ArXiv: https://arxiv.org/abs/2304.14953 GitHub: https://github.com/applicaai/ccpdf

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

ArXiv: https://arxiv.org/abs/2305.02301

News

Google "We Have No Moat, And Neither Does OpenAI": Leaked Internal Google Document Claims Open Source AI Will Outcompete Google and OpenAI

Source: https://www.semianalysis.com/p/google-we-have-no-moat-and-neither

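An internal document written by a Google researcher has leaked. It argues that Google is currently losing ground to open source rather than to OpenAI, and that its insistence on large models is actually holding it back. The document is controversial, but judging from the competition between DALL-E and Stable Diffusion, I think it is plausible that open source, with its rapid pace of progress, will catch up with today's proprietary systems and be put to use faster.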

RLHF: Reinforcement Learning from Human Feedback

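Blog: https://huyenchip.com/2023/05/02/rlhf.html

Chip Huyen, well known for her MLOps book, has published a blog post that summarizes RLHF and shares related resources. The reinforcement learning concepts RLHF relies on were unfamiliar territory for many deep learning researchers ~myself included~, and the post is so well organized that it can be used to fill in the background needed to understand large language models such as ChatGPT.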

Navigating the High Cost of AI Compute

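Blog: https://a16z.com/2023/04/27/navigating-the-high-cost-of-ai-compute

Andreessen Horowitz, the well-known VC firm, has published a blog post on the considerations and costs involved in training and serving LLMs. Its links include several very useful resources on LLM cost structure and related considerations, which should be a great help as a reference.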

ChatGPT Prompt Engineering for Developers

Course: https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers GitHub: https://github.com/ralphcajipe/chatgpt-prompt-engineering

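DeepLearning.AI, run by Professor Andrew Ng, has released a ChatGPT prompting course for developers. By helping developers outside research use ChatGPT well, I think it can bring the benefits of LLMs to more people and increase adoption, promoting productivity gains ~headcount reduction~. It is currently free as an initial promotion but looks likely to become paid before long.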

Fine-Tuning OpenAI Language Models with Noisily Labeled Data

Blog: https://www.kdnuggets.com/2023/04/finetuning-openai-language-models-noisily-labeled-data.html

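Cleanlab was founded by Curtis Northcutt, who showed that ImageNet contains duplicate classes, that commonly used benchmark datasets have many label problems, and that such data badly hurts model training and evaluation. Cleanlab tested the effect of data cleansing on fine-tuning GPT-3.5 models with a politeness dataset and found a very large performance gain when correct labels were used. The experiment covers only one task, so further validation is still needed, but it confirms that data quality is an important issue for LLMs too in task-specific fine-tuning.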

FACT SHEET: Biden-Harris Administration Announces New Actions to Promote Responsible AI Innovation that Protects Americans’ Rights and Safety

Link: https://www.whitehouse.gov/briefing-room/statements-releases/2023/05/04/fact-sheet-biden-harris-administration-announces-new-actions-to-promote-responsible-ai-innovation-that-protects-americans-rights-and-safety

White House announces an independent commitment from leading AI labs like Anthropic, Google, Hugging Face, Microsoft, NVIDIA, OpenAI, and Stability AI, to participate in a public evaluation of AI systems on an evaluation platform developed by Scale AI.

Technology

There has been a lot of innovation in the deep learning tech stack over the past two weeks, so I am sharing it here.

A Cookbook of Self-Supervised Learning

Blog: https://ai.facebook.com/blog/self-supervised-learning-practical-guide ArXiv: https://arxiv.org/abs/2304.12210

Mojo by Modular

Blog: https://docs.modular.com/mojo/why-mojo.html Documentation: https://docs.modular.com/mojo/programming-manual.html Fast.ai Blog: https://www.fast.ai/posts/2023-05-03-mojo-launch.html GitHub: https://github.com/modularml/mojo Hacker News Discussion: https://news.ycombinator.com/item?id=35790367

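Modular AI, the startup of Chris Lattner, who designed several compilers and programming languages including LLVM, Clang, Swift, and MLIR, has announced Mojo, a new programming language that adds low-level programming to the convenience of Python. Python is widely used in deep learning and data science, but it has the drawback that peak performance requires optimization in low-level languages such as CUDA, OpenCL, and SYCL. Simple yet high-performance languages such as Julia do exist, but they see little use because of poor compatibility with existing Python libraries and because performance cliffs make it easy to lose performance. Because C++ is so difficult, there are also successor-language approaches such as Carbon, but progress is very slow. In practice, getting peak numerical performance out of ISO C++ requires writing a lot of arcane code to enable compiler optimizations, and one also has to learn similar but separate languages such as CUDA. A language that is compatible with existing Python and also allows even the lowest-level optimizations would, I think, do much to speed up HPC and deep learning research, for example by making new operators easier to write. A borrow checker derived from Rust is also planned, so I expect code correctness and quality to improve considerably as well.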

Introducing Hidet: A Deep Learning Compiler for Efficient Model Serving

Paper: https://dl.acm.org/doi/10.1145/3575693.3575702 GitHub: https://github.com/hidet-org/hidet Blog: https://pytorch.org/blog/introducing-hidet Website: https://docs.hidet.org/stable/index.html

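CentML has announced a new deep learning compiler called Hidet and released it for use as a backend for torch.compile in PyTorch 2.x. Currently, PyTorch 2.x uses torch.compile to lower models to the Triton backend developed by OpenAI, and Triton in turn performs its optimizations through MLIR. However, (according to the authors) approaches such as Triton make fine-grained optimization hard to apply, so they wrote and released a new compiler, in Python (???!!!), that first optimizes operators and then performs hardware optimization. It cannot be used for training yet, but it should be a great help to those who deploy models.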

MLC LLM: Enable everyone to develop, optimize and deploy AI models natively on everyone's devices

Website: https://mlc.ai/mlc-llm/ GitHub: https://github.com/mlc-ai/mlc-llm Course: https://mlc.ai/index.html

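A compiler has been released that optimizes very large LLMs so that inference can run in web browsers and on edge devices. For now most LLM inference is still server-based, but I expect that within a few years there will be models that run in browsers and on edge devices and can meet privacy and personalization requirements.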

Technical Blogs by Intel

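As deploying deep learning models grows in importance, performance on CPUs matters more as well, so Intel has been sharing many optimization methods. The latest Intel CPUs even add new instructions called AMX (Advanced Matrix Extensions), which play a role similar to Tensor Cores. Here are a few recent blog posts on optimizing deep learning model deployment on CPUs.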

Optimizing Transformer Model Inference on Intel® Processors

https://www.intel.com/content/www/us/en/developer/articles/technical/optimize-transformer-model-inference-processors.html

Explains optimizations for running BERT models on CPUs with the Intel Extension for PyTorch and how to use MKL.

Introduction to Distributed Communication

https://community.intel.com/t5/Blogs/Tech-Innovation/Tools/Introduction-to-Distributed-Communication/post/1476036

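With the end of Moore's law, programs have to be sped up by dividing work across multiple processors rather than by waiting for a single processor to get faster, but the distributed computing this requires is very difficult. A good introduction to the field has been published, so I am sharing it.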

News

Conferences
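- NeurIPS 2023 abstract deadline: May 12, 5 a.m. (paper submission deadline May 18, 5 a.m.)
- ACL 2023 results announced: congratulations to all the authors.
- ICLR 2023: did everyone enjoy Kigali? I hope you all made it back in good health!
- NeurIPS 2023 main track and D&B track are still recruiting reviewers~
Google changes its AI paper publication policy: much more conservative..
Public evaluation of the AI trustworthiness of US global big tech companies
GPT4-32k API now available
AI Future Forum Industrial AI webinar: Industrial AI, we do it
- May 9, 10:00 - 11:30, YouTube live stream
- LG Innotek, HD Hyundai Site Solution, POSCO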

ArXiv

Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs
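- Proposes metrics for fairly comparing the real-world efficiency of services offered as LLM inference APIs (from MSR, Stanford Univ.)
- A wide variety of LLMs are now offered as LMaaS.
- Because they are black-box APIs, prompt generation time varies widely depending on the inference hardware, internal optimizations, and so on.
- Proposes a metric called idealized runtime, which normalizes for standard hardware, model architecture, and so on
- Also proposes a cost-aware metric, and with it presents efficiency-capability tradeoffs
- Compares and evaluates about 10 models, including OpenAI Davinci, Anthropic, AI21 Labs, Cohere, GPT-J, and Bloom
- The end of the Introduction notes that OpenAI davinci's efficiency owes more to other optimizations than to the model itself..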
AutoML-GPT: Automatic Machine Learning with GPT
- Implements AutoML for a variety of tasks using GPT (from UT Austin)

A quick share, mostly news. (Mainly articles from TechCrunch)

LLM services

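OpenAI previews business plan for ChatGPT, launches new privacy controls : News from OpenAI's blog on 4/25. It previews ChatGPT Business, an enterprise offering with stronger controls over user data; presumably it will make user-data access, training, and privacy controls easier to manage. (Alongside this, they also announced deletion of chat history after 30 days and a conversation export feature.)

Terry's take: The API is also available on MS Azure, so I wonder how MS Azure and OpenAI will differentiate their business models. (If your service has to run on a cloud anyway, using Azure seems more convenient.)
Hugging Face releases its own version of ChatGPT : Hugging Face has launched its own LLM service, HuggingChat.

Terry's take: Just as Amazon is building its own LLM offering, Bedrock, and Naver is building HyperCLOVA, large companies apparently cannot avoid building their own LLMs if they do not want to end up dependent on another company's AI. I do not know whether this news is connected to the AWS and Hugging Face partnership announced a few months ago, but I expect shifting alliances on the way to a future in which only a few LLMs survive.
Microsoft doubles down on AI with new Bing features : Microsoft has released the new Bing and an Edge browser with Bing integrated. The most emphasized feature is visual answering. https://www.youtube.com/watch?v=tRWYOAMZJf8
Microsoft makes its AI-powered Designer tool available in preview : Microsoft has released Designer, a design tool based on DALL-E 2. Designer reportedly makes it easy to create slides, posters, postcards, and the like (much like a PowerPoint AI).

Terry's take: Until now text, image, and speech have all lived separately, but the multi-modal era is coming soon. In step with that, we will see various moves to topple Google, which has held first place in the text search market. Beyond LLMs, it is also necessary to understand visual information well and recognize speech well, and in that sense a competition is emerging over who handles all of these best. Time to grab the popcorn.
Nextdoor launches new ‘Assistant’ feature powered by OpenAI’s ChatGPT : Nextdoor, the neighborhood social network, has reportedly added a ChatGPT-based writing assistance feature.
LinkedIn expands its generative AI assistant to recruitment ads and writing profiles : LinkedIn has also reportedly released an AI that helps with writing (posts, job listings, and profiles).

Terry's take: Just as copilots took off for code writing (= coding), there is no reason GPT, a natural-language AI, cannot become the copilot for natural language writing. The stronger AI gets, the sooner we will see a future, within a few years, in which only the few with originality make a fortune while ordinary workers (e.g., coders, writers) lose their jobs.
"79% of experts: ChatGPT's answers are better than doctors'", Hankook Ilbo : A UCSD study published in JAMA. For about 200 questions, three medical experts were asked in a blind test which answer was better, the physician's or ChatGPT's, and they reportedly judged GPT's response better for 79% of the questions. [UCSD's original English blog post]

Terry's take: For text Q&A, GPT may well be better than doctors (just as Google is better than humans). But actual care requires assessing a person's condition through many nonverbal cues, so for GPT to beat doctors, its ability to capture and interpret this 'nonverbal evidence' would need to be strengthened.
Chegg shares drop more than 40% after company says ChatGPT is killing its business : Shares of the US education startup Chegg plunged more than 40%. Chegg provides a database of worked solutions to college textbook exercises, and as word spread that its quality falls short of ChatGPT's, concern is growing that ChatGPT may end up wiping out the education business altogether.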

Other AI services

Apple is reportedly developing an AI-powered health coaching service : Apple is reportedly building an AI coaching service that encourages users to exercise, improve their eating habits, and sleep better. It is expected to be unveiled at WWDC in June.
TikTok is testing an in-app tool that creates generative AI avatars : TikTok reportedly plans to add a feature that uses AI to generate avatars resembling the user. See the Twitter video: https://twitter.com/i/status/1651012552376156166
Tinder’s verification process will now use AI and video selfies : Tinder will now have AI review a video to verify that a profile photo is genuine.
Simpplr raises $70M for its AI-powered intranet platform : Simpplr, an internal social network service for intranets, has raised $70M (roughly 90 billion won). The company reportedly said that "AI is Simpplr's core competency"; it is used not only to help with writing but also to analyze employee sentiment.

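And one more thing...

SkinChat.ai : Last week Director Ha Jung-woo introduced the paper "Is GPT usable as a recommender?", and there have been recent developments such as Bing's visually focused changes and the emergence of AI doctors. In fact, there is already a strong player in GPT + Vision + Recommender + Medical. ~(ad break)~ It is SkinChat, ART Lab's skin consultation AI. I hope to see AI-powered services like this emerge in many domains. [Terry's Facebook post on SkinChat's results]

Today I will briefly promote Modulabs' LAB and Pullip School.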

Medical AI LAB : https://modulabs.co.kr/apply_lab/

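The lab started around last November and is run as a group for writing papers on medical AI. Because it was planned as a lab for long-term study, it has tended to move slowly, but from this cohort on we are making several preparations to work at a faster pace. We are looking to recruit people who have a topic they would like to research together. At present we are conducting research using data released as public databases.
Clear-Your-Book-Backlog Club, Pullip School (https://modulabs.co.kr/apply-flip/) This is the Pullip School I run. We are currently recruiting the second cohort, which is scheduled to start on 5/15. It is hard to finish books after buying them, and I too had many books that were gifts or that I bought long ago, so I opened this Pullip School to add some accountability. It has also turned into a nice place for networking along the way. The first cohort finished successfully, so we are running a second. So far we have not covered medical content; we work through books on foundational and general topics. This cohort I plan to push a bit harder and get through three books.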

jungwoo-ha / WeeklyArxivTalk

[20230507] Weekly AI ArXiv Talk (만담) Season 2 - Episode 16 #82
