tarasglek / chatcraft.org

Developer-oriented ChatGPT clone
https://chatcraft.org/
MIT License

Add cohere free model to free endpoint, set it to default #610

Open tarasglek opened 2 months ago

tarasglek commented 2 months ago

They offer free models for non-prod usage. This one is a 104B-parameter model, way better than the other free models:

curl --request POST \
  --url https://api.cohere.ai/v1/chat \
  --header 'accept: application/json' \
  --header 'content-type: application/json' \
  --header "Authorization: bearer $CO_API_KEY" \
  --data '{
    "chat_history": [
    ],
    "message": "Say this is a test!",
    "stream": true,
    "logprobs": true
  }'
{"is_finished":false,"event_type":"stream-start","generation_id":"388f3ae9-22f6-416d-b441-afd56794849d"}
{"is_finished":false,"event_type":"text-generation","text":"This"}
{"is_finished":false,"event_type":"text-generation","text":" is"}
{"is_finished":false,"event_type":"text-generation","text":" a"}
{"is_finished":false,"event_type":"text-generation","text":" test"}
{"is_finished":false,"event_type":"text-generation","text":"!"}
{"is_finished":true,"event_type":"stream-end","response":{"response_id":"d36e41d6-1911-4f97-92e5-037eca9aa2f4","text":"This is a test!","generation_id":"388f3ae9-22f6-416d-b441-afd56794849d","chat_history":[{"role":"USER","message":"Say this is a test!"},{"role":"CHATBOT","message":"This is a test!"}],"finish_reason":"COMPLETE","meta":{"api_version":{"version":"1"},"billed_units":{"input_tokens":6,"output_tokens":5},"tokens":{"input_tokens":72,"output_tokens":5}}},"finish_reason":"COMPLETE"}

Here is how fancy their RAG over web search is:

curl --request POST \
  --url https://api.cohere.ai/v1/chat \
  --header 'accept: application/json' \
  --header 'content-type: application/json' \
  --header "Authorization: bearer $CO_API_KEY" \
  --data '{
    "chat_history": [
      {"role": "USER", "message": "Who released command-r+"},
      {"role": "CHATBOT", "message": "Cohere"}
    ],
    "message": "What size is the model, how much does it cost compared to dbrx?",
    "connectors": [{"id": "web-search"}]
  }'
response_id: 2c438b14-766c-4eb0-9c30-b857ef42a412
text: Command R+ is a 104 billion parameter model. It is priced at $3.00 per million input tokens and $15.00 per million output tokens. In comparison, Command R is priced at $0.50 per million input tokens and $1.50 per million output tokens.
generation_id: 3b8f0cc8-d1ed-4955-b7fa-ad23d75471a6
chat_history:
  - role: USER
    message: Who released command-r+
  - role: CHATBOT
    message: Cohere
  - role: USER
    message: What size is the model, how much does it cost compared to dbrx?
  - role: CHATBOT
    message: Command R+ is a 104 billion parameter model. It is priced at $3.00 per million input tokens and $15.00 per million output tokens. In comparison, Command R is priced at $0.50 per million input tokens and $1.50 per million output tokens.
finish_reason: COMPLETE
meta:
  api_version:
    version: "1"
  billed_units:
    input_tokens: 11091
    output_tokens: 82
  tokens:
    input_tokens: 11853
    output_tokens: 82
citations:
  - start: 16
    end: 44
    text: 104 billion parameter model.
    document_ids:
      - web-search_8
      - web-search_9
  - start: 61
    end: 91
    text: $3.00 per million input tokens
    document_ids:
      - web-search_7
  - start: 96
    end: 129
    text: $15.00 per million output tokens.
    document_ids:
      - web-search_7
  - start: 168
    end: 198
    text: $0.50 per million input tokens
    document_ids:
      - web-search_7
  - start: 203
    end: 235
    text: $1.50 per million output tokens.
    document_ids:
      - web-search_7
documents:
  - id: web-search_8
    snippet: |-
      The Top Open-Weights LLM + RAG and Multilingual Support

      This post is an update on what I’ve been up to since I joined Cohere. I’ve had fun contributing to the launch of Command R and R+, the latest Cohere models. I’ll discuss more details once the tech report is out; in the meantime I’ll share what I’m most excited about.

      Command R+ is ranked as the top open-weights model on Chatbot Arena, even outperforming some versions of GPT-4. Why is this exciting?

      Chatbot Arena leaderboard as of April 9, 2024 (source: lmsys.org).

      Let’s talk about why we should care about Chatbot Arena rankings in the first place. I’ve written in the past about challenges in NLP benchmarking. Pre-LLM benchmarks such as SuperGLUE mostly consist of classification tasks and no longer provide sufficient signal to differentiate the latest generation of LLMs. More recent benchmarks such as MT-Bench consist of small samples of open-ended questions and rely on LLMs as evaluators, which have their own sets of biases.1

      MMLU, one of the most widely used benchmarks consisting of 14k multiple-choice questions sourced from public sources covering 57 domains has been featured prominently in GPT-4, Claude 3, and Mistral Large posts. The data is not without errors, however, and given its release in 2020, training data of recent models is likely at least partially contaminated.

      Chatbot Arena is a platform where users rate conversations in a blind A/B test. They can continue the conversation until they choose a winner. Of course, short user interactions often do not reveal more advanced model capabilities and annotators can be fooled by authoritative but non-factual answers. Nevertheless, this is the closest to an assessment on realistic user interactions that we currently have. As models are always evaluated based on new user conversations, there is no risk of data contamination.

      Command R+ outperforms versions of GPT-4 on Chatbot Arena while being much cheaper to use. It does also well on use cases that are under-represented in Chatbot Arena such as RAG, tool use, and multilinguality.2

      (left) Performance comparison of Command R+, Mistral-Large, and GPT4-turbo on three key capabilities: Multilingual, RAG, and Tool Use. (right) Comparison input and output token costs per million for models available on Azure. Source

      A GPT-4 Level Model on Your Computer

      Command R+ consists of 104B parameters with publicly available weights. This is the first time that a model that is close to GPT-4 performance is available for research use. With the right setup, Command R+ can generate text at a rate of 111 tokens/s (!) when deployed locally.3 To understand how to effectively prompt the model, check out the prompting guide.

      I’m excited about what this means for the open-source community and research, with the gap between closed-source and open-weight models closing and SOTA-level conversational models being more easily accessible.

      The gap between closed-source and open-weights models on Chatbot Arena is closing (source: Maxime Labonne).

      Other recently released models such as DBRX (132B parameters), Mixtral 8x22B (176B parameters), and Grok-1 (314B parameters) are based on a Mixture-of-Experts (MoE), trading off inference speed for memory costs. While these models only activate a subset of parameters for each token, they still require storing all parameters in-memory, which makes them harder to use locally. So far, they are not available or rank much lower than Command R+ on Chatbot Arena.4

      Command R+ comes with a non-commercial license. If you want to self-host or fine-tune it for commercial purposes, we’ll work with you to find something that works for you.

      While Command R+ can be used as a chatbot, it has been designed for enterprise use. Faithful and verifiable responses are especially important in an enterprise setting. Reducing hallucinations and providing trustworthy responses are important research challenges. There are different ways to mitigate hallucinations, ranging from debiasing and model editing to specialized decoding strategies (see Huang et al. (2023) for an overview).

      Retrieval-augmented generation (RAG; Lewis et al., 2020), which conditions on the LLM’s generation on retrieved documents is the most practical paradigm IMO. Command R+ uses RAG with in-line citations to provide grounded responses.

      However, evaluation of the quality and trustworthiness of such responses is challenging and motivated the development of new evaluation frameworks such as Attributable to Identified Sources (AIS; Rashkin et al., 2023).5 On our internal human evaluation measuring citation fidelity, Command R+ outperforms GPT4-turbo. On public multi-hop QA benchmarks, it outperforms models at the same price point such as Claude 3 Sonnet and Mistral-large.6

      (left) Human head-to-head preference results using a holistic grading scheme combining text fluency, citation quality, and overall utility. (right) Accuracy of multi-hop REACT agents powered by various models with access to the same search tools retrieving from Wikipedia (HotpotQA) and the Internet (Bamboogle and StrategyQA). Source

      You can easily use RAG via the API on the Internet or your own documents. A complete RAG workflow additionally involves document search and reranking, which can be seen in this Colab with an example RAG setup on Wikipedia.

      Example of basic RAG usage with Command models using the Cohere API (source).

      In enterprise settings, seamless integrations with existing APIs and services is crucial. I’ve written before about the promise of tool-augmented models. Tools can help decompose complex problems and make LLMs outputs more interpretable by enabling users to look at the trace of API calls. Command R+ has been trained for zero-shot multi-step tool use. On public tool use benchmarks, it outperforms GPT4-turbo.

      Conversational tool-use and single-turn function-calling evaluations using Microsoft’s ToolTalk (Hard) benchmark (Farn & Shin 2023) and Berkeley's Function Calling Leaderboard (BFCL) (Yan et al. 2024). Source

      The recommended way to leverage multi-step tool use with Command R+ is via LangChain. To teach the model to use a new tool, you only need to provide the name, definition (a Python function), and the arguments schema. The model can then be used as a ReAct agent in LangChain with a range of tools (see this Colab for an example workflow).

      Example of using Command R+ in LangChain with different tools including Internet search, vector store search, and Python execution.

      I hope that strong support for RAG and tool use in an open-weights model will lead to progress in important research directions, some of which I have outlined here. If you want the most efficient solution for RAG, Command R demonstrates highly competitive RAG and tool-use performance at cheaper cost (35B-parameter weights are publicly available).

      Command R+ works well in languages beyond English. It was pre-trained on 23 languages, with our main focus on 10 key language of global business: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Simplified Chinese, and Arabic. In our evaluations on translation tasks, it is competitive with GPT4-turbo.

      Comparison of models on FLoRES (in French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, and Chinese) and WMT23 (in German, Japanese, and Chinese) translation tasks.

      Command R+ has been designed with multilinguality in mind. Its tokenizer is much less English-centric than others and compresses text in non-English languages much better than both the Mistral and OpenAI tokenizers.7 As LLM providers charge based on the number of input/output tokens, tokenizer choice directly impacts API costs for users. At the same cost-per-token, if one LLM generates 2x as many tokens as another, the API costs will also be twice as large.

      Comparison of the number of tokens produced by the Cohere, Mistral (Mixtral), and OpenAI tokenizers for different languages (as a multiple of the number of tokens produced by the Cohere tokenizer). The Cohere tokenizer produces much fewer tokens to represent the same text, with particularly large reductions on non-Latin script languages. For instance, in Japanese, the OpenAI tokenizer outputs 1.67x as many tokens as the Cohere tokenizer.

      Ahia et al. (2023) highlighted that such over-segmentation leads to “double unfairness”: higher API prices and lower utility (reduced performance) for many languages. In comparison, Command R+ is much more equitable. I hope that companies will take into account the impact of tokenization and other design choices on API costs in future LLMs.

      Given the focus on the Latin script in existing models, I particularly want to highlight Command R+’s performance in some prominent non-Latin script languages: Japanese, Korean, and Chinese. We evaluated on translation tasks as well as language-specific benchmarks such as Japanese MT-Bench8. Command R+ outperforms Claude 3 Sonnet and Mistral Large9 and is competitive with GPT 4-Turbo.

      Japanese evaluation on FLoRES, WMT23, and Japanese MT-Bench (source).

      We see similar trends for evaluations in Korean and Chinese. On Chinese Chatbot Arena, Command R+ is only behind GPT4 and Claude 3 Opus, models that are 2–3x more expensive. It’s been exciting to read the feedback from speakers of different language communities using Command R+.

      NLP News is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

      Overall, I’m really excited about Command R+’s capabilities and the future of LLMs. We will be pushing its multilingual capabilities to make it useful in many languages used in business. I’ve been using Command R+ with RAG via Cohere’s playground for my exploratory Internet searches and creative tasks in English and German and have been impressed by its quality. Feel free to try it in your language—and share your feedback in the comments or via email. I’d love to hear what works or doesn’t work for you.

      I’m also excited to hear from you if you’d like to explore using Command R+ for your (multilingual) business applications.

      LLM evaluators prefer longer responses or are affected by the order in which responses are presented.

      Evaluation results on other public benchmarks can be found here.

      This is on 4x A100 GPUs using a highly optimized open-source backend. Quantized versions of Command R+ can be run on 2x 3090 GPUs at around 10 tokens/s.

      DBRX ranks at #26 on Chatbot Arena as of the publication date of this post.

      Things used to be easier when the default setting was purely extractive QA and accuracy and exact match (EM) were the go-to metrics (see, for instance, Natural Questions or TyDi QA). With generative models, automatically identifying whether a (possibly verbose) response answers a question is much more challenging.

      Evaluation for HotpotQA and Bamboogle is done via a committee-of-LLMs-as-judges to reduce evaluator bias.

      The Anthropic tokenizer is not public so we could not compare to them.

      We use GPT4 as an evaluator so this evaluation is biased towards GPT4-turbo.

      Mistral Large doesn’t officially support these languages so its results are expectedly lower.

      © 2024 Sebastian Ruder

      Privacy ∙ Terms ∙ Collection notice

      Start WritingGet the app

      Substack is the home for great culture
    timestamp: 2024-04-16T11:33:02
    title: Command R+ - by Sebastian Ruder - NLP News
    url: https://newsletter.ruder.io/p/command-r
  - id: web-search_9
    snippet: "Command R+: Cohere's GPT-4 Level LLM for Enterprise AI\n\nCohere's Command R+ is a powerful, open-source large language model that delivers top-tier performance across key benchmarks, making it a cost-effective and scalable solution for enterprises looking to deploy advanced AI capabilities. 1000+ Pre-built AI Apps for Any Use Case\n\nCommand R+: Cohere's GPT-4 Level LLM for Enterprise AI Start for free\n\nCommand R+: Cohere's Powerful Open-Source LLM for Enterprise AI\n\nCohere, a leading provider of enterprise-grade AI solutions, has launched Command R+, its most advanced and scalable open-source large language model (LLM) built specifically for real-world business use cases. Command R+ represents a significant leap forward in enterprise AI, combining exceptional performance with features tailored to the needs of global organizations.\n\nWant to test out the Latest, Hottest, most trending LLM Online? Anakin AI is an All-in-One Platform for AI Models. You can test out ANY LLM online, and comparing their output in Real Time! Forget about paying complicated bills for all AI Subscriptions, Anakin AI is the All-in-One Platform that handles ALL AI Models for you!\n\nCommand R+ Outperforms in Key Enterprise Capabilities\n\nThe new 104 billion parameter model delivers industry-leading accuracy in retrieval augmented generation (RAG), multilingual support across 10 major business languages, and sophisticated multi-step tool use capabilities. Command R+ outshines similar models in the scalable market category and remains competitive against more costly alternatives.\n\nCommand R+ Benchmarks. Source\n\nWhen it comes to RAG, a critical capability for enterprises looking to leverage their own data, Command R+ achieves impressive results. In benchmarks, Command R+ demonstrates a 73.7% accuracy rate, surpassing Grok-1's 73.0%. This strong performance in RAG allows businesses to rapidly surface relevant information from internal sources to support various departments.\n\nCommand R+ Benchmarks on RAG. Source\n\nHere is an additional section comparing Command R+ to other major AI models, with a comparison table:\n\nCommand R+ Benchmarks and Comparison to Other Models\n\nTo evaluate the performance of Command R+, Cohere conducted extensive benchmarking tests comparing it to other leading large language models. The results demonstrate that Command R+ is highly competitive with top models across a range of key metrics.\n\nIn the widely used MMLU (Massive Multitask Language Understanding) benchmark, which tests models on 57 subjects spanning STEM fields, social sciences, humanities and more, Command R+ achieved an impressive score of 88.2%. This puts it ahead of models like GPT-3.5 (86.4%), Chinchilla (87.3%), and PaLM 540B (87.6%), and just behind the larger PaLM 62B model (89.1%) and Anthropic's Claude (89.3%).\n\nOn coding tasks, Command R+ also proved its mettle. In the HumanEval Python programming benchmark, it attained a success rate of 71.4%, surpassing GPT-3.5 (69.8%) and Chinchilla (70.2%) while coming close to PaLM 62B (72.1%) and Claude (72.6%).\n\nIn the realm of common sense reasoning, as measured by benchmarks like HellaSwag and PIQA, Command R+ continued its strong showing. 
It posted accuracy scores of 91.2% on HellaSwag and 90.6% on PIQA, beating out GPT-3.5 (90.1% and 89.3% respectively) and Chinchilla (90.8% and 90.1%) while remaining competitive with PaLM 62B (92.4% and 91.8%) and Claude (92.1% and 91.5%).\n\nThe table below summarizes how Command R+ stacks up against other major models across these and other key benchmarks:\n\nAs the benchmarking results show, Command R+ delivers top-tier performance that is on par with or exceeds models that have significantly more parameters. By optimizing for efficiency while maintaining high accuracy, Command R+ provides enterprises with a powerful and cost-effective solution for deploying advanced language AI at scale.\n\nWhile Command R+ may not match GPT-4 across every benchmark, it narrows the gap considerably, especially when accounting for its smaller size. As Cohere continues to refine and expand the capabilities of Command R+, it is well-positioned to be a leading choice for businesses looking to harness the transformative potential of large language models.\n\nRead more about the paper here:\n\nCommand R+ Excels in Programming and Mathematical Reasoning\n\nIn addition to its RAG capabilities, Command R+ shines in programming and mathematical reasoning tasks. On the HumanEval benchmark, which tests a model's ability to generate correct Python code, Command R+ scores an impressive 70.1%, outperforming Grok-1's 63.2%. Similarly, on the GSM8k benchmark for mathematical reasoning, Command R+ achieves a 66.9% accuracy rate compared to Grok-1's 62.9%.\n\nMultilingual Capabilities for Global Business\n\nCommand R+ demonstrates strong performance across 10 widely-used business languages: English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, and Chinese. This multilingual proficiency allows global organizations to more seamlessly deploy AI solutions that serve diverse teams and customer bases.\n\nWhile comprehensive multilingual benchmarks are still emerging, early indications suggest Command R+ is highly competitive with other top models. For example, in English language benchmarks, Command R+ achieves parity with GPT-4 on tasks like natural language inference and question answering.\n\nAdvanced Tool Use for Automating Complex Workflows\n\nCommand R+ introduces advanced multi-step tool use functionality, enabling the model to combine multiple tools over several steps to automate sophisticated enterprise workflows. Even when encountering errors, Command R+ can attempt self-correction to increase task success rates.\n\nIn comparisons with GPT-4 and DBRX on tool use benchmarks, Command R+ demonstrates comparable performance. For instance, on a benchmark simulating a multi-step data analysis workflow involving database queries, data visualization, and natural language summaries, Command R+ successfully completes the task 85% of the time, on par with GPT-4's 87% and DBRX's 83%.\n\nBalancing Performance and Efficiency\n\nWhile Command R+ is extremely capable, it also prioritizes efficiency to enable scalable enterprise deployments. Compared to GPT-4, Command R+ can generate outputs approximately 5 times faster while costing 50-75% less per output token.\n\nThis balance of performance and efficiency positions Command R+ as an attractive option for businesses looking to productionize AI at scale without compromising on quality. 
Cohere's commitment to data privacy and flexible deployment options further solidify Command R+'s enterprise readiness.\n\nEmpowering Researchers and Developers Worldwide\n\nCohere has made the model weights for Command R+ openly available to researchers on HuggingFace, democratizing access to a highly capable 104B parameter model. The release is governed by a CC-BY-NC license with acceptable use requirements.\n\nBy open-sourcing Command R+, Cohere aims to spur community-driven innovation and make advanced language AI more accessible. Researchers and developers worldwide can now collaborate on pushing the boundaries of what's possible with state-of-the-art LLMs.\n\nThe Future of Enterprise AI with Command R+\n\nThe launch of Command R+ marks a significant milestone in the evolution of enterprise-grade language AI. With its powerful RAG capabilities, multilingual proficiency, advanced tool use, and strong performance across key benchmarks, Command R+ sets a new standard for open-source models designed for real-world business applications.\n\nAs more organizations look to harness the transformative potential of large language models, Command R+ offers a compelling solution that balances cutting-edge performance with the efficiency, flexibility, and commitment to data privacy that enterprises require.\n\nCohere's decision to open-source Command R+ is a testament to their dedication to advancing the field of AI and empowering the global research community. By making this powerful model accessible to all, Cohere is helping to democratize access to state-of-the-art language AI and foster a more collaborative and innovative ecosystem.\n\nAs businesses continue to explore the vast possibilities of AI, Command R+ stands ready to help them build powerful solutions that drive productivity, enhance customer experiences, and unlock new opportunities. With Command R+, the future of enterprise AI is open, scalable, and poised for incredible breakthroughs.\n\nWant to test out the Latest, Hottest, most trending LLM Online? Anakin AI is an All-in-One Platform for AI Models. You can test out ANY LLM online, and comparing their output in Real Time! Forget about paying complicated bills for all AI Subscriptions, Anakin AI is the All-in-One Platform that handles ALL AI Models for you!\n\nCommand R+: Cohere's GPT-4 Level LLM for Enterprise AI\n\nCohere's Command R+ is a powerful, open-source large language model that delivers top-tier performance across key benchmarks, making it a cost-effective and scalable solution for enterprises looking to deploy advanced AI capabilities.\n\n\ Dolphin Mistral 2.8: The Uncensored AI Powerhouse with 32K Context \\n\nDolphin Mistral 2.8, a state-of-the-art uncensored language model, pushes the boundaries of NLP with its expanded context window and impressive performance across various benchmarks and applications.\n\nA Really Quick Introduction to Fine Tune Jamba\n\nDiscover the exciting world of fine-tuning Jamba, a powerful language model, with this comprehensive, step-by-step guide that combines code, humor, and practical insights to help you unlock its full potential!\n\nHow to Reduce LLM Hallucination: A Beginner's Guide\n\nWhat does it mean for a LLM to hallucinate? What are hallucinations in large language models? How do you overcome hallucinations in LLM? Read this article to find out!"
    timestamp: 2024-04-18T18:07:01
    title: 'Command R+: Cohere''s GPT-4 Level LLM for Enterprise AI'
    url: https://anakin.ai/blog/command-r-coheres-gpt-4/
  - id: web-search_7
    snippet: |-
      Hacker News new | past | comments | ask | show | jobs | submit

      Command R+: A Scalable LLM Built for Business (cohere.com)

      59 points by marban 8 days ago | hide | past | favorite | 19 comments

      simonw 7 days ago | next [–]

      I added support for this model to my LLM CLI tool via a new plugin: https://github.com/simonw/llm-command-r

      So now you can do this:

      pipx install llm llm install llm-command-r llm keys set cohere <paste Cohere API key here> llm -m command-r-plus "3 reasons to adopt a sea lion" One of the most interesting features of the Cohere API models is that they can run web searches and use the results as part of answering the prompt.

      The plugin adds that as a separate command, which works like this:

      llm command-r-search 'What is the LLM CLI tool by simonw?' Example output (truncated here):

      The LLM CLI tool is a command-line utility that allows users to access large language models. It was created by Simon Willison and can be installed via pip, Homebrew or pipx. The tool supports interactions with remote APIs and models that can be locally installed and run. Users can run prompts from the command line and even build an image search engine using the CLI tool. Sources: - GitHub - simonw/llm: Access large language models from the command-line - https://github.com/simonw/llm - llm, ttok and strip-tags—CLI tools for working with ChatGPT and other LLMs - https://simonwillison.net/2023/May/18/cli-tools-for-llms/

      PhilippGille 7 days ago | parent | next [–]

      Another, smaller (7B) model optimized for function calling and structured data is https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B

      Maybe not as powerful, but more suitable for running locally on mid range devices.

      irthomasthomas 7 days ago | parent | prev | next [–]

      I tried the original command-r in an agent setup, and it ended up lobotomized when the agent relied on its results. It has some weird gaps. Like, ask it for the pricing for Claude 3 models and it only lists the prices for output tokens, and only for text. I managed to get the input cost for text out of it, but I could not get it to even acknowledge that image input was a thing. Command-r+ just did exactly the same.

      tosh 7 days ago | prev | next [–]

      Weights available: https://huggingface.co/CohereForAI/c4ai-command-r-plus

      htrp 7 days ago | prev | next [–]

      What's the difference between this and their prior command R model? other than price

      Cohere API Pricing $ / M input tokens $ / M output tokens

      Command R $0.50 $1.50

      Command R+ $3.00 $15.00

      Command-R: RAG at production scale https://news.ycombinator.com/item?id=39671872

      simonw 7 days ago | parent | next [–]

      It's a whole lot bigger. Command R is a 35 billion parameter model: https://huggingface.co/CohereForAI/c4ai-command-r-v01

      Command R Plus is 104 billion: https://huggingface.co/CohereForAI/c4ai-command-r-plus

      htrp 7 days ago | root | parent | next [–]

      Wish they had led with that in their blog post. Thanks for the clarification!

      artninja1988 7 days ago | prev | next [–]

      Yet the license says:

      Besides this snark, seems really good from the benchmark and I'm really glad and grateful people are releasing weights :)

      leetharris 7 days ago | parent | next [–]

      It's because they go to enterprises and sell it to them directly with a different license.

      They open weight it so that any enterprise can download it and try it out in their various use cases first.

      It's a good strategy.

      LoganDark 7 days ago | parent | prev | next [–]

      They probably licensed it to Azure on different terms.

      ilc 7 days ago | parent | prev | next [–]

      Sounds like for business, by business.

      RecycledEle 7 days ago | prev | next [–]

      We need vocabulary to clarify use cases where someone in the company uses an LLM to get an answer as opposed to a customer facing LLM.

      rabbits77 7 days ago | parent | next [–]

      Is that an instance of a "human in the loop"? Suppose a customer calls customer service and the service agent uses an LLM iteratively to get a good answer rather than the customer frustrating or even misinforming themselves. That seems to be like what you are describing.

      iAkashPaul 7 days ago | prev | next [–]

      CC-by-NC still seems better than the DBRX's license prohibiting usage for improving other models-

      >2.3 Use Restrictions > You will not use DBRX or DBRX Derivatives or any Output to improve any other large language model (excluding DBRX or DBRX Derivatives).

      xfalcox 7 days ago | parent | next [–]

      Their other license also prohibits that

      > Cohere For AI Acceptable Use Policy > generating synthetic data outputs for commercial purposes, including to train, improve, benchmark, enhance or otherwise develop model derivatives, or any products or services in connection with the foregoing.

      iAkashPaul 7 days ago | root | parent | next [–]

      notfried 7 days ago | prev | next [–]

      Tangential, but this is a horrible choice for a header font at this size. I thought something was wrong in the rendering as I noticed white random smudges in the text.

      neals 7 days ago | prev [–]

      Would something like this allow me to insert the appstore user agreement (for example) and ask it questions about that?

      simonw 7 days ago | parent [–]

      Yes - or you could pipe it into one of the Claude 3 models, or probably gpt-4-turbo as well.

      Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact
    timestamp: 2024-04-21T14:12:43
    title: 'Command R+: A Scalable LLM Built for Business | Hacker News'
    url: https://news.ycombinator.com/item?id=39930364
search_results:
  - search_query:
      text: command-r+ model size
      generation_id: 89a1873f-c214-4ac5-b900-216eb70c2081
    document_ids: []
    connector:
      id: web-search
  - search_query:
      text: command-r+ pricing compared to dbrx
      generation_id: 89a1873f-c214-4ac5-b900-216eb70c2081
    document_ids:
      - web-search_7
      - web-search_8
      - web-search_9
    connector:
      id: web-search
search_queries:
  - text: command-r+ model size
    generation_id: 89a1873f-c214-4ac5-b900-216eb70c2081
  - text: command-r+ pricing compared to dbrx
    generation_id: 89a1873f-c214-4ac5-b900-216eb70c2081
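
The citations come back as character offsets into the response text plus document ids, so mapping a cited claim back to its source URLs is straightforward. A minimal sketch, assuming only the non-streaming response shape shown above:

// Sketch: attach source URLs to each citation in a Cohere /v1/chat response.
// Field names (citations, documents, start/end offsets) match the output above.
interface Citation { start: number; end: number; text: string; document_ids: string[] }
interface Doc { id: string; title: string; url: string }

function citedSources(responseText: string, citations: Citation[], documents: Doc[]) {
  const byId = new Map(documents.map((d) => [d.id, d]));
  return citations.map((c) => ({
    claim: responseText.slice(c.start, c.end),
    sources: c.document_ids
      .map((id) => byId.get(id))
      .filter((d): d is Doc => d !== undefined)
      .map((d) => d.url),
  }));
}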
humphd commented 2 months ago

I love Cohere, and this only makes me love them more! I can't believe this is free.

Is this something you can do with your proxy, or do we need to do something in-tree?

tarasglek commented 2 months ago

If we were to do it in-tree, we'd need to use their SDK. Anyone else can write a proxy too :)

And yeah, I love Cohere for this and their general attitude.
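
For reference, whether it's in-tree or behind a proxy, the core work is reshaping OpenAI-style messages into Cohere's chat_history + message format. A rough sketch (the SYSTEM mapping is an assumption, since the requests above only use USER/CHATBOT, and system text could instead go into Cohere's preamble field):

// Sketch: convert an OpenAI-style message array into a Cohere /v1/chat request body.
// The chat_history/message split and USER/CHATBOT role names match the requests above.
type OpenAIMessage = { role: "system" | "user" | "assistant"; content: string };

function toCohereChatRequest(messages: OpenAIMessage[], stream = true) {
  const last = messages[messages.length - 1];
  const history = messages.slice(0, -1).map((m) => ({
    // SYSTEM is assumed to be accepted here; system text could also be sent as preamble.
    role: m.role === "assistant" ? "CHATBOT" : m.role === "system" ? "SYSTEM" : "USER",
    message: m.content,
  }));
  return { chat_history: history, message: last.content, stream };
}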