polyrabbit / hacker-news-digest

:newspaper: Let ChatGPT Summarize Hacker News for You
http://hackernews.betacat.io/
GNU Lesser General Public License v3.0
668 stars 87 forks

Why use json functions format #35

Closed thiswillbeyourgithub closed 1 month ago

thiswillbeyourgithub commented 2 months ago

Hi,

Reading the commits, I was under the impression that to create the summaries you're using function calling and trying to get JSON output, is that correct?

Link: https://github.com/polyrabbit/hacker-news-digest/commit/07849b268714498a292170c76492e606cd80f8c0#diff-bf1ae841312c375660901819714eae08367ade8803fa5c719d98a9843a197253R69

polyrabbit commented 1 month ago

Right, that's because I wanted to get summaries and translations into other languages in one response - a feature to translate HN summaries into other languages, as requested by users.

It's now deprecated, as I found it hard to maintain.
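
For reference, the pattern looked roughly like this (a rough sketch, not the exact code from the commit; the function and field names are made up for illustration):

# A rough sketch of the deprecated pattern, not the repo's actual code.
import json
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "<article text here>"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "save_summary",  # hypothetical function name
            "description": "Store the summary in every requested language",
            "parameters": {
                "type": "object",
                "properties": {
                    "summary_en": {"type": "string", "description": "English summary"},
                    "summary_zh": {"type": "string", "description": "Chinese translation"},
                },
                "required": ["summary_en", "summary_zh"],
            },
        },
    }],
    tool_choice={"type": "function", "function": {"name": "save_summary"}},
)

# Both languages come back as one JSON object
args = json.loads(resp.choices[0].message.tool_calls[0].function.arguments)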

thiswillbeyourgithub commented 1 month ago

Yes, I can understand. Function calling / JSON formatting is not equally supported by all models, and it adds a substantial number of tokens, so the LLM has to struggle all the more to produce the summary, and it's even worse for multiple languages.

To me it looks optimistic, to say the least, to expect Gemma 2B to do a multilingual summary in JSON format :)

Now what's the process? Do you loop over the languages you want and ask for a summary of each? With proper system prompting it should be fine, no?
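
Something like this, I mean (just a sketch, assuming an OpenAI-compatible client; not your actual code):

from openai import OpenAI

client = OpenAI()
article = "<scraped article text>"

# One call per target language, each with its own system prompt
summaries = {}
for lang in ["English", "Chinese"]:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # or whatever model the site uses
        messages=[
            {"role": "system",
             "content": f"Summarize the user's input in 2 short {lang} sentences."},
            {"role": "user", "content": article},
        ],
    )
    summaries[lang] = resp.choices[0].message.content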

edit: I read your reply a bit too fast - do you mind sharing why you wanted to have everything in only one response?

polyrabbit commented 1 month ago

Progress update - I gave up on this feature. No users are requesting translations other than Chinese.

Another reason is that Gemma 2B is very good at summarizing but poor at translating, and I don't want to complicate the code by involving other translation models.

see: https://github.com/polyrabbit/hacker-news-digest/blob/07849b268714498a292170c76492e606cd80f8c0/hacker_news/news.py#L139-L143

thiswillbeyourgithub commented 1 month ago

Alright, interesting. Thank you for the info. Also, when I see an article that clearly got translated wrongly, is there a way to warn you short of creating an issue here? I remember recently seeing a never-ending token repetition, for example.

polyrabbit commented 1 month ago

Yes, please let me know. The repetition is generated by Gemma; I'll see what I can do outside the model.

thiswillbeyourgithub commented 1 month ago

In case you don't know, OpenAI has parameters called frequency penalty and presence penalty; those might be good keywords to investigate.
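
e.g. a minimal sketch with the openai client (both parameters range from -2.0 to 2.0):

from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize this article: <text>"}],
    frequency_penalty=1.0,  # penalizes tokens by how often they already occurred
    presence_penalty=1.0,   # flat penalty on any token that occurred at all
)
print(resp.choices[0].message.content)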

thiswillbeyourgithub commented 1 month ago

Attackers Can Decloak Routing-Based VPNs By dsr_ Hacker News Summary / May 6, 2024, 23:21 • 1 min read

Main Points: - Researchers identified a technique called "TunnelVision" that allows attackers to force users' traffic outside of a VPN tunnel by manipulating DHCP routing configurations. - This technique has been feasible since 20 vicissulation 20 vicissulation 20 vicissulation 20 vicissulation 20 vicissulation 20 vicissulation 20 vicissulation 20 vicissulation 20 vicissulation 20 vicissulation 2 and has affected Windows, Linux, macOS, and iOS devices. [summary] [comments]

polyrabbit commented 1 month ago

Adding frequency_penalty and presence_penalty seems to fix this case. But I'm not sure it's enough for other cases. Please let me know if you notice any other issues.

Thanks for your suggestion!

Researchers discovered a technique called "TunnelVision" that allows attackers to force users' traffic outside of their VPN tunnel by manipulating DHCP routing configurations. This bypasses the security measures intended to protect users on untrusted networks. The researchers urge VPN providers and operating system maintainers to implement network namespaces as a mitigation.

thiswillbeyourgithub commented 1 month ago

A 100x speedup with unsafe Python 166 ingve 8 hours ago 68 https://yosefk.com/favicon.ico yosefk.com OpenAI Share The author discovered a 10 vicissarray in performance when using cv2.resize on data from a pygame surface due to different strides. By exploiting layout flexibility and utilizing the unsafe Python approach, they achieved a 10 Kün factor speedup

I don't know wtf is going on, there are viciss words everywhere

thiswillbeyourgithub commented 1 month ago

Faster XML Stream Processing in Go (2019) 49 PaulHoule 6 hours ago 9 https://eli.thegreenplace.net/favicon.ico eli.thegreenplace.net OpenAI Share https://eli.thegreenplace.net/images/2019/xml-sax-comparison.png Go's encoding/xml package is slower than Python's ElementTree and lxml libraries for streaming XML processing, taking 6 viciss. 56 viciss. The optimized C implementation using libxml is the fastest, taking 0 viciss. 0 viciss

thiswillbeyourgithub commented 1 month ago

Frank Stella has died 30 prismatic 7 hours ago 2 https://www.nytimes.com/vi-assets/static-assets/favicon-d2483f10ef688e6f89e23806b9700298.ico nytimes.com OpenAI Share https://static01.nyt.com/images/2021/05/06/obituaries/00Stella5/merlin_150536814_dfe3ad37-cb73-40aa-a498-5648df2bdcc1-articleLarge.jpg?quality=75&auto=webp&disable=upscale Frank Stella, a leading minimalist artist, revolutionized American art with his enigmatic black paintings of the 1950 viciss paintings. His austere and enigmatic works captivated audiences and redefined the boundaries of artistic expression

polyrabbit commented 1 month ago

Maybe I need a larger model to get rid of those v* words.

thiswillbeyourgithub commented 1 month ago

What's the largest model you're able to afford? I would gladly help with prompting.

polyrabbit commented 1 month ago

Sorry, but I currently have no plan to invest large sums of money/time into this project. It's just a hobby project for convenience.

So the largest model I'm willing to afford is the free one listed on openrouter.ai :)

thiswillbeyourgithub commented 1 month ago

Perfectly understandable :) thank you

(I've been reading you every morning for many months or even a year+!)

polyrabbit commented 1 month ago

Glad to hear that!

Lower cost will keep this site running longer.

thiswillbeyourgithub commented 1 month ago

Oh also, Gemma has a weirdish tokenizer I think, so maybe the viciss bullshit comes from a tokenizing issue, and/or logit bias might be a solution.

But I heard you when you said you don't want to invest too much time.

polyrabbit commented 1 month ago

Ah, could you elaborate on the logit bias approach or point me to some examples? I'm a newbie in this field, but very interested in learning more and applying it to my project.

Also, my concern is: does the logit bias approach create a new model? If so, I need to find a place to host it, which introduces a new dependency to maintain - that's what I'm trying to avoid.

thiswillbeyourgithub commented 1 month ago

logit bias is an argument you can pass to OpenAI's models that biases the LLM to produce more of some tokens and less of others. For example, logit_bias={"A": 100} will make the LLM output only As forever, because the scale goes from -100 (forbids the token) to 100 (exclusively favors it). But to be precise it's not "A" itself, it's the token ID associated with A.

You can play around with OpenAI's tokenizer here: https://platform.openai.com/tokenizer

So the correct argument would be logit_bias={32: 100} to output A forever.

But each model has its own tokenizer, and not all implementations allow specifying logit bias.

It's just an API call argument, so not a whole new model.

In the openrouter.ai docs, it seems they sometimes support logit_bias and sometimes ignore it depending on the model: https://openrouter.ai/docs#requests
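
Putting it together, a minimal sketch (reusing the token id 32 for "A" from above):

from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "List the first five capital letters."}],
    logit_bias={32: -100},  # token id -> bias in [-100, 100]; -100 bans the token
)
print(resp.choices[0].message.content)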

polyrabbit commented 1 month ago

Many thanks for the direction. I'll look into it.

One more question - do you have any idea how to get Gemma's token IDs for those v* words?

thiswillbeyourgithub commented 1 month ago

What lib are you using to call the LLM? Or point me to the exact code location.

polyrabbit commented 1 month ago

thiswillbeyourgithub commented 1 month ago

You might be interested in using litellm instead - it's a cool lib that makes switching models easy.

But they don't include all tokenizers for all models, so your best bet would be to use huggingface; there are tons of tutorials, and GPT-4 might even be able to give you the snippet right away.

polyrabbit commented 1 month ago

Update:

Snippet to get the token id

from transformers import AutoTokenizer

# Load the Gemma tokenizer to look up ids for the problematic tokens
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")

token = 'Kün'
token_id = tokenizer.encode(token, add_special_tokens=False)

# word = tokenizer.decode(token_id[0])

print(f'The id for token "{token}" is {token_id}')

Request body to ignore ün

{
    "model": "google/gemma-7b-it:free",
    "temperature": 0,
    "frequency_penalty": 1,
    "presence_penalty": 1,
    "logit_bias": {"5268": -100},
    "n": 1,
    "timeout": 30,
    "stream": false,
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful summarizer. Please think step by step and use third person mood to summarize all user's input in 2 short English sentences. Ensure the summary does not exceed 100 characters. Provide response in plain text format without any Markdown formatting."
        },
        {
            "role": "user",
            "content": "CUPERTINO, CALIFORNIA Apple today announced M4, the latest chip delivering phenomenal performance to the all-new iPad Pro. Built using second-generation 3-nanometer technology, M4 is a system on a chip (SoC) that advances the industry-leading power efficiency of Apple silicon and enables the incredibly thin design of iPad Pro. It also features an entirely new display engine to drive the stunning precision, color, and brightness of the breakthrough Ultra Retina XDR display on iPad Pro. A new CPU has up to 10 cores, while the new 10-core GPU builds on the next-generation GPU architecture introduced in M3, and brings Dynamic Caching, hardware-accelerated ray tracing, and hardware-accelerated mesh shading to iPad for the first time. M4 has Apple’s fastest Neural Engine ever, capable of up to 38 trillion operations per second, which is faster than the neural processing unit of any AI PC today. Combined with faster memory bandwidth, along with next-generation machine learning (ML) accelerators in the CPU, and a high-performance GPU, M4 makes the new iPad Pro an outrageously powerful device for artificial intelligence. “The new iPad Pro with M4 is a great example of how building best-in-class custom silicon enables breakthrough products,” said Johny Srouji, Apple’s senior vice president of Hardware Technologies. “The power-efficient performance of M4, along with its new display engine, makes the thin design and game-changing display of iPad Pro possible, while fundamental improvements to the CPU, GPU, Neural Engine, and memory system make M4 extremely well suited for the latest applications leveraging AI. Altogether, this new chip makes iPad Pro the most powerful device of its kind.” Delivering a giant leap in performance over the previous iPad Pro with M2, M4 consists of 28 billion transistors built using a second-generation 3-nanometer technology that further advances the power efficiency of Apple silicon. M4 also features an entirely new display engine designed with pioneering technologies, enabling the stunning precision, color accuracy, and brightness uniformity of the Ultra Retina XDR display, a state-of-the-art display created by combining the light of two OLED panels. M4 has a new up-to-10-core CPU consisting of up to four performance cores and now six efficiency cores. The next-generation cores feature improved branch prediction, with wider decode and execution engines for the performance cores, and a deeper execution engine for the efficiency cores. And both types of cores also feature enhanced, next-generation ML accelerators. M4 delivers up to 1.5x faster CPU performance over the powerful M2 in the previous iPad Pro.1 Whether working with complex orchestral music files in Logic Pro or adding highly demanding effects to 4K video in LumaFusion, M4 boosts performance across pro workflows. The new 10-core GPU of M4 builds upon the next-generation graphics architecture of the M3 family of chips. It features Dynamic Caching, an Apple innovation that allocates local memory dynamically in hardware and in real time to dramatically increase the average utilization of the GPU. This significantly increases performance for the most demanding pro apps and games. Hardware-accelerated ray tracing comes to iPad for the first time, and enables even more realistic shadows and reflections in games and other graphically rich experiences. 
Hardware-accelerated mesh shading is also built into the GPU, and delivers greater capability and efficiency in geometry processing, enabling more visually complex scenes in games and graphics-intensive apps. Pro rendering performance in apps like Octane gets a huge boost with M4, and is now up to four times faster than on M2.1 With these improvements to the CPU and GPU, M4 maintains Apple silicon’s industry-leading performance per watt. M4 can deliver the same performance as M2 using just half the power. And compared with the latest PC chip in a thin and light laptop, M4 can deliver the same performance using just a fourth of the power.2 M4 has a blazing-fast Neural Engine — an IP block in the chip dedicated to the acceleration of AI workloads. This is Apple’s most powerful Neural Engine ever, capable of an astounding 38 trillion operations per second — a breathtaking 60x faster than the first Neural Engine in A11 Bionic. Together with next-generation ML accelerators in the CPU, the high-performance GPU, and higher-bandwidth unified memory, the Neural Engine makes M4 an outrageously powerful chip for AI. And with AI features in iPadOS like Live Captions for real-time audio captions, and Visual Look Up, which identifies objects in video and photos, the new iPad Pro allows users to accomplish amazing AI tasks quickly and on device. iPad Pro with M4 can easily isolate a subject from its background throughout a 4K video in Final Cut Pro with just a tap, and can automatically create musical notation in real time in StaffPad by simply listening to someone play the piano. And inference workloads can be done efficiently and privately while minimizing the impact on app memory, app responsiveness, and battery life. The Neural Engine in M4 is Apple’s most capable yet, and is more powerful than any neural processing unit in any AI PC today. The Media Engine of M4 is the most advanced to come to iPad. In addition to supporting the most popular video codecs, like H.264, HEVC, and ProRes, it brings hardware acceleration for AV1 to iPad for the first time. This provides more power-efficient playback of high-resolution video experiences from streaming services. The power-efficient performance of M4 helps the all-new iPad Pro meet Apple’s high standards for energy efficiency and deliver all-day battery life. This results in less time needing to be plugged in and less energy consumed over its lifetime. Today, Apple is carbon neutral for global corporate operations, and by 2030, plans to be carbon neutral across the entire manufacturing supply chain and life cycle of every product. Testing was conducted by Apple in March and April 2024. See apple.com/ipad-pro for more information. Testing was conducted by Apple in March and April 2024 using preproduction 13-inch iPad Pro (M4) units with a 10-core CPU and 16GB of RAM. Performance was measured using select industry‑standard benchmarks. PC laptop chip performance data is from testing ASUS Zenbook 14 OLED (UX3405MA) with Core Ultra 7 155H and 32GB of RAM. Performance tests are conducted using specific computer systems and reflect the approximate performance of iPad Pro"
        }
    ]
}

But the word ün still appears:

: Apple unveiled the new M4 viciss chip for the iPad Pro, offering up to 1.5x faster CPU performance and a 6 Kün 10 viciss graphics boost. The M4 viciss delivers exceptional AI capabilities with a Neural Engine that is 6 purcha more powerful than any PC counterpart.

The weird thing is that when I set the bias to {"5268": 100}, it emits lots of ün characters. So a positive bias is accepted, but not a negative one?

:ünününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününününn Apple's new M4 viciss chip delivers significant performance improvements to the iPad Pro. It features a 10 Kün core CPU and a 10 viciss GPU, alongside an incredibly fast Neural Engine capable of 38 trillion operations per second.

thiswillbeyourgithub commented 1 month ago

So

  1. Kün should not be called a "token" but a "word". There can be many tokens inside a single word, especially for Gemma, which has a large vocabulary.
  2. I get "[235333, 5268]" as the token_id for Kün, not 5268.
  3. The logit bias keys should be ints, not strs - they are token ids.
  4. Just because ün is part of Kün doesn't mean the tokens of the first are present in the second:

    In [10]: tokenizer.encode("ün", add_special_tokens=False)
    Out[10]: [5268]

    In [12]: tokenizer.encode("K", add_special_tokens=False)
    Out[12]: [235333]

    In [13]: tokenizer.encode("Kü", add_special_tokens=False)
    Out[13]: [107232]

    In [11]: tokenizer.encode("Kün", add_special_tokens=False)
    Out[11]: [235333, 5268]

  5. Try intermediate values for logit_bias, like -1, -10 etc. We don't know the details of Google's implementation afaik.
  6. Btw, casing matters.
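
To make the last points concrete, here's a quick sketch (reusing the Gemma tokenizer from your snippet; the ids in the comments come from the outputs above):

# Gather token ids for each variant you want to discourage, then build
# a logit_bias dict with moderate negative values instead of -100.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")

# Casing and leading spaces change the tokenization, so list the variants
variants = ["Kün", "kün", "ün", " ün"]
banned = set()
for v in variants:
    ids = tokenizer.encode(v, add_special_tokens=False)
    print(v, ids)  # e.g. "Kün" -> [235333, 5268] per the outputs above
    banned.update(ids)

# Careful: 235333 is the plain "K" token, so biasing it would also hit
# every word starting with K - drop ids like that from the dict.
logit_bias = {i: -10 for i in banned if i != 235333}
print(logit_bias)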