
hackerllama/blog/posts/hitchhiker_guide/ #6

Open utterances-bot opened 7 months ago

utterances-bot commented 7 months ago

hackerllama - The Llama Hitchiking Guide to Local LLMs

https://osanseviero.github.io/hackerllama/blog/posts/hitchhiker_guide/

DrChrisLevy commented 7 months ago

This is amazing, thanks!

2404589803 commented 7 months ago

Great job!

fpaupier commented 7 months ago

Great overview of the different concepts; I discovered many new ones! Thanks @osanseviero

havenqi commented 7 months ago

Great job! Bookmarking this post.

FelikZ commented 7 months ago

Good stuff. It would be nice to have a deeper dive into embeddings and the tooling around them.

sanzgadea commented 6 months ago

Good post! One comment: Flash Attention is not an approximation of attention; it is exact, meaning it computes the same result as the standard attention calculation. It achieves its speedup through optimized memory access and parallel processing techniques.
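As a minimal sketch of that point (not from the post, and assuming PyTorch's `scaled_dot_product_attention`, which dispatches to a FlashAttention-style fused kernel on supported hardware): the fused path produces the same numbers as the naive softmax(QKᵀ/√d)·V computation, up to floating-point tolerance. Shapes and values here are just illustrative.

```python
# Sketch: fused/flash-style attention is exact, not an approximation.
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, heads, seq_len, head_dim = 2, 4, 128, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Naive attention: materializes the full (seq_len x seq_len) score matrix.
scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)
naive_out = torch.softmax(scores, dim=-1) @ v

# Fused attention: on supported CUDA hardware with fp16/bf16 inputs this
# dispatches to a FlashAttention-style kernel that avoids materializing
# the score matrix; the mathematical result is the same.
fused_out = F.scaled_dot_product_attention(q, k, v)

print(torch.allclose(naive_out, fused_out, atol=1e-5))  # True: same output, faster memory path
```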

sugatoray commented 3 months ago

This is an incredibly useful article. Thank you @osanseviero for maintaining this.