roc-lang / book-of-examples

Software Design by Example in Roc

topic proposal: auto-completion #4

Open asteroidb612 opened 3 months ago

asteroidb612 commented 3 months ago

Original proposal:

I've been working on a Roc project recreating this toy neural network engine https://github.com/karpathy/micrograd, and I think it might be an interesting chapter.

Revised proposal (see thread below): show how autocompletion works.

asteroidb612 commented 3 months ago

One reason this could be a bad chapter idea is that it requires some calculus thinking! But if this is for schools, maybe there's plenty of calculus in the surrounding curriculum anyway.

isaacvando commented 3 months ago

I'd love to read that chapter!

gvwilson commented 3 months ago

My concern isn't with the math requirements, but whether programmers use neural networks when they're programming: most of the other tools are things like editors and linters that crop up regularly when building and deploying code.

asteroidb612 commented 3 months ago

A year ago, I would have agreed immediately - I never used any machine learning tools while learning to code. But I'm starting to see them used more, and I believe that ChatGPT was used as a low-reliability but occasionally very helpful tool in making Roc.

Maybe those cases were as obscure as making parsers! But maybe they're commonly useful? I am finding myself using ChatGPT as a faster-than-documentation search for how to use various libraries or frameworks.

Anton-4 commented 3 months ago

I use ChatGPT almost every day when working on Roc :) Neural nets also power many autocomplete tools. It's also possible to make the chapter about a tool that connects to e.g. the ChatGPT API, to avoid getting into the math too much.

gvwilson commented 3 months ago

I think that implementing a neural network would be a lot safer than using an external API - the latter are changing so rapidly right now that the chapter could be out of date as soon as it appears. Does Roc have something like NumPy that you could build the NN computations on? If not, could that be the first chapter, and the NN the second? (The JS and Py versions of the book build row-wise and column-wise dataframes in order to illustrate ideas about interface vs. implementation and using benchmarking to pick which implementation is best—could that work here?)

Anton-4 commented 3 months ago

> the latter are changing so rapidly right now that the chapter could be out of date as soon as it appears.

Good point. Another option would be to download a pre-trained neural network model from a stable URL and run it locally.

> Does Roc have something like NumPy that you could build the NN computations on?

Not yet; someone on the Roc Zulip has been experimenting with matrices, but I have not looked at it closely yet.

I think explaining the inner workings of neural nets in depth is not feasible given the one-hour time limit. Andrej Karpathy, an excellent teacher, spends about 2h30m on it [1], [2]. That is also for a "vanilla" neural net, not the more complicated transformer ones people actually use for coding assistance.

Making a tool that uses a downloaded neural net seems to have the best trade-offs.

isaacvando commented 3 months ago

I would be more interested in reading a chapter that implemented a neural net than one that used a preexisting one. I also don't think readers need to fully understand the topic after one chapter, and I suspect that a worthwhile treatment could still be done in an hour.

Anton-4 commented 3 months ago

That's reasonable, we can draft the chapter like that and see how we feel about it then :)

asteroidb612 commented 3 months ago

Andrej Karpathy's approach in that micrograd video is exactly what I'd like to present. I would crib his perspective, which ignores optimizations like linear algebra. I would implement backpropagation on simple networks, as in the video you linked, @Anton-4. I think we could get it down to an hour if we remove some of the dot-language graph drawing and Python operator-overloading content.

Ideally we could have something useful at the end: I think that once the backpropagation algorithm is understood, it's easy for us to say "add lots more data / training time / clever network structure / $$$ and you have ChatGPT."
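To make that concrete, here is the shape I have in mind, as an untested Roc sketch with made-up names: the expression graph is a tag union, the forward pass evaluates it, and the derivative falls out of the chain rule. (This recursive form recomputes shared subexpressions; real backpropagation gets all the gradients in a single reverse pass, which is exactly the optimization the chapter can build up to.)

```roc
# A micrograd-style expression graph (hypothetical sketch).
Expr : [
    Const F64,
    Var Str,
    Add Expr Expr,
    Mul Expr Expr,
]

# Forward pass: evaluate the expression given values for the variables.
eval : Expr, Dict Str F64 -> F64
eval = \expr, env ->
    when expr is
        Const c -> c
        Var name -> Dict.get env name |> Result.withDefault 0
        Add a b -> eval a env + eval b env
        Mul a b -> eval a env * eval b env

# Derivative with respect to one variable, via the chain rule:
# d(a + b)/dx = da/dx + db/dx, and d(a * b)/dx = a * db/dx + b * da/dx.
grad : Expr, Dict Str F64, Str -> F64
grad = \expr, env, x ->
    when expr is
        Const _ -> 0
        Var name -> if name == x then 1 else 0
        Add a b -> grad a env x + grad b env x
        Mul a b -> eval a env * grad b env x + eval b env * grad a env x
```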

asteroidb612 commented 3 months ago

I think that viewing machine learning through a functional programming lens is enlightening. Your neural network is just a function - we can even write its type signature! But it's a function that we train instead of writing.
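For instance, a tiny two-layer regression net might look like this (an untested sketch; all names are made up):

```roc
# The whole network is a pure function, and the weights are ordinary data.
# "Training" just means searching for a Weights value that makes
# `predict weights input` produce good outputs.
Weights : { hidden : List (List F64), output : List F64 }

predict : Weights, List F64 -> F64
predict = \weights, input ->
    hiddenOut = List.map weights.hidden \row -> relu (dot row input)
    dot weights.output hiddenOut

relu : F64 -> F64
relu = \x -> if x > 0 then x else 0

dot : List F64, List F64 -> F64
dot = \a, b -> List.sum (List.map2 a b Num.mul)
```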

I have a hunch that Roc will actually be nice for this kind of thing! My progress was stalled by a lambda set error, but that has just been unblocked.

asteroidb612 commented 3 months ago

If someone were building a dataframe chapter, it would be interesting to base this on that. Or maybe we make a third chapter combining the two basic chapters?

gvwilson commented 3 months ago

I still think that neural networks don't fit the "tools programmers use to program" theme, but I realize I might just be showing my age :-). I am more certain that there are two chapters here if we want to respect the "teachable in one hour" restriction per chapter:

  1. NumPy-in-Roc (NumRoc?), i.e., a linear algebra package (a possible starting point is sketched below). This could be pure Roc or a Roc wrapper around Polars.
  2. A neural network built on top of that linalg package.
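To give a sense of scale, the linalg chapter could open with something this small (an untested sketch; names made up) and then grow it:

```roc
# Row-major matrices as nested lists: slow but obvious, which is the point.
# The chapter can later benchmark this against a flat-array layout while
# keeping the same signatures (interface vs. implementation).
Matrix : List (List F64)

dot : List F64, List F64 -> F64
dot = \a, b -> List.sum (List.map2 a b Num.mul)

# Naive matrix-vector product: one dot product per row.
matVec : Matrix, List F64 -> List F64
matVec = \rows, vec -> List.map rows \row -> dot row vec
```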

If y'all agree, let's create a separate ticket for the linear algebra package and see who wants to take it on.

Anton-4 commented 3 months ago

NumPy-in-Roc definitely sounds good!

> I still think that neural networks don't fit the "tools programmers use to program" theme

I do agree. A more fitting possibility would be neural-net-based autocomplete, but that seems too large in scope.

gvwilson commented 3 months ago

What about a more traditional autocomplete whose completion tree is updated incrementally based on what's currently in scope? I think most programmers rely on that in their editor - is that big enough/interesting enough for a chapter?
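Concretely, I'm picturing something that starts no bigger than this (an untested sketch; the names are made up):

```roc
# Rebuild the candidate set from whatever identifiers are in scope, then
# filter by the prefix under the cursor. The incremental version patches
# the completion tree on each edit instead of rebuilding from scratch.
completions : List Str, Str -> List Str
completions = \wordsInScope, prefix ->
    wordsInScope
    |> Set.fromList # dedupe
    |> Set.toList
    |> List.keepIf \word -> Str.startsWith word prefix
```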

Anton-4 commented 3 months ago

> is that big enough/interesting enough for a chapter?

I think so.

I see two possible approaches:

gvwilson commented 3 months ago

Can you do the latter first to show learners how incremental autocomplete works from the ground up? I think that building a small language server would be a great second chapter, but as a learner, I'd want to know what the magic is before relying on an external service to do it for me. (Cool idea, by the way...)

Anton-4 commented 3 months ago

Yeah that could work :)

I personally already have a lot to do with other Roc things, but any motivated person could probably take on the first chapter of this. Are you interested in working on the second chapter, about a tiny language server, @faldor20?

faldor20 commented 3 months ago

Yeah, I'd be interested in taking that on. I was actually thinking it would be cool to try building a language server framework in Roc on top of tower-lsp. So maybe we could have two examples: one showing Roc wrapped around an existing Rust framework, and one showing a pure Roc implementation that just talks over stdio?

The pure Roc one is a much bigger task, so I'd probably show only the most basic part: reading and writing JSON-RPC over stdio, handling some basic document updates, and responding to one or two language server requests.
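The stdio side is mostly framing: each message is a Content-Length header, a blank line, and then that many bytes of JSON-RPC payload. Parsing the header is only a few lines (untested sketch):

```roc
# Pull the byte count out of an LSP framing header such as
# "Content-Length: 123". Everything after the blank line that follows
# is the JSON-RPC payload.
contentLength : Str -> Result Nat [InvalidHeader]
contentLength = \line ->
    when Str.split line ":" is
        ["Content-Length", rest] ->
            rest
            |> Str.trim
            |> Str.toNat
            |> Result.mapErr \_ -> InvalidHeader
        _ -> Err InvalidHeader
```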

I was imagining either I could base everything on the Roc compiler and just kind of hand-wave how the actual calls work, or do something like @Anton-4 suggested: turn every word in the text into a "symbol" and pretend it's the output of a compiler.

How in-depth would we like to go here? Well-made language servers tend to have a lot of pretty complex state management: they do a lot of caching, incremental updating, and recompilation. How far into the weeds do we want to get? Or should I just keep it as simple as "this is a naive implementation; here is where you could improve it in the real world"?

gvwilson commented 3 months ago

I think it would be a lot more approachable to do the simple version first (here's a vocabulary, autocomplete from it) and then build the language server as a separate chapter - I don't believe both will fit into our one-hour-per-lesson limit, and I think the latter will be more comprehensible after people have seen the former. @faldor20, are you interested in doing the first part?

faldor20 commented 3 months ago

I'm honestly not sure what you imagine the first part looking like.

I'm not sure it makes sense to implement any kind of autocomplete system with no foundation to actually use it in. I would argue the only really hard part of autocomplete for plain text is dealing with document updates and sending info out of the language server.

Implementing autocomplete as I imagine it is basically just a fuzzy search algorithm plus a super simple parser that finds all the words in a document. In fact, in roc-ls we don't even have fuzzy autocomplete yet 😅
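Something like this subsequence check would be the heart of it (untested sketch):

```roc
# A query fuzzy-matches a candidate if the query's bytes appear in the
# candidate in order, e.g. "fzy" matches "fuzzy".
fuzzyMatches : Str, Str -> Bool
fuzzyMatches = \query, candidate ->
    leftover = List.walk (Str.toUtf8 candidate) (Str.toUtf8 query) \needle, byte ->
        when needle is
            [next, .. as rest] if next == byte -> rest
            _ -> needle
    List.isEmpty leftover
```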

But I think maybe I'm misunderstanding what you were imagining.

faldor20 commented 3 months ago

Oh, and I tried a quick mock-up of JSON-RPC parsing and realised Roc is unfortunately currently unable to parse JSON that contains unions (types like `id: number | string`), which makes implementing LSP in Roc impossible right now :( (see this Zulip thread on JSON null handling).

faldor20 commented 3 months ago

Okay, I went off and worked on my knowledge of abilities and decoders, and it is actually possible. I take it all back; I'll get my implementation done soon.
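The rough idea is to model the union as a tag union and try one decoder, falling back to the other (untested sketch; `fmt` stands for whatever DecoderFormatting value the JSON package provides):

```roc
# An LSP id may be a number or a string: model it as a tag union.
Id : [IdNum U64, IdStr Str]

# Try to decode the raw bytes as a U64 first; fall back to Str.
decodeId = \bytes, fmt ->
    when Decode.fromBytes bytes fmt is
        Ok num -> Ok (IdNum num)
        Err _ ->
            when Decode.fromBytes bytes fmt is
                Ok str -> Ok (IdStr str)
                Err _ -> Err InvalidId
```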

gvwilson commented 3 months ago

Thanks @faldor20 - can you please create a subdirectory under the project root called completion and put your work there, along with an index.md file with notes to yourself? Cheers - Greg