tmc / langchaingo

LangChain for Go, the easiest way to write LLM-based programs in Go
https://tmc.github.io/langchaingo/
MIT License
4.24k stars 589 forks

Splitting langchaingo into sub-modules #369

Open eliben opened 9 months ago

eliben commented 9 months ago

Currently langchaingo is a single Go module (modulo the examples, which live in separate modules per example - consistently after #367 lands).

This means that users who only want to use langchaingo with a single backend LLM (say, OpenAI) will see their go commands pull in many dependencies they don't necessarily need. Even though the go build tool won't include them in the users' programs, this may have negative effects on, say, CI time. But it's important to hear what specific concerns users have.

Splitting the repo into multiple modules is natural on the LLM provider level; e.g. langchaingo can be its own (top-level) module, and each LLM provider another module nested within the repo, importing the main langchaingo module for the common stuff. This way users can only import the modules they need and their dependencies.
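To illustrate the layout described above, a nested provider module could carry its own go.mod along these lines. This is only a sketch: treating llms/openai as a separate module, the go directive, and the v0.1.0 version are all assumptions, not something that exists in the repository.

```go.mod
// llms/openai/go.mod (hypothetical: today llms/openai is a package in the main module)
module github.com/tmc/langchaingo/llms/openai

go 1.21

// The provider module depends on the main module for the common code,
// never the other way around. v0.1.0 is an assumed tag.
require github.com/tmc/langchaingo v0.1.0
```

Provider-specific third-party dependencies would then be listed only in this file, so users of other providers would not need them.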

This isn't free of cost, however. Managing multiple modules in a repo requires some care, especially around releases. In a way it's like each module living in its own separate repo, except that in a single repo they can share tooling and scripts. Also, with the existence of go workspaces, a go.work file can make local development much more pleasant.
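As a rough sketch of the go.work idea, a workspace file at the repository root might look like the following. The ./llms/openai entry assumes a hypothetical nested provider module; the go directive is also an assumption (workspaces need Go 1.18 or later).

```go.work
go 1.21

use (
	.             // main langchaingo module
	./llms/openai // hypothetical nested provider module
)
```

With this file present, local builds resolve the main module from the working tree instead of a tagged version. It only affects the workspace itself; users who fetch the modules still get whatever versions the go.mod files require.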

One prerequisite for this would be to start tagging actual releases of langchaingo. We can start with 0.1.0 and follow semver, bumping the minor version as often as we need.
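One detail that matters once modules are nested: the go command expects the tag for a module in a subdirectory to be prefixed with that subdirectory (for example llms/openai/v0.1.0), while the root module uses the bare version. The sketch below shows that naming rule together with a minor-version bump. The helper names tagFor and bumpMinor, and the llms/openai subdirectory, are hypothetical and used only for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// tagFor returns the git tag name for a module version.
// The root module (subdir "" or ".") uses the bare version, e.g. "v0.1.0";
// a module in a subdirectory uses "<subdir>/<version>".
func tagFor(subdir, version string) string {
	subdir = strings.Trim(subdir, "/")
	if subdir == "" || subdir == "." {
		return version
	}
	return subdir + "/" + version
}

// bumpMinor increments the minor component of a "vMAJOR.MINOR.PATCH"
// version and resets the patch component to 0.
func bumpMinor(version string) (string, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if !strings.HasPrefix(version, "v") || len(parts) != 3 {
		return "", fmt.Errorf("not a vMAJOR.MINOR.PATCH version: %q", version)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return "", fmt.Errorf("bad minor component in %q: %w", version, err)
	}
	return fmt.Sprintf("v%s.%d.0", parts[0], minor+1), nil
}

func main() {
	next, err := bumpMinor("v0.1.0")
	if err != nil {
		panic(err)
	}
	fmt.Println(tagFor("", next))            // tag for the root module
	fmt.Println(tagFor("llms/openai", next)) // tag for a hypothetical nested module
}
```

Each module is versioned independently, so a real release script would track a separate current version per module rather than bumping them together as main does here.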

eliben commented 9 months ago

I just tested this by copying examples/openai-completion-example into a standalone directory with a new module, and ran go mod tidy to see what is pulled in:

$ go mod tidy
go: finding module for package github.com/tmc/langchaingo/llms/openai
go: finding module for package github.com/tmc/langchaingo/llms
go: found github.com/tmc/langchaingo/llms in github.com/tmc/langchaingo v0.0.0-20231122191601-2eb6f5408849
go: found github.com/tmc/langchaingo/llms/openai in github.com/tmc/langchaingo v0.0.0-20231122191601-2eb6f5408849

It's not that bad, actually! The Go tool is good about only pulling whichever dependencies are needed.

tmc commented 9 months ago

Is it my imagination, or did it not use to be as smart about unused transitive dependencies?

eliben commented 9 months ago

Yes, this is module graph pruning, which shipped in Go 1.17; see https://go.dev/ref/mod#graph-pruning (there's a link there to a design doc with more details if you're interested).

tmc commented 9 months ago

I'd still like to find ways to keep the number of dependencies low and want to explore this more -- we could perhaps analyze what is bringing in the most and consider more targeted submodules there.

tmc commented 9 months ago

I'm experimenting with what having modules under ./contrib would look like here: https://github.com/tmc/langchaingo/tree/add-contrib

eliben commented 8 months ago

I think the add-contrib branch looks alright if you want to go in this direction.

Just like with the examples, the dependency has to be always one way - the contrib modules depend on the main langchaingo module, not the other way around. And dependency management will have to be done similarly with bumping main module versions in all the go.mod when the main module has a new release. This creates an issue when you want to add a feature to the main module and immediately use it in a contrib module because there's no new tag yet; it can be tested with a go.work file locally, but CI may be unhappy on the PR.

tmc commented 8 months ago

Yeah, I’m thinking we could do some CI automation around go.work files.

Another option would be something that would auto-tag pre-release tags or branches so that pushes to branches would become go-gettable.

Thoughts there?

Could do something like: branch “pr-$n” gets created/updated when a PR is added/updated.

eliben commented 8 months ago

It's possible, but I don't think it makes sense to over-complicate this, honestly.

Most changes don't need to affect all modules at the same time; since go.mod versions don't get auto-bumped, things should work in between. E.g. if we make a change in main module vN, then want to use this change in a sub-module, we can have one commit updating the main module, then tag vN, then another commit updating the go.mod line in a contrib module to use vN and the code change.

Single commit/PR affecting both main and contrib modules should hopefully be rare.

I would recommend waiting with automation to see what actual issues we encounter in this repository, since each project is different in this respect.

devinyf commented 7 months ago

Just wondering... would it be a good idea to use go-plugins to manage third-party modules such as llms and tools?