tlh24 / cortex

Apache License 2.0

getting oriented #2

Open ANaka opened 1 year ago

ANaka commented 1 year ago

Was able to get things nominally up and running based on the following:

To install: clone the repo, then cd cortex/ec3 (ec = explore-compress, the intellectual lineage of DreamCoder, which was ec2 in their repo). ./install-deps.sh should build the OCaml executable. Run it via:

```
./run.sh -b 512 -g -p
```

- -b : batch size (change based on your GPU memory)
- -g : (optional) debug logging
- -p : parallel (defaults to assuming there are ~16 cores; I should make that a parameter. Turn it off when debugging.)

Training: in a separate terminal:

```
cd cortex/ec3
python ec33.py -b 512
```

This will start training. The batch size needs to match the one passed to run.sh.

Dreaming: once it writes out a model, you can start dreaming in yet another terminal:

```
python ec33.py -b 512 -d
```

where -d : dreaming.

You can monitor training progress in yet another terminal:

```
python plot_losslog.py -b 512
```

(window output: assumes running locally)

At present, the dreams don't directly feed back into the training; that's what I'm working on now. But this is enough for you to poke around!

Probably going to have some high-level questions about how data is getting passed around between the processes here, but I want to poke at it a little first...

Two quick ones to help orient me:

tlh24 commented 1 year ago

Right, I'm glad that you got it working on your setup / AWS! Definitely need some architectural diagrams to show how data is passed around; I'll work on that shortly. In the meantime:

Exposition: the Python model ec33.py takes a specification in terms of three images, A, B, and B-A, and uses this to output a series of edits to a small program A. As you have likely seen, it's a ViT + token transformer, encoder-only, based on CLIP. The production of edits is fully supervised, a la UDRL (upside-down reinforcement learning).
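As a toy illustration of what "fully supervised" means here (hypothetical edit vocabulary and logits, not the actual ec33.py model): the network scores each possible edit token per step, and training minimizes cross-entropy against the ground-truth edit sequence taken from the data, rather than from rollouts.

```python
import math

# Hypothetical edit vocabulary; the real model's token set differs.
EDIT_VOCAB = ["insert", "delete", "substitute", "<eos>"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def edit_sequence_loss(logit_rows, target_edits):
    """Mean cross-entropy over a supervised edit sequence (UDRL-style:
    targets come from the dataset, not from the model's own samples)."""
    total = 0.0
    for logits, target in zip(logit_rows, target_edits):
        probs = softmax(logits)
        total -= math.log(probs[EDIT_VOCAB.index(target)])
    return total / len(target_edits)

# Dummy per-step logits, as if produced by the transformer's head.
logits = [[2.0, 0.1, 0.1, 0.1],   # confident "insert"
          [0.1, 0.1, 0.1, 2.0]]   # confident "<eos>"
loss = edit_sequence_loss(logits, ["insert", "<eos>"])
```

Matching targets yield a small loss; mismatched targets (e.g. "delete" in the first step) yield a larger one, which is what gradient descent pushes against.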

When 'dreaming', the model takes the three image specification, of which B can be an MNIST digit, and generates a new program. This program is added to the database based on criteria (TBD -- currently cosine similarity). Dreaming is also used to replace programs with their simpler equivalents.
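A minimal sketch of a cosine-similarity acceptance criterion (embeddings, threshold value, and function names are all made up here; per the above, the real criterion is still TBD): a dreamed program is kept only if its embedding is not too close to anything already stored.

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def should_add(candidate, database, threshold=0.95):
    """Keep a dreamed program only if no stored embedding is
    too similar (the threshold is an illustrative value)."""
    return all(cosine(candidate, e) < threshold for e in database)

db = [[1.0, 0.0], [0.0, 1.0]]
print(should_add([1.0, 0.05], db))   # near-duplicate of db[0] -> False
print(should_add([0.7, 0.7], db))    # dissimilar from both -> True
```

The same similarity machinery could also flag candidate replacements, where a dreamed program that is very close to an existing one but shorter swaps in as the simpler equivalent.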

The OCaml program manages the programs and outputs batches for Python. They communicate through sockets (python -> ocaml : "update batch") and mem-mapped files (ocaml -> python : new data). There is one mem-mapped file that communicates from python -> ocaml, for decoding the edits during the dreaming phase.
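To make the mem-mapped-file side concrete, here is a self-contained Python sketch of both ends of such a channel (the file name, size, and length-prefixed layout are invented for illustration; the actual ocaml<->python files are laid out differently):

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: a 4-byte little-endian length header followed
# by the raw payload bytes.
path = os.path.join(tempfile.mkdtemp(), "batch.mmap")
size = 4096

# Producer side: create the backing file, then write one record
# through the mapping.
with open(path, "wb") as f:
    f.truncate(size)
payload = b"edit tokens for one batch"
with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), size)
    mm[0:4] = struct.pack("<I", len(payload))
    mm[4:4 + len(payload)] = payload
    mm.flush()
    mm.close()

# Consumer side: map the same file read-only and decode the record.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ)
    (n,) = struct.unpack("<I", mm[0:4])
    record = bytes(mm[4:4 + n])
    mm.close()

print(record)
```

In the real system the small control messages ("update batch") go over the socket, while the bulk batch data moves through mappings like this, which avoids copying large tensors through the socket.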

Setup: for development, I have two gpus, correct.

However, I think it will run on one GPU just fine. The training/dreaming split makes two cards natural, but not necessary.

Python environment: I've been developing on Debian Bookworm with Python 3.10 (torch does not support 3.11) and CUDA 12. It's sufficiently aligned with Lambda Stack that I haven't had to touch the Python install when deploying there (which is infrequent, as my home computer tends to be faster than a virtualized 8x A100...)

However, I don't have strong opinions here, and would defer to better ideas (provided I can stay on OG Debian :)

Docker: yes, if you think it would be good? From my perspective, other features are higher priority.