ANaka opened this issue 1 year ago
Right, I'm glad that you got it working on your setup / AWS! Definitely need some architectural diagrams to show how data is passed around; I'll work on that shortly. In the meantime:
Exposition:
The python model ec33.py takes a specification in the form of three images (A, B, and B-A) and uses this to output a series of edits to a small program A. As you've likely seen, it's a ViT + token transformer, encoder-only, based on CLIP. The production of edits is fully supervised, a la UDRL (upside-down reinforcement learning).
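To make the shape of that model concrete, here is a hypothetical sketch of a CLIP-style encoder over the three-image specification with a supervised edit head. All names, sizes, and the edit-head factorization are my assumptions for illustration, not the actual ec33.py code.

```python
# Hypothetical sketch of the spec encoder: NOT the real ec33.py.
# Sizes (28x28 inputs, 7x7 patches, d_model=256) are assumptions.
import torch
import torch.nn as nn

class SpecEncoder(nn.Module):
    def __init__(self, d_model=256, n_edit_ops=4, vocab=64):
        super().__init__()
        # ViT-style patch embedding: 28x28 image, 7x7 patches -> 16 tokens
        self.patch = nn.Conv2d(1, d_model, kernel_size=7, stride=7)
        self.pos = nn.Parameter(torch.zeros(3 * 16, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # supervised edit head: predict an edit operation and a token
        self.edit_op = nn.Linear(d_model, n_edit_ops)
        self.edit_tok = nn.Linear(d_model, vocab)

    def forward(self, a, b, diff):
        # a, b, diff: (batch, 1, 28, 28) images for A, B, B-A
        toks = [self.patch(x).flatten(2).transpose(1, 2) for x in (a, b, diff)]
        h = torch.cat(toks, dim=1) + self.pos   # (batch, 48, d_model)
        h = self.encoder(h)                     # encoder-only, no decoding
        pooled = h.mean(dim=1)
        return self.edit_op(pooled), self.edit_tok(pooled)

model = SpecEncoder()
x = torch.randn(2, 1, 28, 28)
op_logits, tok_logits = model(x, x, x)
```

The point is the data flow: three images become one token sequence, and the edit prediction is a plain supervised head on top, matching the UDRL-style setup described above.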
When 'dreaming', the model takes the three-image specification, of which B can be an MNIST digit, and generates a new program. This program is added to the database based on an acceptance criterion (TBD -- currently cosine similarity). Dreaming is also used to replace programs with their simpler equivalents.
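The cosine-similarity acceptance criterion could look something like the sketch below: keep a dreamed program only if its embedding is not too close to anything already in the database. The threshold value and the idea of comparing program embeddings directly are assumptions on my part.

```python
# Sketch of a cosine-similarity novelty check for dreamed programs.
# The 0.95 threshold and embedding comparison are illustrative assumptions.
import torch
import torch.nn.functional as F

def accept_dreamed(candidate_emb, db_embs, max_cos_sim=0.95):
    """Accept a dreamed program if its embedding is below the similarity
    threshold against every program already in the database."""
    if db_embs.numel() == 0:
        return True  # empty database: accept anything
    sims = F.cosine_similarity(candidate_emb.unsqueeze(0), db_embs, dim=1)
    return bool(sims.max() < max_cos_sim)

db = torch.eye(3)                        # three orthogonal "program embeddings"
dup = torch.tensor([0.0, 0.0, 1.0])      # identical to one db entry
novel = torch.ones(3)                    # cos ~0.58 to each db entry
print(accept_dreamed(dup, db))
print(accept_dreamed(novel, db))
```

Rejected duplicates can then feed the second use of dreaming mentioned above: when a new program reproduces an existing spec but is shorter, it replaces the stored one instead of being discarded.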
The ocaml program manages the programs and outputs batches for python. The two processes communicate through sockets (python -> ocaml: "update batch") and memory-mapped files (ocaml -> python: new data). There is also one memory-mapped file that communicates from python -> ocaml, for decoding the edits during the dreaming phase.
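The mmap half of that protocol can be illustrated in one self-contained Python snippet: one writer (standing in for the ocaml side) places a length-prefixed batch into a shared file, and a reader maps the same file and decodes it. The file layout (a 4-byte little-endian length followed by float32 payload) is purely an assumption for the sketch, not the actual wire format.

```python
# Illustrative mmap handoff; the length-prefixed float32 layout is an
# assumption, not the project's real format.
import mmap
import os
import struct
import tempfile

path = os.path.join(tempfile.mkdtemp(), "batch.mmap")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)              # pre-size the shared region

# "ocaml" side: write a length-prefixed float batch into the region
with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)
    payload = struct.pack("<3f", 1.0, 2.0, 3.0)
    mm[:4] = struct.pack("<I", len(payload))
    mm[4:4 + len(payload)] = payload
    mm.flush()                           # make the write visible
    mm.close()

# "python" side: map the same file and decode the batch
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    (n,) = struct.unpack("<I", mm[:4])
    batch = struct.unpack(f"<{n // 4}f", mm[4:4 + n])
    mm.close()

print(batch)  # (1.0, 2.0, 3.0)
```

In the real system the socket message ("update batch") would tell the reader when to re-read the region; the mmap itself carries no synchronization.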
Setup: for development I have two GPUs, correct.
However, I think it will run on one GPU just fine. The training-dreaming split makes 2 cards natural, but not necessary.
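A simple way to sketch that "two cards natural, one sufficient" split: pin training and dreaming to separate devices when two GPUs exist, and fall back to sharing one device (or CPU) otherwise. This is an illustrative pattern, not the project's actual device-selection code.

```python
# Sketch of the optional training/dreaming device split; illustrative only.
import torch

def pick_devices():
    n = torch.cuda.device_count()
    if n >= 2:
        # two cards: training on one, dreaming on the other
        return torch.device("cuda:0"), torch.device("cuda:1")
    if n == 1:
        # one card: both roles share it
        return torch.device("cuda:0"), torch.device("cuda:0")
    return torch.device("cpu"), torch.device("cpu")

train_dev, dream_dev = pick_devices()
print(train_dev, dream_dev)
```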
Python environment: I've been developing on Debian Bookworm with python 3.10 (torch does not yet support 3.11) and CUDA 12. It's sufficiently aligned with Lambda Stack that I haven't had to touch the python install when deploying there (which is infrequent, as my home computer tends to be faster than a virtualized 8x A100...)
However, I don't have strong opinions here, and would defer to better ideas (provided I can stay on OG Debian :)
Docker: yes, if you think it would be good? From my perspective, other features are higher priority.
Was able to get things nominally up and running based on the following:

Probably going to have some high level questions about how data is getting passed around between the processes here, but want to poke it a little first. Two quick ones to help orient me:

1. run.sh has a comment that reads "# use the first 4090 (Second one for python)" - does this imply two GPUs, with one running the ocaml stuff and the other one doing pytorch?
2. Was able to get ec33.py running just by doing a naked pip install of torch and matplotlib, though that's not best practice obviously. Mainly asking because ocaml is a black box to me for now and I don't really understand what (if any) dependencies might be getting shared between it and a pytorch installation.