There might be all of these:

- /org/clulab/epimodel/epidemiology_embeddings_model.ser
- /org/clulab/spaceweather/spaceweather_model_unigram.ser (?)
- Gigaword for CosmosTextReadingPipeline
- Glove for CluProcessor
- CluProcessor for MiraEmbeddingsGrounder
- FastNLPProcessor for OdinEngine

and more in there. The changes to get memory-mapped embeddings would be quite extensive. It would probably be easier to start by sharing Glove between the CluProcessor and CosmosTextReadingPipeline so that Gigaword is not necessary, and then sharing a single CluProcessor between the OdinEngine and MiraEmbeddingsGrounder so that FastNLPProcessor is not necessary. Those things should probably be done anyway.
Right. I believe we should share those instances to reduce the memory footprint. Let's circle back on this after the hackathon.
Our first goal is to reduce memory consumption below 16 GB (currently at 20 GB).
The next goal would be to reduce it to below 8 GB.
@enoriega, can you please summarize your concerns about using a shared processor? @kwalcock, do we have code elsewhere to mem-map those two sets of embeddings?
I believe we have to do it. I am only concerned that our extractions will change drastically if we change the processor type. We should be able to handle this by monitoring the unit tests.
Ideally we should have a singleton processor shared among all pipeline instances.
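Something like this minimal sketch of the shared-processor idea, assuming CluProcessor's default constructor (the object name is made up, and whether the processor is safe to share across threads would still need to be checked):

```scala
import org.clulab.processors.Processor
import org.clulab.processors.clu.CluProcessor

object SharedProcessor {
  // One lazily constructed processor reused by every pipeline, instead of each
  // pipeline building its own instance (and its own copy of the embeddings).
  lazy val instance: Processor = new CluProcessor()
}
```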
I have not seen any memory-mapping code in any clulab or lum-ai project, and I would probably have noticed anything related specifically to the embeddings. I suspect it will be slow, but slow might be worth it in this case, and it's nice to have options for varying constraints. I can imagine needing a map of strings in memory whose values point to offsets of the vectors in the file. We will also probably need to extract from the jar to a local file at least once, unless the files are distributed in a different way.
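A minimal sketch of that offsets-plus-mapped-file idea, assuming a plain binary file of consecutive float32 vectors and a prebuilt word-to-offset index (the class name and file layout are hypothetical, not existing clulab code):

```scala
import java.io.RandomAccessFile
import java.nio.ByteOrder
import java.nio.channels.FileChannel

class MemoryMappedEmbeddings(vectorFile: String, offsets: Map[String, Long], dim: Int) {
  // Map the whole file read-only; a file larger than 2 GB would have to be mapped
  // in chunks, since a single ByteBuffer is limited to Int-sized positions.
  private val channel = new RandomAccessFile(vectorFile, "r").getChannel
  private val buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size())
  buffer.order(ByteOrder.LITTLE_ENDIAN) // must match how the vectors were written

  // Only the word -> byte-offset map stays on the JVM heap; vectors are read on
  // demand from the mapped file, with the OS page cache doing the rest.
  def apply(word: String): Option[Array[Float]] =
    offsets.get(word).map { offset =>
      val vector = new Array[Float](dim)
      var i = 0
      while (i < dim) {
        vector(i) = buffer.getFloat(offset.toInt + i * 4)
        i += 1
      }
      vector
    }
}
```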
Here are some extra thoughts:
Use float16 and slash the memory footprint by half.

This looks relevant and readily available. If Tables 3 and 4 transfer to our task, we're looking at a 100x reduction in the size of the model, although we would need to add some bookkeeping to compose each embedding out of the code books. It doesn't sound too bad.
Code: https://github.com/zomux/neuralcompressor Paper: https://arxiv.org/pdf/1711.01068.pdf
Perhaps we could try a quick experiment with one embedding model, and if it looks good enough we can load it in Scala.
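The bookkeeping to compose a vector out of the code books is only a few lines. This is a rough sketch of the reconstruction step described in the paper, with assumed shapes (M small integer codes per word, each indexing into one of M shared codebooks); it is not the neuralcompressor API:

```scala
object CompressedEmbeddings {
  // codebooks(m)(k) is the k-th basis vector of the m-th codebook (length = embedding dim);
  // codes(m) picks one vector from each codebook, and the word vector is their sum.
  def compose(codes: Array[Int], codebooks: Array[Array[Array[Float]]]): Array[Float] = {
    val dim = codebooks(0)(0).length
    val embedding = new Array[Float](dim)
    for (m <- codes.indices; i <- 0 until dim)
      embedding(i) += codebooks(m)(codes(m))(i)
    embedding
  }
}
```

In that scheme each word stores only its handful of codes while the codebooks are shared across the vocabulary, which is where the claimed size reduction would come from.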
Keith is working on further reducing the memory footprint.
This was completed by @kwalcock
The memory footprint of TR is too large: it requires at least 20 GB to run. I believe this is due to a couple of word embedding models we load into memory for grounding.
Can we mmap them so that the overall RAM usage is decreased?