**Changaco** opened this issue 6 years ago
Interesting! Do you have any benchmarks as to how much memory this is actually saving? Also, dunno if it'd help, but there's the `sys.intern()` function too.
Currently our app has 29 catalogs containing 1179 message IDs each, and `get_size()` tells me that the keys of one catalog use 306 kB of memory, so sharing them saves 8.58 MB.
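The `get_size()` helper mentioned here isn't shown in the thread; it's presumably a recursive wrapper around `sys.getsizeof()`, which by itself only measures a container's own overhead. A minimal sketch of such a helper (the name and exact behavior are assumptions):

```python
import sys

def get_size(obj, seen=None):
    # Recursively approximate the memory footprint of an object,
    # counting each distinct object only once -- so strings shared
    # between catalogs are not double-counted.
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(get_size(k, seen) + get_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(get_size(item, seen) for item in obj)
    return size
```

Because already-seen objects contribute zero bytes, measuring a second catalog with the same `seen` set directly shows how much memory key sharing saves.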
`sys.intern()` is a great suggestion; it could speed up message lookups as a bonus. It must have a small memory cost though, since CPython needs to keep track of interned strings, whereas our code above throws away the `source_strings` dictionary as soon as we've loaded all the PO files.
I've just tried `intern()`; it's not a drop-in replacement because it's limited to strings, whereas message IDs can be tuples of strings.

Edit: moreover, `intern()` only supports one type of string (`str`, which differs across Python versions).
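The tuple limitation is easy to demonstrate: plural messages in PO files have `(singular, plural)` tuples as their IDs, and `sys.intern()` rejects anything that isn't a `str`:

```python
import sys

# str message IDs can be interned: equal IDs become one shared object.
msgid = sys.intern("Save")
assert sys.intern("Save") is msgid

# Plural message IDs are (singular, plural) tuples, which intern() rejects.
try:
    sys.intern(("%(n)s item", "%(n)s items"))
    raise AssertionError("expected TypeError")
except TypeError:
    pass  # sys.intern() only accepts str
```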
We use Babel's `read_po()` function to load our webapp's translations, and we've realized that doing it this way means that source strings (a.k.a. message IDs) are stored multiple times in memory, when they should be stored only once since they're common between PO files. With large numbers of messages and catalogs this can result in significant RAM consumption.

We fixed this inefficiency by creating a `share_source_strings` function and calling it after each `read_po()` call.

Maybe a similar mechanism could be integrated into Babel so that memory usage would be optimized by default? If not, a note could be added to the documentation explaining how to reduce the memory footprint of catalogs.
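The `share_source_strings` snippet itself isn't reproduced in this thread. A minimal sketch of the idea, with catalogs simplified to plain dicts mapping message ID (a `str`, or a tuple of `str` for plurals) to translation — the real fix would operate on Babel `Catalog` objects:

```python
def share_source_strings(catalogs):
    # Canonical copy of every message ID seen so far.  setdefault()
    # returns the object stored for the first equal key, so identical
    # IDs loaded from different .po files end up as one shared object.
    source_strings = {}
    return [
        {source_strings.setdefault(msg_id, msg_id): translation
         for msg_id, translation in catalog.items()}
        for catalog in catalogs
    ]
```

Unlike interned strings, which CPython keeps alive, the `source_strings` lookup table here is freed as soon as the function returns, matching the point above about throwing it away once all the PO files are loaded.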