techfromsage / tripod-php

Object Graph Mapper for managing RDF data in Mongo
MIT License

Find a more efficient way to pass the Tripod config to the process queue jobs #83

Open rsinger opened 9 years ago

rsinger commented 9 years ago

We're currently passing a Tripod config array with every background job, which adds massive overhead and limits the number of jobs Resque can hold.

If, rather than passing the config as part of the job, we stored the config, keyed by a hash of the config, in redis, we would have a much smaller memory footprint and could subsequently queue a lot more jobs.
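A minimal sketch of that idea, written in Python rather than PHP purely for illustration. The `config_store` dict stands in for redis, and the `tripod:config:` key prefix, `store_config`, and `build_job_args` are hypothetical names, not part of Tripod or Resque:

```python
import hashlib
import json

# Stand-in for a redis client (assumption): a real version would SET/GET on a shared server.
config_store = {}


def store_config(config):
    """Store the config under a hash of its canonical JSON and return that key."""
    # sort_keys makes the JSON canonical, so equal configs always hash to the same key.
    payload = json.dumps(config, sort_keys=True, separators=(",", ":"))
    key = "tripod:config:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # Only the first job with a given config writes it; later jobs reuse the entry.
    config_store.setdefault(key, payload)
    return key


def build_job_args(config, other_args):
    """Build job arguments that carry the short key instead of the full config."""
    args = dict(other_args)
    args["tripodConfig"] = store_config(config)
    return args
```

With this shape, each queued job carries a fixed-length key (a prefix plus 64 hex characters) however large the config is, and identical configs are stored once.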

rsinger commented 9 years ago

I don't think we'd need to change the shape of the job data: if the tripodConfig value is an array, assume that the config is being passed directly; if it's a string, assume that it's a key to look up in redis.
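The array-or-string check could look roughly like this. It is a Python sketch for illustration only: a `dict` plays the role of the PHP array, and `resolve_tripod_config` and its `lookup` callable (standing in for a redis GET) are hypothetical names:

```python
import json


def resolve_tripod_config(value, lookup):
    """Return the config, whether it was passed inline or by key.

    `lookup` is any callable that maps a key to the stored JSON string,
    or returns None when the key is unknown.
    """
    if isinstance(value, dict):
        # Config passed directly with the job: use it as-is.
        return value
    if isinstance(value, str):
        # Config passed by reference: fetch and decode the stored JSON.
        stored = lookup(value)
        if stored is None:
            raise KeyError("no stored config for key: " + value)
        return json.loads(stored)
    raise TypeError("tripodConfig must be a config array or a lookup key")
```

Existing jobs that embed the full config keep working unchanged, while new jobs can pass only the key.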

rsinger commented 9 years ago

Just a really quick investigation into this found that for a regular Tripod job, we pass in a JSON string that's around 33,602 characters long (depending on the store).

If we gzcompress that JSON string, it's 5,087 characters long. I'm actually going to recommend that we Base64 that gzcompressed string (6,785 characters) so we don't gum up the Resque web interface too badly. It's a longer string, but I think it would pay off in the end.
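For illustration, here is the compress-then-Base64 round trip sketched in Python. `zlib.compress` produces the same zlib format as PHP's `gzcompress`, and `base64.b64encode` corresponds to `base64_encode`; the helper names `pack_config` and `unpack_config` are made up for this example:

```python
import base64
import json
import zlib


def pack_config(config):
    """JSON-encode, zlib-compress, then Base64-encode a config into ASCII text."""
    raw = json.dumps(config).encode("utf-8")
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def unpack_config(packed):
    """Reverse pack_config: Base64-decode, decompress, then JSON-decode."""
    raw = zlib.decompress(base64.b64decode(packed))
    return json.loads(raw.decode("utf-8"))
```

Base64 adds roughly a third to the compressed size, but the result is plain ASCII, so it survives JSON encoding and displays cleanly in a web interface.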

Anyway, if my math is right (it's probably not!), we should be able to put about 5x as many jobs on the queue for the same memory footprint.
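As a quick check of that estimate, using only the character counts quoted above and assuming the config dominates the size of each job (the other job fields are ignored here):

```python
raw_len = 33602     # JSON config string, as measured above
packed_len = 6785   # gzcompressed, then Base64-encoded

# How many packed configs fit in the space of one raw config.
ratio = raw_len / packed_len
print(round(ratio, 2))  # 4.95
```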

rsinger commented 9 years ago

I kind of feel like there must be some way to get that original config size down, as well.

scaleupcto commented 9 years ago

Yup agree, original config is massive and I have been thinking about that for a while now.

Just a thought - does converting it to YAML and then compressing that buy us anything?

Also is it possible to cherry pick - at least on the Apply jobs - just enough config to get the work done?

rsinger commented 9 years ago

Given that there's no native YAML support in PHP anyway, we might as well shoot for the moon and open up the possibilities to Thrift or protobuf.

scaleupcto commented 9 years ago

The file size is larger if I convert to YAML anyway, due to whitespace, so scratch that. Partial config could work, although gzipping the JSON seems like a good start.

What do we gain from the Base64'ing?

scaleupcto commented 9 years ago

Actually, I just cat'ed a gzip, see what you mean!