Switched to using the MongoDB schema, rather than the flatter schema from the CSV file.
Created a new product object that mirrors that schema
Various filtering fixes for the import (since we now have more data)
Modified queries to use the correct nested syntax
Made the import run in parallel to offset the longer import time
Ported the Perl code that enriches the result with image URLs: add_images_urls_to_product
Switched Redis to use just the product code (we now fetch the object via an API call when it's updated via Redis, rather than sending the entire JSON from the client)
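To illustrate the first items (a product object that keeps the nested MongoDB shape, and queries that address nested fields), here is a minimal sketch. The class name `Product`, the `from_mongo` and `get` methods, and the field names `code` and `nutriments.sugars_100g` are all hypothetical; it only shows how a dotted path maps onto a nested document, not the actual query syntax used by the search backend.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Product:
    """Hypothetical wrapper that keeps a MongoDB document in its nested form."""

    code: str
    data: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_mongo(cls, doc: dict[str, Any]) -> "Product":
        # Keep the whole nested document; only pull out the product code,
        # which serves as the document identifier.
        return cls(code=str(doc["code"]), data=doc)

    def get(self, path: str, default: Any = None) -> Any:
        """Resolve a dotted path such as 'nutriments.sugars_100g'."""
        current: Any = self.data
        for key in path.split("."):
            if not isinstance(current, dict) or key not in current:
                return default
            current = current[key]
        return current


product = Product.from_mongo({"code": "123", "nutriments": {"sugars_100g": 4.2}})
print(product.get("nutriments.sugars_100g"))  # 4.2
```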
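The parallel import can be sketched as splitting the documents into batches and handing them to a worker pool. This is a hypothetical outline, not the code in this PR: `import_batch` stands in for "filter the batch and send it to the index in one bulk request", and the batch size and worker count are arbitrary. Threads are used on the assumption that the import is dominated by waiting on the index.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Any, Iterable, Iterator

Doc = dict[str, Any]


def chunked(items: Iterable[Doc], size: int) -> Iterator[list[Doc]]:
    """Yield successive lists of at most `size` documents."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def import_batch(batch: list[Doc]) -> int:
    # Hypothetical stand-in: drop documents without a product code (the kind
    # of filtering the import needs) and report how many would be indexed.
    kept = [doc for doc in batch if doc.get("code")]
    return len(kept)


def parallel_import(docs: Iterable[Doc], batch_size: int = 1000, workers: int = 4) -> int:
    """Import documents batch by batch across a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order.
        return sum(pool.map(import_batch, chunked(docs, batch_size)))
```

Note that `Executor.map` submits every batch up front, so for a full collection you would want to bound the number of batches in flight; if the per-document transformation turned out to be CPU-bound, a `ProcessPoolExecutor` could be swapped in.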
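For the ported image enrichment, a rough idea of what adding image URLs to a product could look like is sketched below. Everything here is an assumption for illustration rather than a transcription of the Perl `add_images_urls_to_product`: the base URL, the image types, the `images` → `{type}_{lang}` → `rev` layout, the 3/3/3/rest folder split of the barcode, and the file-name pattern.

```python
import re
from typing import Any

# Assumed values for illustration only; not taken from the ported Perl code.
IMAGE_BASE_URL = "https://images.example.org/images/products"
IMAGE_TYPES = ("front", "ingredients", "nutrition")
DISPLAY_SIZE = 400


def product_path(code: str) -> str:
    """Split a long barcode into folders, e.g. 3017620422003 -> 301/762/042/2003."""
    match = re.fullmatch(r"(\d{3})(\d{3})(\d{3})(\d+)", code)
    return "/".join(match.groups()) if match else code


def add_images_urls_to_product(product: dict[str, Any], lang: str) -> dict[str, Any]:
    """Add image_<type>_url fields for the images selected for `lang`."""
    images = product.get("images", {})
    path = product_path(product["code"])
    for image_type in IMAGE_TYPES:
        key = f"{image_type}_{lang}"
        selected = images.get(key)
        if not selected or "rev" not in selected:
            continue  # no image selected for this type and language
        product[f"image_{image_type}_url"] = (
            f"{IMAGE_BASE_URL}/{path}/{key}.{selected['rev']}.{DISPLAY_SIZE}.jpg"
        )
    return product
```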
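With Redis carrying only the product code, the consumer side reduces to "read the code, fetch the current object, reindex it". A minimal sketch, assuming the payload is the bare code; `fetch` and `index` are hypothetical callables standing in for the API client and the index writer, injected so the logic can be exercised without a network.

```python
from typing import Any, Callable, Union

# Hypothetical signatures for the API client and the index writer.
FetchProduct = Callable[[str], dict[str, Any]]
IndexProduct = Callable[[dict[str, Any]], None]


def handle_update(payload: Union[bytes, str], fetch: FetchProduct, index: IndexProduct) -> str:
    """Process one Redis update whose payload is only the product code."""
    raw = payload.decode("utf-8") if isinstance(payload, bytes) else payload
    code = raw.strip()
    # The message carries no product data, so fetch the current version.
    product = fetch(code)
    index(product)
    return code
```

A side effect of this design is that the index is always rebuilt from the stored object, not from whatever JSON a client happened to send.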
The new index (with all the data from MongoDB) is 50GB (100GB with two nodes), needs ~4GB of RAM for the import, and the import takes about an hour :(
However, having the complete data means this can be a drop-in replacement for the old APIs. Furthermore, the search is still reasonably fast: ~0.5 seconds for some large queries.
As usual, I'll merge this tomorrow unless there's feedback.