oricdev / prosim

Backend backbone for computing similarities between OpenFoodFacts products

Try creating the Prosim-db in one single shot #6

Open oricdev opened 6 years ago

oricdev commented 6 years ago

Requirement: Issue #1 must be implemented

Why do this? As stated repeatedly, the interset process performs a map-reduce in memory on data from the data packages (quick), but then performs another map-reduce against records already present in the Prosim-db, which were created during previous sliced imports. This second map-reduce is a very heavy process for MongoDB: although it is performed in memory for each record, it still involves a lot of reading, writing, expanding, and indexing in the db. Hence it would be interesting to determine the maximum number of products for which a one-shot integration could be performed (in-memory map-reduces only). This would save a considerable amount of time (no more scheduled tasks twice an hour), and the Prosim-db could possibly be generated from scratch in less than a day instead of several days.
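To make the one-shot idea concrete, here is a minimal, hypothetical Python sketch of a purely in-memory map-reduce that scores product pairs by the number of tags they share. The function names, the `{code: [tags]}` input shape, and the shared-tag count used as a similarity score are all assumptions for illustration; the issue does not describe prosim's actual scoring or data model.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(products):
    """Emit (tag, product_code) pairs for every non-empty tag of every product.

    `products` is a hypothetical {product_code: [tag, ...]} mapping.
    """
    for code, tags in products.items():
        for tag in set(tags):  # de-duplicate tags within one product
            if tag:            # skip empty tags, which cannot support a comparison
                yield tag, code


def reduce_phase(pairs):
    """Group product codes by tag, then count shared tags for each product pair."""
    by_tag = defaultdict(set)
    for tag, code in pairs:
        by_tag[tag].add(code)

    shared = defaultdict(int)
    for codes in by_tag.values():
        # Every pair of products carrying this tag shares one more tag.
        for a, b in combinations(sorted(codes), 2):
            shared[(a, b)] += 1
    return dict(shared)


def one_shot_similarities(products):
    """Score all pairs in a single in-memory pass, with no database round trips."""
    return reduce_phase(map_phase(products))


if __name__ == "__main__":
    sample = {
        "p1": ["en:sodas", "en:sugary"],
        "p2": ["en:sodas", "en:sugary", "en:diet"],
        "p3": ["en:diet"],
    }
    print(one_shot_similarities(sample))
```

In this sketch, running the same pass on each slice separately would miss every pair whose two products fall into different slices; recovering those pairs is what the extra map-reduce against existing Prosim-db records has to do, and a single batch avoids it. The trade-off is that `combinations` grows quadratically with the size of each tag group, so peak memory is exactly what a one-shot test would need to measure.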

How to proceed? The number of products with appropriate non-empty tags for comparing products is limited to about 20% of the official OFF db: about 110,000 out of 550,000 products. Check what happens in terms of resources used (memory, disk speed/space, overall behaviour) if we decide to create the Prosim-db in one shot, by setting the environment as follows: