lnklnklnk / ga-bq

Stream raw hit-level Google Analytics data into BigQuery

Queue configuration #9

Closed mac2000 closed 7 years ago

mac2000 commented 7 years ago

It would be nice to have a description of the queue configuration in the README. We tried out your implementation, but in our case the queue now holds 1 million records. It seems the queue handler cannot keep up with all the incoming requests. From your experience, what is the best way to tune it?

lnklnklnk commented 7 years ago

@mac2000 Hello! You can try increasing this number: https://github.com/lnklnklnk/ga-bq/blob/master/process_queue.py#L17 Try raising it from 1000 to 10000 or 50000.

Hope this will help you :)

mac2000 commented 7 years ago

We duplicated the cron job entries ten times and it worked (but we did not investigate whether some records might be processed twice).

We also tried increasing the number of records, but 1K is the maximum.

So it seems the only way is to increase the number of workers. Once there is proof that records are not processed twice, this could go into the README as a tuning tip.
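The double-processing question can be explored with a toy model. The sketch below is not the App Engine API and not code from ga-bq: `LeasingQueue` and `run_workers` are hypothetical names, and the model assumes the queue hands each record to exactly one worker per lease. Under that assumption, ten parallel workers with a batch size of 1000 never see the same record twice.

```python
import threading
from collections import Counter, deque


class LeasingQueue:
    """Toy in-memory stand-in for a pull queue (hypothetical, not the App Engine API)."""

    def __init__(self, items):
        self._items = deque(items)
        self._lock = threading.Lock()

    def lease(self, max_tasks):
        # Hand out up to max_tasks items atomically, so no two workers share an item.
        with self._lock:
            batch = []
            while self._items and len(batch) < max_tasks:
                batch.append(self._items.popleft())
            return batch


def run_workers(total_records, workers, batch_size=1000):
    """Drain the queue with several parallel workers and count how often each record is seen."""
    queue = LeasingQueue(range(total_records))
    seen = Counter()
    seen_lock = threading.Lock()

    def worker():
        while True:
            batch = queue.lease(batch_size)
            if not batch:
                return
            with seen_lock:
                seen.update(batch)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


seen = run_workers(total_records=25_000, workers=10)
duplicates = [record for record, count in seen.items() if count > 1]
print(len(seen), len(duplicates))
```

This only shows that exclusive leasing is enough to avoid duplicates. In a real lease-based queue a lease can expire before the worker finishes, and the task may then be handed out again, so that case still needs to be checked against the actual deployment.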

mac2000 commented 7 years ago

We have now been able to reproduce a simplified application in local development, and everything works as expected. So for projects with more than 1K events per minute, adding more cron jobs is a quick win.
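For sizing, a rough back-of-the-envelope calculation may help. The sketch below assumes each cron job runs once per minute and handles at most 1000 records per run. The function names and the 8,000 events per minute figure are made up for illustration; only the 1K limit, the ten cron jobs and the 1 million record backlog come from this thread.

```python
import math


def jobs_needed(events_per_minute, batch_size=1000, runs_per_minute=1):
    """Smallest number of parallel cron jobs whose combined rate meets the inflow."""
    per_job = batch_size * runs_per_minute
    return math.ceil(events_per_minute / per_job)


def minutes_to_drain(backlog, events_per_minute, jobs, batch_size=1000, runs_per_minute=1):
    """Minutes needed to clear a backlog, or None if the jobs cannot outpace the inflow."""
    surplus = jobs * batch_size * runs_per_minute - events_per_minute
    if surplus <= 0:
        return None
    return math.ceil(backlog / surplus)


# Hypothetical site producing 8,000 events per minute.
print(jobs_needed(8_000))
# Ten jobs against that inflow, starting from a backlog of 1 million records.
print(minutes_to_drain(1_000_000, 8_000, jobs=10))
```

The point of the second function is that matching the inflow only stops the backlog from growing. Clearing an existing backlog needs spare capacity on top of that.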