ponyriders / django-amazon-price-monitor

Monitors prices of Amazon products via Product Advertising API

Data reduction and clean up #27

Closed: dArignac closed this issue 9 years ago

dArignac commented 10 years ago

The release cycle of python-amazon-simple-product-api is unclear: from the GitHub repo you cannot tell what is contained in which release, and there is no release log. We used python-amazon-simple-product-api at the beginning because it was easy to use and provided a lot of the values we needed (and a lot we do not need). It would not cover the requirements of ticket #19, but bottlenose provides the relevant information. I'd like to strip our models down to only the fields we need and query the Amazon API for only these values. Bottlenose is therefore the tool of choice, as it is a dumb and simple wrapper around the Amazon Product Advertising API and nothing more.

dArignac commented 10 years ago

https://github.com/lionheart/bottlenose http://docs.aws.amazon.com/AWSECommerceService/latest/DG/CHAP_ApiReference.html

dArignac commented 10 years ago

We should use the bottlenose caching feature to avoid making lots of similar calls: https://github.com/lionheart/bottlenose#caching
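A minimal sketch of how that could be wired up, following the caching hooks from the bottlenose README (the dict-backed cache and the credential names are illustrative only; the project would more likely plug in Django's cache framework):

```python
import bottlenose

# Purely illustrative in-memory cache; a real deployment would use django.core.cache.
_cache = {}

def cache_read(cache_url):
    """Return the cached response for this request URL, or None on a miss."""
    return _cache.get(cache_url)

def cache_write(cache_url, data):
    """Store the raw API response under its request URL."""
    _cache[cache_url] = data

amazon = bottlenose.Amazon(
    AWS_ACCESS_KEY_ID,       # credentials assumed to be defined elsewhere
    AWS_SECRET_ACCESS_KEY,
    AWS_ASSOCIATE_TAG,
    CacheReader=cache_read,
    CacheWriter=cache_write,
)
```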

dArignac commented 10 years ago

Check the appropriate wiki page for progress: https://github.com/ponyriders/django-amazon-price-monitor/wiki/Branch-data-reduction

dArignac commented 9 years ago

With some products we get an exception directly after creation (e.g. ASIN B000053ZRV).

Update: this happens only after starting Celery and then adding a product.

```
[2015-04-02 15:16:25,096: CRITICAL/MainProcess] Internal error: RuntimeError('maximum recursion depth exceeded in comparison',)
Traceback (most recent call last):
  File "/home/alex/projects/pm/.env/lib/python3.4/site-packages/celery/worker/__init__.py", line 227, in _process_task
    req.execute_using_pool(self.pool)
  File "/home/alex/projects/pm/.env/lib/python3.4/site-packages/celery/worker/job.py", line 263, in execute_using_pool
    correlation_id=uuid,
  File "/home/alex/projects/pm/.env/lib/python3.4/site-packages/celery/concurrency/base.py", line 156, in apply_async
    **options)
  File "/home/alex/projects/pm/.env/lib/python3.4/site-packages/billiard/pool.py", line 1434, in apply_async
    self._quick_put((TASK, (result._job, None, func, args, kwds)))
  File "/home/alex/projects/pm/.env/lib/python3.4/site-packages/celery/concurrency/asynpool.py", line 770, in send_job
    body = dumps(tup, protocol=protocol)
  File "/home/alex/projects/pm/.env/lib/python3.4/site-packages/bs4/element.py", line 939, in __getattr__
    if len(tag) > 3 and tag.endswith('Tag'):
RuntimeError: maximum recursion depth exceeded in comparison
```
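A plausible reading of the traceback (an assumption, not confirmed anywhere in this thread): the recursion happens while Celery serializes the task payload, and the bs4 frame suggests a BeautifulSoup Tag is among the task arguments; Tag.__getattr__ then recurses during pickling. The usual remedy is to pass only plain, picklable values into tasks, roughly like this (helper and task names are hypothetical):

```python
def extract_product_fields(item_tag):
    """Hypothetical helper: reduce a parsed <Item> node to a plain dict
    of just the fields we keep, so the task payload stays picklable."""
    return {
        'asin': item_tag.find('ASIN').text,
        'title': item_tag.find('Title').text,
    }

# Instead of handing the bs4 tree to the task:
#     sync_product.delay(item_tag)            # pickling this recurses and fails
# hand over plain data only:
#     sync_product.delay(extract_product_fields(item_tag))
```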
dArignac commented 9 years ago

Hm, we create a single request for each ASIN and somehow run into the Amazon request limit. That is not fully reproducible, but it happens. Since I changed the behaviour to fork a single task for each ASIN it got better, but the errors still occur. With ItemLookup it is possible to request more than one ASIN at once, simply comma-separated. We should take a look into this.
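For illustration, a comma-separated lookup through bottlenose could look like this (the second ASIN and the ResponseGroup values are made up for the example):

```python
import bottlenose

amazon = bottlenose.Amazon(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_ASSOCIATE_TAG)

# One ItemLookup call covering several ASINs instead of one call per ASIN.
response = amazon.ItemLookup(
    ItemId='B000053ZRV,B00008OE6I',
    ResponseGroup='ItemAttributes,Offers',
)
```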

dArignac commented 9 years ago

Additionally we have a funny endless queue phenomenon here: the sync-all-products task runs every 5 minutes. A task run queries 10 products, and if there are more products left, the task reschedules itself after 10 seconds. At 10 products per 10 seconds, one chain handles at most 300 products within the 5-minute interval, so if there are more than 300 products to sync, the next periodic task starts before the current sync run is done. This stacks up a pile of queued tasks.
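A common guard against this kind of overlap (a generic pattern, not necessarily what the actual fix does) is a cache-based lock, so a new periodic run bails out while a previous chain is still active:

```python
from django.core.cache import cache

LOCK_KEY = 'price_monitor_sync_lock'  # hypothetical key name
LOCK_TTL = 60 * 10                    # seconds; should exceed the longest expected run

def run_sync_if_idle():
    """Skip this periodic run entirely if a previous sync chain is still active."""
    # cache.add only sets the key if it does not exist yet (atomic on backends
    # like memcached or Redis), so exactly one concurrent caller acquires the lock.
    if not cache.add(LOCK_KEY, 'locked', LOCK_TTL):
        return  # another run is in progress
    try:
        sync_next_batch()  # hypothetical: the existing 10-product sync step
    finally:
        cache.delete(LOCK_KEY)
```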

dArignac commented 9 years ago

Endless queue is fixed in #38.

dArignac commented 9 years ago

Regarding more performant Amazon queries, read this: http://docs.aws.amazon.com/AWSECommerceService/latest/DG/PerformingMultipleItemLookupsinOneRequest.html
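Since ItemLookup accepts only a limited number of comma-separated ASINs per ItemId parameter (10, if I recall the Product Advertising API limits correctly), batching would mean chunking our ASIN list, roughly:

```python
def chunked(asins, size=10):
    """Yield successive batches of ASINs sized for one ItemLookup call."""
    for i in range(0, len(asins), size):
        yield asins[i:i + size]

# e.g.
# for batch in chunked(all_asins):
#     amazon.ItemLookup(ItemId=','.join(batch), ResponseGroup='Offers')
```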

dArignac commented 9 years ago

I moved the performance improvements to a separate ticket, #41. Now we're done here.