simone-trubian / dropp

Data miner client for the Dropp platform
BSD 3-Clause "New" or "Revised" License

[DPP-60] The availability of some BangGood pages is not processed #61

Closed simone-trubian closed 8 years ago

simone-trubian commented 8 years ago
Issue

The new fetching and scraping functions do not work every time: some BangGood pages produce a Nothing at the end of the processing pipeline, but when the same pages are fetched from the REPL the update process works correctly.

Possible causes

The processing pipeline can fail at a few points:

First, enable very simple error printing in the "catch" branch of the catch function. If that does not shed enough light on the issue, overhaul the entire Http and HTML modules to use Either instead of Maybe, and log the Left branch as well as any HttpException.
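A minimal sketch of what the Either-based version could look like, using only base. All names here (PipelineError, fetchWith, scrapeAvailability, updateItem) are hypothetical and not taken from the Dropp code base. The fetch action is passed in as a parameter so the sketch does not depend on any HTTP library, and it catches SomeException where the real modules would presumably catch HttpException.

```haskell
import Control.Exception (SomeException, try)
import System.IO (hPutStrLn, stderr)

-- Hypothetical error type: each constructor records where the pipeline failed.
data PipelineError
  = FetchFailed String String  -- URL and the text of the caught exception
  | ScrapeFailed String        -- URL whose page could not be scraped
  deriving (Show, Eq)

-- Run a fetch action and turn any exception into a Left instead of a Nothing.
-- Catching SomeException is a simplification for this sketch; it also catches
-- asynchronous exceptions, so real code should catch a narrower type.
fetchWith :: (String -> IO String) -> String -> IO (Either PipelineError String)
fetchWith fetch url = do
  result <- try (fetch url) :: IO (Either SomeException String)
  pure $ case result of
    Left err   -> Left (FetchFailed url (show err))
    Right body -> Right body

-- Placeholder scraper: treats an empty body as a scraping failure and
-- otherwise returns the body unchanged. A real scraper would parse the HTML.
scrapeAvailability :: String -> String -> Either PipelineError String
scrapeAvailability url body
  | null body = Left (ScrapeFailed url)
  | otherwise = Right body

-- Fetch, scrape, and log the Left branch to stderr before returning it.
updateItem :: (String -> IO String) -> String -> IO (Either PipelineError String)
updateItem fetch url = do
  fetched <- fetchWith fetch url
  let outcome = fetched >>= scrapeAvailability url
  case outcome of
    Left err -> hPutStrLn stderr ("failed to update " ++ url ++ ": " ++ show err)
    Right _  -> pure ()
  pure outcome
```

Loaded in the REPL, updateItem can be tried with a stub such as (\_ -> ioError (userError "boom")) in place of the real fetch function. Unlike Nothing, the resulting Left says which stage failed and for which URL.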

simone-trubian commented 8 years ago

After implementing error printing, the following errors were logged:

failed to fetch eu.banggood.com/Wholesale-Warehouse-2Pcs-Black-Union-Jack-Flag-Vinyl-Mirrors-Stickers-For-Mini-Cooper-wp-Eu-985030.html because of InvalidUrlException "eu.banggood.com/Wholesale-Warehouse-2Pcs-Black-Union-Jack-Flag-Vinyl-Mirrors-Stickers-For-Mini-Cooper-wp-Eu-985030.html" "Invalid URL"

failed to fetch eu.banggood.com/Wholesale-Warehouse-300pcs-M3-Nylon-White-Hex-Screw-Nut-Spacer-Stand-off-Varied-Length-Assortment-Kit-Box-wp-Eu-984548.html because of InvalidUrlException "eu.banggood.com/Wholesale-Warehouse-300pcs-M3-Nylon-White-Hex-Screw-Nut-Spacer-Stand-off-Varied-Length-Assortment-Kit-Box-wp-Eu-984548.html" "Invalid URL"

failed to fetch eu.banggood.com/Wholesale-Warehouse-A4-30X20cm-Grid-Self-Healing-Cutting-Craft-Mat-Engraving-Board-Double-Sided-wp-Eu-986712.html because of InvalidUrlException "eu.banggood.com/Wholesale-Warehouse-A4-30X20cm-Grid-Self-Healing-Cutting-Craft-Mat-Engraving-Board-Double-Sided-wp-Eu-986712.html" "Invalid URL"

failed to fetch eu.banggood.com/Wholesale-Warehouse-6-Inch-150mm-Electronic-Mini-Digital-Caliper-Micrometer-Guage-Ruler-wp-Eu-41970.html because of InvalidUrlException "eu.banggood.com/Wholesale-Warehouse-6-Inch-150mm-Electronic-Mini-Digital-Caliper-Micrometer-Guage-Ruler-wp-Eu-41970.html" "Invalid URL"

Going through the latest JSON file, I noticed that some item objects contain bad URLs, for example:

{
  "item_name": "6 Inch 150mm Electronic Mini Digital Caliper Micrometer Guage Ruler",
  "source_url": "eu.banggood.com/Wholesale-Warehouse-6-Inch-150mm-Electronic-Mini-Digital-Caliper-Micrometer-Guage-Ruler-wp-Eu-41970.html",
  "ebay_url": "http://www.ebay.it/itm/Calibro-Digitale-Elettronico-0-150mm-6-alta-precisione-strumenti-misura-/152097499307?ssPageName=STRK:MESE:IT"
},
{
  "item_name": "6 Inch 150mm Electronic Mini Digital Caliper Micrometer Guage Ruler",
  "source_url": "eu.banggood.com/Wholesale-Warehouse-6-Inch-150mm-Electronic-Mini-Digital-Caliper-Micrometer-Guage-Ruler-wp-Eu-41970.html",
  "ebay_url": "http://www.ebay.it/itm/Calibro-Digitale-Elettronico-0-150mm-6-alta-precisione-strumenti-misura-/152097499307?ssPageName=STRK:MESE:IT"
}

Incidentally, those items are the ones whose availability cannot be updated.
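The rejected source_url values all lack a scheme prefix, unlike the ebay_url values, which is the likely reason the URL parser reports "Invalid URL". Below is a minimal sketch of a check that could run before fetching, assuming the missing scheme is the only defect. checkSourceUrl is a hypothetical helper, not part of Dropp, and it is not meant to describe how ticket #63 addresses the problem.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical helper: accept a URL only if it starts with an HTTP(S) scheme,
-- otherwise return a Left describing the problem so that it can be logged.
checkSourceUrl :: String -> Either String String
checkSourceUrl url
  | any (`isPrefixOf` url) ["http://", "https://"] = Right url
  | otherwise = Left ("missing scheme in source_url: " ++ url)
```

Mapping checkSourceUrl over every source_url in the JSON file and collecting the Left values would list the affected items without fetching anything.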

Call for action

Refer to ticket #63. Mark as resolved when all items with an existing page can be updated.

simone-trubian commented 8 years ago

Resolved by ticket #63