matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License

Avoid making two requests to the same URL (cache support?) #265

Closed markotom closed 5 years ago

markotom commented 7 years ago

I want to avoid making more than one request to the same URL when getting nested properties. This is the use case:

import Xray from 'x-ray'
import * as filters from './filters'

const x = Xray({ filters })

x('http://...', '#content', {
  prop1: 'h1 | clean',
  prop2: x('a@href', '.selector1 | clean'),
  prop3: x('a@href', '.selector2 | clean')
})(function () {
  console.log(...arguments)
})

Debug mode displays the following:

x-ray got response for [FIRST_URL] with status code: 200 +14ms
x-ray got response for [SAME_URL] with status code: 200 +13ms
x-ray got response for [SAME_URL] with status code: 200 +54ms

Is it possible? I think that would be a nice feature. What do you think?

Many thanks for this great library!

0xgeert commented 7 years ago

You can do all of this easily with a custom driver. There is no need to change x-ray for that.
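For illustration, here is a rough sketch of what a caching driver could look like. It is not part of x-ray. It assumes a driver is a function `(ctx, done)` that fills in `ctx.status` and `ctx.body` and then calls `done(err, ctx)`; check that against the driver you actually use. The names `cachingDriver`, `inner`, and `someDriver` are made up for this example.

```javascript
// Wrap an existing driver so each URL is fetched at most once per process.
// Assumption: a driver is a function (ctx, done) that sets ctx.status and
// ctx.body, then calls done(err, ctx). `cachingDriver` is a hypothetical helper.
function cachingDriver (inner) {
  const cache = new Map()   // url -> { status, body } for finished requests
  const pending = new Map() // url -> callers waiting on an in-flight request

  return function cachedDriver (ctx, done) {
    const url = ctx.url

    // Already fetched: copy the stored response onto this context.
    if (cache.has(url)) {
      const hit = cache.get(url)
      ctx.status = hit.status
      ctx.body = hit.body
      return done(null, ctx)
    }

    // Same URL is being fetched right now: wait for that request.
    if (pending.has(url)) {
      pending.get(url).push({ ctx, done })
      return
    }

    // First request for this URL: delegate to the wrapped driver.
    pending.set(url, [{ ctx, done }])
    inner(ctx, function (err, res) {
      const waiters = pending.get(url)
      pending.delete(url)
      // Errors are not cached, so a later call can retry.
      if (!err) cache.set(url, { status: res.status, body: res.body })
      for (const w of waiters) {
        if (err) {
          w.done(err)
          continue
        }
        w.ctx.status = res.status
        w.ctx.body = res.body
        w.done(null, w.ctx)
      }
    })
  }
}

// Possible wiring (untested against a real site):
//   const x = Xray({ filters }).driver(cachingDriver(someDriver))
// where `someDriver` is whichever driver you already use.
```

The cache here is unbounded and never expires, which is fine for a single scraping run but would need a size limit or TTL in a long-lived process.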


lathropd commented 5 years ago

Closing based on above response.

Also, adding this behavior would lead to hitting the target site multiple times, which is not good practice.