scrapy / scrapely

A pure-python HTML screen-scraping library

Does the order of annotations matter - Weird output #66

Open dav009 opened 9 years ago

dav009 commented 9 years ago

I've been playing with scrapely, and this script generates some weird output:

  1. annotate url1
  2. try scraping url1, got the expected output
  3. annotate url2
  4. try scraping url2, got nothing back

I thought the cause might be `train`, since it is not supposed to be reliable, but when I exported the annotated data, the annotations seemed fine.

Then I inverted the order:

  1. annotate url2
  2. try scraping url2, got the expected output
  3. annotate url1
  4. try scraping url1, got something different from the annotation (a subset of what was annotated)
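
The two runs above can be written as a small harness that trains in every order and scrapes each URL right after annotating it. This is only a sketch, not the original script: `run_in_order`, `compare_orders` and `FakeScraper` are hypothetical names, and the harness assumes nothing beyond an object with `train(url, data)` and `scrape(url)` methods, which is the interface scrapely's `Scraper` exposes. `FakeScraper` is a deliberately order-sensitive toy used to exercise the harness; it does not model how scrapely works internally.

```python
from itertools import permutations


def run_in_order(make_scraper, pages, order):
    """Annotate each URL in `order`, scraping it right after annotating."""
    scraper = make_scraper()  # fresh scraper, so runs do not share templates
    results = {}
    for url in order:
        scraper.train(url, pages[url])
        results[url] = scraper.scrape(url)
    return results


def compare_orders(make_scraper, pages):
    """Return {order: results} for every possible annotation order."""
    return {
        order: run_in_order(make_scraper, pages, order)
        for order in permutations(pages)
    }


class FakeScraper:
    """Hypothetical stand-in (NOT scrapely): only the first template is used."""

    def __init__(self):
        self.templates = []

    def train(self, url, data):
        self.templates.append((url, data))

    def scrape(self, url):
        first_url, first_data = self.templates[0]
        return [first_data] if url == first_url else []


if __name__ == "__main__":
    # Placeholder URLs and annotations, for illustration only.
    pages = {"url1": {"name": "item one"}, "url2": {"name": "item two"}}
    for order, results in compare_orders(FakeScraper, pages).items():
        print(order, results)
```

To run it against scrapely itself, pass `Scraper` (from `from scrapely import Scraper`) as `make_scraper` and use real URLs with their annotation dicts as `pages`. If the results for a URL differ between orders, the output depends on annotation order.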

Is this expected behaviour?