Open Marcel0024 opened 3 months ago
Just realized you would have to change the Job implementation as well, because every page would have to become a TargetPage.
Damn, there's no way to override this. I thought a custom IContentParser would do the trick, but ran into this.
Hi, I've been looking at this library, and it's really promising. It saves a lot of time writing boilerplate. But I'm missing one feature to be able to use it for my use case.
Is your feature request related to a problem? Please describe.
The issue I'm running into is that I don't have to open each link to scrape it. My first page is the page with the listings, and it has pagination.
For example:
Page 1
Page 2
The way the library is set up, I have to .Follow(...) each link and .Parse(...) each opened page. But in my case I don't have to. The data I need is already on the listing page.
Describe the solution you'd like
The ability to parse a list, perhaps using a JArray for the object returned in the entity.
Describe alternatives you've considered
I didn't find a workaround. I did try something like this:
But all listings come out the same, since the query selector just grabs the first match: https://github.com/pavlovtech/WebReaper/blob/master/WebReaper/Core/Parser/Concrete/AngleSharpContentParser.cs#L85
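To illustrate the first-match behaviour, here is a minimal sketch using AngleSharp directly rather than WebReaper. The markup and the `.listing .title` selector are made up for the example; only `HtmlParser`, `QuerySelector` and `QuerySelectorAll` are real AngleSharp APIs.

```csharp
using System;
using System.Linq;
using AngleSharp.Html.Parser;

// Hypothetical markup standing in for a page of listings.
const string html = @"
<ul>
  <li class=""listing""><span class=""title"">First</span></li>
  <li class=""listing""><span class=""title"">Second</span></li>
</ul>";

var document = new HtmlParser().ParseDocument(html);

// QuerySelector returns only the first match in document order (or null).
var first = document.QuerySelector(".listing .title")?.TextContent;

// QuerySelectorAll returns every match, which is what a list field needs.
var all = document.QuerySelectorAll(".listing .title")
    .Select(e => e.TextContent)
    .ToList();

Console.WriteLine(first);                  // First
Console.WriteLine(string.Join(", ", all)); // First, Second
```

Any parser path built on `QuerySelector` alone will therefore see only the first listing, no matter how many are on the page.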
Additional context
To keep backwards compatibility, I think this needs to be implemented on SchemaElement with a new property, maybe IsList or IsArray.
In FillOutput() (https://github.com/pavlovtech/WebReaper/blob/master/WebReaper/Core/Parser/Concrete/AngleSharpContentParser.cs#L43), inside the try, we can differentiate whether the element is a list or not. If it is, GetListData() returns a list of data to add to a JArray.
I'm willing to work on a PR with some guidance/approval.