Closed GoogleCodeExporter closed 8 years ago
[deleted comment]
I love the idea of having more information about the links, but I am hesitant
to add any more parsing than needs to happen, since most people wouldn't need
the link text or need to know whether the link was an image. Can you first
attach the impl that actually fills/returns the list of ILinkInfo objects so I
can take a quick look?
Thank you for offering your code!!!!
Original comment by sjdir...@gmail.com
on 18 Dec 2013 at 9:40
My code is here. I have changed the HyperLinkParser to return a list of
ILinkInfo instead of the Uri list that is returned now. In addition, I have
changed the interface of the PageRequester to crawl PageToCrawl objects
directly instead of the Uri object it currently accepts. When I want to crawl
extra metadata, I can then subclass ILinkInfo and update my own HyperLinkParser
accordingly. The only remaining work would be to implement something like a
PageToCrawl.Bag for storing the metadata. I have done this the ugly way
locally (by just modifying the PageToCrawl class), so I am not sharing that
code. Also, I have not updated the CsQueryHyperLinkParser, as I am using the
HAP parser :)
I don't know if this is the best way of implementing the described
functionality, but I have made an attempt at least, so just let me know if you
like it :) I haven't tested it, but I assume it will work just fine :)
Modified files are attached.
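For readers without the attachments, here is a rough sketch of the shape
described above. It is not the attached code: it is written in Python purely
for illustration, and LinkInfo, PageToCrawl.bag, LinkInfoParser, parse_links
and to_pages are hypothetical stand-ins named after the types in this thread.
The assumption is that the parser returns link objects carrying text and an
is-image flag, and that this metadata is copied into a per-page bag.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional
from urllib.parse import urljoin


@dataclass
class LinkInfo:
    """Hypothetical stand-in for ILinkInfo: a URI plus optional metadata."""
    uri: str
    text: str = ""
    is_image: bool = False


@dataclass
class PageToCrawl:
    """Hypothetical stand-in for PageToCrawl with a bag for metadata."""
    uri: str
    bag: Dict[str, Any] = field(default_factory=dict)


class LinkInfoParser(HTMLParser):
    """Collects LinkInfo objects instead of bare URIs."""

    def __init__(self, base_uri: str) -> None:
        super().__init__()
        self.base_uri = base_uri
        self.links: List[LinkInfo] = []
        self._current: Optional[LinkInfo] = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative hrefs against the page the link was found on.
            self._current = LinkInfo(uri=urljoin(self.base_uri, attrs["href"]))
        elif tag == "img" and self._current is not None:
            # An <img> inside an open <a> marks the link as an image link.
            self._current.is_image = True

    def handle_data(self, data):
        if self._current is not None:
            self._current.text += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current.text = self._current.text.strip()
            self.links.append(self._current)
            self._current = None


def parse_links(html: str, base_uri: str) -> List[LinkInfo]:
    """Return link objects rather than a plain list of URIs."""
    parser = LinkInfoParser(base_uri)
    parser.feed(html)
    parser.close()
    return parser.links


def to_pages(links: List[LinkInfo]) -> List[PageToCrawl]:
    """Copy link metadata into each page's bag so it travels with the URI."""
    return [
        PageToCrawl(uri=link.uri,
                    bag={"link_text": link.text, "is_image": link.is_image})
        for link in links
    ]
```

A caller would run parse_links on the fetched HTML and hand the result of
to_pages to whatever schedules requests. Extra metadata would then mean
subclassing LinkInfo and extending the parser, as described above.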
Original comment by d.st...@gmail.com
on 19 Dec 2013 at 7:53
Attachments:
FYI, v1.2.3 already has a PageToCrawl.PageBag of dynamic expando type.
I'll take a look at your impl and get back to you. Thanks again.
Original comment by sjdir...@gmail.com
on 19 Dec 2013 at 6:04
As of right now, I don't think I will pull your changes into the product, for
the reasons I stated above. However, I may change my position in the future.
Thanks for offering your implementation. Your time is appreciated.
Original comment by sjdir...@gmail.com
on 30 Dec 2013 at 3:12
Original issue reported on code.google.com by
d.st...@gmail.com
on 18 Dec 2013 at 12:17