Open DyeffersonAz opened 6 years ago
Thanks for the suggestion! I agree it would be a cool feature, but given the data source I'm using, it is not really easy to do. I don't ever actually see the full text of the Wikipedia page itself, just the Wikipedia database containing all the links. So I can't easily show you the context around where the link shows up in the actual page. Also, since the database is only updated monthly, it is possible the link is actually no longer on the page itself as it may have been edited since the latest database dump. Maybe I'll figure out a way to do this in the future, but for now, this is not feasible with my current architecture.
Couldn't you fetch the HTML of the page?
I definitely could try something like that and I honestly think that is the way this would need to be implemented. But it wouldn't be very efficient and the system currently doesn't ever look at the raw HTML.
Also, it would be better than needing to dump the database many times; it'd be automatic.
There is no way to run the actual search algorithm against live pages, as it would take way too long: thousands to tens of thousands of pages need to be touched. What I was referring to was just pulling the context for a single page when you, for example, click on it in the graph view.
Yep
Maybe you could look through the HTML after the search has completed. Then do some web scraping to look for the link on the page and return the title of the section or subsection it was found in.
This is what I was suggesting: a way for SDOW to go to the live Wikipedia page, search for each link, and then return the parent header of that <a> element, for example.
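To make the idea concrete, here is a rough sketch of the "find the link, return its parent header" step using only Python's standard library. It is not SDOW code: the names `LinkSectionFinder` and `find_link_sections` are made up for illustration, and it assumes the page HTML has already been fetched and that links look like `/wiki/Title`. Real Wikipedia markup has extras (edit links inside headings, links in infoboxes and navboxes) that this does not handle.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class LinkSectionFinder(HTMLParser):
    """Hypothetical helper: remembers the most recent section heading
    and records it whenever a link to the target href is seen."""

    # h1 is skipped on the assumption that it holds the page title.
    HEADINGS = {"h2", "h3", "h4", "h5", "h6"}

    def __init__(self, target_href):
        super().__init__()
        self.target_href = target_href
        self.current_heading = "(lead section)"  # text before any heading
        self._in_heading = False
        self._heading_text = []
        self.sections = []  # headings under which the link appears, in order

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._in_heading = True
            self._heading_text = []
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            # Ignore any #fragment and percent-encoding when comparing.
            if unquote(href.split("#")[0]) == self.target_href:
                if self.current_heading not in self.sections:
                    self.sections.append(self.current_heading)

    def handle_data(self, data):
        if self._in_heading:
            self._heading_text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._in_heading:
            self._in_heading = False
            self.current_heading = "".join(self._heading_text).strip()


def find_link_sections(html, target_href):
    """Return the section headings under which target_href is linked."""
    parser = LinkSectionFinder(target_href)
    parser.feed(html)
    parser.close()
    return parser.sections
```

This would only need to run for a single page at a time (for example, when a node is clicked in the graph view), so the cost of fetching and parsing live HTML stays small. An empty result would also cover the case where the link has been edited out since the last database dump.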
Unfortunately, I don't have the web development knowledge to help with this yet.
That would be very nice, since I cannot find any of the links shown in the results on either of the pages requested.
It would help to show where the links were found, because sometimes I can't find where a link is on the page.