loklak / loklak_server

Distributed Open Source twitter and social media message search server that anonymously collects, shares, dumps and indexes data http://api.loklak.org
GNU Lesser General Public License v2.1

[Generic Scraper] Add more fields to GenericScraper.java #598

daminisatya closed this issue 8 years ago

daminisatya commented 8 years ago

Compared to Diffbot, the current API scrapes only the basic information from a given URL and provides the following details.

It would be a useful enhancement if the scraper could extract more deeply parsed data. I would like suggestions on this. These are the ones from my side.

Along with some other generic fields, we could offer options (as shown below) through which specific data can be retrieved.

The above APIs are the services Diffbot currently offers. The links point to their docs and to sample JSON objects from their implementation.

daminisatya commented 8 years ago

@mariobehling @Orbiter any more suggestions?

sudheesh001 commented 8 years ago

Implemented with boilerpipe library for NLP