readium / readium-js

EPUB processing engine written in JavaScript
BSD 3-Clause "New" or "Revised" License

Text search over entire book #17

Open pivotal-versapub opened 10 years ago

pivotal-versapub commented 10 years ago

We use jQuery to search through a book. In previous Readium versions, with eager loading enabled, we were able to search through the entire book. But eager loading has been removed.

We would prefer a Readium-native way to search through the entire book. This could use the same approach as Total Page Numbers (Issue #16): a second, hidden Readium instance that iterates over the book.
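
To make the request concrete, here is a minimal sketch of what a whole-book search could look like once the text of every spine document is available. It is not Readium API: `searchBook` and the `{ href, text }` item shape are hypothetical names chosen for illustration, and the sketch assumes the plain text has already been extracted from each document (for example by a hidden reader instance).

```javascript
// Hypothetical helper, not part of Readium.
// `spineItems` is assumed to be an array of { href, text } objects, where
// `text` is the plain text already extracted from each spine document.
function searchBook(spineItems, query) {
  const needle = query.toLowerCase();
  const hits = [];
  if (needle === "") {
    return hits; // an empty query matches nothing
  }
  for (const item of spineItems) {
    // Case-insensitive match; offsets refer to the lowercased text.
    const haystack = item.text.toLowerCase();
    let pos = haystack.indexOf(needle);
    while (pos !== -1) {
      hits.push({ href: item.href, offset: pos });
      pos = haystack.indexOf(needle, pos + needle.length);
    }
  }
  return hits;
}

// Example usage with made-up chapter contents.
const hits = searchBook(
  [
    { href: "ch1.xhtml", text: "Call me Ishmael." },
    { href: "ch2.xhtml", text: "The whale. The WHALE!" },
  ],
  "whale"
);
console.log(hits);
```

Each hit records the document and a character offset; turning that offset into a location the reader can navigate to and highlight is a separate step that this sketch does not cover.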

jozol commented 10 years ago

How can you achieve that?

rkwright commented 10 years ago

Don't know if this will make m1.1, but will leave it under that milestone for now.

danielweck commented 10 years ago

I hear good things about Forage (previously known as Norch): https://github.com/fergiemcdowall/forage

danielweck commented 9 years ago

Good work being done here: https://github.com/GermanCentralLibraryForTheBlind/readium-js-viewer/commits/feature/full_text_search

naveen1941 commented 9 years ago

https://github.com/fergiemcdowall/forage

How can I use the above to read an EPUB book, and how can I integrate it with https://github.com/GermanCentralLibraryForTheBlind/readium-js-viewer/commits/feature/full_text_search?

naveen1941 commented 9 years ago

How can I implement text search over the entire book?

danielweck commented 9 years ago

@naveen1941 you can experiment with this Node-based search engine:

https://github.com/larsvoigt/epub-full-text-search

larsvoigt commented 9 years ago

I will post when the mentioned search engine can be run and tested. I think this will be in the near future :-).

naveen1941 commented 9 years ago

Thank you, team. ☺☺

Regards, Naveen Kumar

rkwright commented 9 years ago

@larsvoigt : Thanks! Much appreciated

larsvoigt commented 9 years ago

Hi all,

I think I have reached a pre-alpha state of the full-text-search-feature.

Status information follows:

Note:

Running demo

A running demo can be found here: http://fulltextsearch-readium.rhcloud.com/. It combines epub-full-text-search and readium-js-viewer.

Source of demo

The source code of the demo can be found here: Source Code

For testing, call:

Note: The indexing process starts automatically, and it takes a few seconds until the search service is actually available. Availability is indicated by the CLI status "all is indexed".

epub-full-text-search usage

If you want to see how this feature can be implemented, check out branch. To get it running, call:

Feature main components:

Thank you for your feedback :-)

danielweck commented 9 years ago

@larsvoigt great, thanks! I tried searching for "reilly" in the "accessible EPUB3" ebook hosted at http://fulltextsearch-readium.rhcloud.com. When pressing the next / previous buttons, the highlighted fragments of text (search hits) are sometimes incorrect. Can you reproduce this bug?

Speaking of highlights...I assume you are using the built-in "annotations" plugin, which utilises a div overlay method to render the selection? This plugin is now deactivated in the develop branch, and it is likely to be removed entirely due to memory bugs / obsolescence (CC @JCCR): https://github.com/readium/readium-shared-js/issues/201 Once the existing CFI bugs have been ironed out, and the architecture improved, it will be a good time to reinstate a more robust version of the highlighting mechanism.

larsvoigt commented 9 years ago

Thanks, @danielweck, for your feedback. I will fix this bug as soon as possible. Indeed, I use the built-in "annotations" plugin. Oops, I am not up to date. Do you mean that highlighting is impossible with the latest develop branch?

danielweck commented 9 years ago

As of right now, the develop branch still contains the "annotations" plugin, it is just deactivated. See https://github.com/readium/readium-shared-js/blob/develop/plugins/plugins.cson You can re-activate it in your custom readium-shared-js/plugins/plugins-override.cson, but the plan is indeed to extract the plugin into a feature branch, so that it does not interfere with future core CFI bug fixes / architecture improvements.

larsvoigt commented 9 years ago

Thank you for sharing this information. For the moment I will use the re-activate option to support highlighting. This way I will stay synchronized with the latest development.

danielweck commented 9 years ago

@larsvoigt , note that Juan Corona @JCCR has started work on refactoring the old "annotations" plugin (which was really more about "highlighting" document ranges than a full-blown annotations engine).

See: https://github.com/readium/readium-js-viewer/pull/403 https://github.com/readium/readium-shared-js/pull/212

This is an architecturally-improved version of the old annotations plugin, with some fixes too.

larsvoigt commented 9 years ago

Thanks, @danielweck, for this hint. Great job, @JCCR! Unfortunately I can't test it today, but next week I will have a look at it.

brezal commented 8 years ago

What is the status of this? Search would be highly useful :)

danielweck commented 8 years ago

Have you tried @larsvoigt 's open-source server-side indexer / search service? https://github.com/larsvoigt/epub-full-text-search http://protected-dusk-3051.herokuapp.com/?searchbox=popu
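
For readers unfamiliar with the "index first, then search" approach that such a service takes, the general idea can be sketched as a tiny inverted index. This is an illustration only and does not reflect how epub-full-text-search is actually implemented; `buildIndex`, `lookup`, and the `{ href, text }` document shape are hypothetical names, and the tokenizer is deliberately naive (lowercase ASCII letters and digits only).

```javascript
// Illustrative sketch only; NOT the epub-full-text-search implementation.
// `docs` is assumed to be an array of { href, text } objects.
function buildIndex(docs) {
  const index = new Map(); // word -> Set of hrefs containing that word
  for (const doc of docs) {
    // Naive tokenizer: runs of ASCII letters and digits, lowercased.
    const words = doc.text.toLowerCase().match(/[a-z0-9]+/g) || [];
    for (const word of words) {
      if (!index.has(word)) {
        index.set(word, new Set());
      }
      index.get(word).add(doc.href);
    }
  }
  return index;
}

// Returns the hrefs of all documents containing `word` (exact word match).
function lookup(index, word) {
  const hrefs = index.get(word.toLowerCase());
  return hrefs ? Array.from(hrefs) : [];
}

// Example usage with made-up chapter contents.
const index = buildIndex([
  { href: "ch1.xhtml", text: "Published by O'Reilly Media." },
  { href: "ch2.xhtml", text: "Accessible EPUB 3 in practice." },
]);
console.log(lookup(index, "reilly"));
console.log(lookup(index, "epub"));
```

Building the index costs time up front, which is why a service of this kind is only usable once indexing has finished; after that, each lookup is a single map access rather than a scan of the whole book.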