Audun97 closed this issue 1 year ago.
Ahhh, that's because my XHTML already references a CSS style sheet. I presume no new style sheet containing the highlighting style is created. Does syncabook support custom CSS? If so, where should it go in the file structure?
Edit: I got highlighting working by editing the finished EPUB's reference to the CSS style sheet.
One just needs the CSS style sheet to contain this rule:
```css
.-epub-media-overlay-active {
  background-color: #FFFF00;
}
```
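In EPUB 3 Media Overlays, the reading system adds the class named by the package document's `media:active-class` metadata (here assumed to be `-epub-media-overlay-active`) to the element currently being read, so highlighting only appears if a stylesheet linked from that XHTML defines the class. Instead of changing the stylesheet reference, one could patch the existing stylesheet. A minimal sketch, assuming plain UTF-8 CSS inside an unzipped EPUB; `ensure_highlight_rule` and `patch_stylesheet` are hypothetical names:

```python
import re
from pathlib import Path

# Assumed to match <meta property="media:active-class">...</meta> in the package document.
ACTIVE_CLASS = "-epub-media-overlay-active"


def ensure_highlight_rule(css_text, class_name=ACTIVE_CLASS, color="#FFFF00"):
    """Return css_text with a background-color rule for class_name appended,
    unless a selector for that class already seems to be present."""
    # Rough check for ".<class>" followed by "{" or "," (not a real CSS parser).
    if re.search(r"\." + re.escape(class_name) + r"\s*[{,]", css_text):
        return css_text
    if css_text and not css_text.endswith("\n"):
        css_text += "\n"
    return css_text + f".{class_name} {{\n    background-color: {color};\n}}\n"


def patch_stylesheet(path):
    """Apply ensure_highlight_rule to a .css file in place."""
    path = Path(path)
    css = path.read_text(encoding="utf-8")
    path.write_text(ensure_highlight_rule(css), encoding="utf-8")
```

The function is idempotent, so running it twice over the same stylesheet does not add a second rule.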
@Audun97 nice! We can leave the issue open so that people who want to use their own xhtmls can find your script. I think this functionality could be integrated into syncabook, but it would require more consideration. The script will work for xhtmls structured in a particular way – with text contents directly inside paragraphs. What if they are nested in spans? Before adding this, I'd need to explore what the common EPUB xhtml structures are and whether it's possible to cover most cases at all.
This is only an issue if you want to preserve the xhtml structure and formatting. If we just want to use existing xhtml to produce a synced ebook, we can extract the text and produce new xhtml following syncabook's structure. That part is easy. But I don't think it's necessary, since people can probably find plaintext files with the same content as well.
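The extract-the-text route can be sketched with the standard library alone. This assumes well-formed XHTML in the XHTML namespace (as EPUB 3 content documents are; HTML named entities such as `&nbsp;` would make the XML parser fail) and assumes a plain-text input with one paragraph per blank-line-separated block, which should be checked against what syncabook actually expects. The function names are hypothetical:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def paragraphs_from_xhtml(xhtml):
    """Return the text of every non-empty <p>, with inline markup flattened
    and runs of whitespace collapsed to single spaces."""
    root = ET.fromstring(xhtml)
    paras = []
    for p in root.iter(XHTML_NS + "p"):
        # itertext() yields the text and tails of all descendants in order,
        # so <em>, <span>, <a> etc. are flattened into one string.
        text = " ".join("".join(p.itertext()).split())
        if text:
            paras.append(text)
    return paras


def xhtml_to_plaintext(xhtml):
    """Join paragraphs with blank lines (assumed input format)."""
    return "\n\n".join(paragraphs_from_xhtml(xhtml))
```

This throws away all formatting, which is exactly the trade-off described above.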
So I suggest we leave the issue open and see if there are more people who want to use xhtml with formatting preserved or not. You can rename it to something like "Use existing xhtml files for syncabook".
> The script will work for xhtmls structured in a particular way – with text contents directly inside paragraphs. What if they are nested in spans?
That is what `current_span` is supposed to keep track of, but when I test more files it gets hung up. The rule should be: if the children are in the same `p` and their strings don't contain punctuation (a period, comma, semicolon, etc.), they should be wrapped inside the same span.
At least every html/xhtml file I have seen uses paragraph (`p`) tags, so the script works on that basis. The `descendants` property, as in `p.descendants`, is nice because it lets me go through all the children recursively.
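The grouping rule described above can be sketched independently of BS4 as a function over the text fragments of a single `p` (what `p.strings` would yield, in order). The punctuation set and the function name are assumptions, and this version closes a group when a fragment *ends* with punctuation, a slight variant of "contains":

```python
# Punctuation that closes a group; an assumption, adjust as needed.
BREAKS = (".", "!", "?", ",", ";", ":")


def group_fragments(fragments):
    """Merge consecutive text fragments of one <p> into groups that would
    each be wrapped in a single span. A group is closed by a fragment that
    ends with punctuation from BREAKS."""
    groups, current = [], []
    for frag in fragments:
        current.append(frag)
        if frag.rstrip().endswith(BREAKS):
            groups.append(current)
            current = []
    if current:  # trailing text without closing punctuation
        groups.append(current)
    return groups
```

A fragment with punctuation in the middle, such as `" work. Next"`, is not split by this rule, so the next sentence's first words land in the previous group; handling that needs character offsets rather than whole fragments.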
I can't get this to work with text content directly inside paragraphs; it completely misses such text because it's `NavigableString`s. It seems to work if the text is in at least one element nested inside the `p` and text is only in leaf elements (no `<p>This example will <em>not</em> work.</p>`, or only the `<em>not</em>` will be picked up).
Does that match what you're seeing, or am I missing something?
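The behaviour described above comes from visiting elements rather than text nodes: text directly inside the `p`, and text after a child such as `" work."`, does not belong to any child element. In BS4, `p.strings` (or the string items of `p.descendants`) covers those. A standard-library sketch of the same traversal using ElementTree's `text`/`tail` model, with a hypothetical function name:

```python
import xml.etree.ElementTree as ET


def strings(elem):
    """Yield every text fragment under elem in document order,
    similar to BeautifulSoup's .strings."""
    if elem.text:
        yield elem.text            # text directly inside elem, before its first child
    for child in elem:
        yield from strings(child)  # text inside the child, recursively
        if child.tail:
            yield child.tail       # text after the child, still directly inside elem
```

Each text slot is visited exactly once, so nested wrappers do not produce the same text twice.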
I think the way it should work is as follows:

- […] `p.strings` […]
- […] `span` around them.
- […] `span`s possible to exactly cover the sentence […] `<em>This</em> sentence should be doable in one <span> element.` and `This sentence should <em>too.</em>` should work without splitting, even though in each the start and end might not seem to have the same parent.

I don't know enough about BS4 to actually code that, although I might try over the weekend and get farther than I expect. There are probably edge cases I'm not thinking of, too, although most of them are probably things aeneas would also choke on.
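A rough sketch of the first part of that plan, assuming the fragments come from `p.strings` in order: join them, split the result into sentences with a naive regex, and report which fragments each sentence touches. The regex and the function name are assumptions, and the wrapping itself is not done here:

```python
import re

# Naive sentence end: ., ! or ? followed by whitespace or end of text.
# Abbreviations such as "Mr." will be split incorrectly.
SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def sentence_spans(fragments):
    """Return (sentence_text, first_fragment_index, last_fragment_index)
    for each sentence in the concatenated fragments."""
    text = "".join(fragments)

    # (start, end) character offsets of each fragment within `text`
    bounds = []
    pos = 0
    for frag in fragments:
        bounds.append((pos, pos + len(frag)))
        pos += len(frag)

    cuts = [m.end() for m in SENTENCE_END.finditer(text)]
    if not cuts or cuts[-1] < len(text):
        cuts.append(len(text))  # trailing text without sentence-ending punctuation

    spans = []
    start = 0
    for end in cuts:
        segment = text[start:end]
        # Trim surrounding whitespace so it is not attributed to the sentence.
        s = start + len(segment) - len(segment.lstrip())
        e = start + len(segment.rstrip())
        if s < e:  # skip whitespace-only segments
            touched = [i for i, (b0, b1) in enumerate(bounds) if b0 < e and b1 > s]
            spans.append((text[s:e], touched[0], touched[-1]))
        start = end
    return spans
```

When the first and last indices are equal, the sentence lies inside one text fragment and can be wrapped directly. When they differ, as in both example sentences above, the sentence crosses element boundaries, and the next step would be to find the smallest element containing all the touched fragments.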
Hey @dhouck, sorry for responding so late. The problem I had was that I tried to wrap the same string twice. For example, in `<a><span>sentence</span></a>` the `a` tag's string is `sentence`, the same string as the `span`'s. I've fixed that now, and I've also made a fix for your case. Take a look at my repository: https://github.com/Audun97/audio-ebook-id-inserter
If you find any more edge cases, let me know.
Just a +1 for such a feature: we are a non-profit foundation lending audiobooks to blind people and to others with conditions that prevent them from reading. We started producing EPUBs this year, syncing them to our human-read audiobooks, and two of our use cases would greatly benefit from this feature: 1) ebooks modified for accessibility, where we need to use the specific XHTML files for the sync, and 2) ebooks with syllable colorization (for dyslexic readers), where, again, we need to use the specific existing XHTML files.
I'm looking forward to trying this code!
@Audun97 hey! Do you still have that fixed script available? The link you shared is dead, and I couldn't find it elsewhere in your GitHub repos.
Thanks!
Hello, I managed to get it working with text files. However, to preserve formatting, I tried to write a Python script that adds id attributes to existing XHTML files. It partly works: when I change the page in Colibrio Reader, or change position by clicking a word, playback jumps to that part of the audio. The only thing missing is the text highlighting. Do you know of any requirement for it to work? If we manage to get it working, I think it would be a nice addition to syncabook.
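As a hypothetical sketch of the id-adding step only (not the poster's script), assuming well-formed XHTML in the XHTML namespace and an `f000001`-style id scheme chosen purely for illustration: give each non-empty `p` a unique id and keep any existing ones. In EPUB 3 Media Overlays, highlighting generally also depends on the SMIL `<text src="chapter.xhtml#f000001"/>` entries pointing at these elements, the package document declaring `media:active-class`, and a CSS rule for that class.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
# Keep the default XHTML namespace (and epub:) on output instead of ns0:/ns1: prefixes.
ET.register_namespace("", XHTML)
ET.register_namespace("epub", "http://www.idpf.org/2007/ops")


def add_paragraph_ids(xhtml, prefix="f"):
    """Give every non-empty <p> without an id a new unique id.

    Returns (serialized_xhtml, ids_in_document_order). Existing ids are kept.
    """
    root = ET.fromstring(xhtml)
    used = {el.get("id") for el in root.iter() if el.get("id")}
    ids = []
    counter = 0
    for p in root.iter(f"{{{XHTML}}}p"):
        if not "".join(p.itertext()).strip():
            continue  # nothing to read aloud, so nothing to sync
        if p.get("id") is None:
            # Hypothetical scheme f000001, f000002, ..., skipping ids already in use.
            while True:
                counter += 1
                candidate = f"{prefix}{counter:06d}"
                if candidate not in used:
                    break
            p.set("id", candidate)
            used.add(candidate)
        ids.append(p.get("id"))
    return ET.tostring(root, encoding="unicode"), ids
```

`ET.tostring` drops the DOCTYPE, and the XML declaration only comes back with `xml_declaration=True` (Python 3.8+), so a real script would need to restore those before writing the file back.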
Here is the script: