proycon / foliatools

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.
GNU General Public License v3.0

[folia2html] Implement superscript #26

Closed proycon closed 3 years ago

proycon commented 3 years ago

Although classes for styles are not predefined in FoLiA, folia2html does interpret a few classes such as "bold" and "italic". An extra implementation is needed for "superscript", as requested by @pirolen. This would map nicely to HTML's <sup> element. And whilst we're at it, we should do subscript (<sub>) too.

On an unrelated note: FLAT doesn't visualize this either currently and there it would be less trivial to implement.

proycon commented 3 years ago

This is implemented now.

@pirolen You may want to check how this was implemented in case you want to add your own custom style interpretations at some point. It is pretty easy to do.

pirolen commented 3 years ago

Thanks very much, also for the pointer!

pirolen commented 3 years ago

> You may want to check how this was implemented in case you want to add your own custom style interpretations

It seems that span class="style_smallcaps" is not implemented, shall I make an attempt? ;-o

proycon commented 3 years ago

Ha, yes ;) Feel free to give it a try! There is no direct HTML equivalent like with sup and sub, but you can map the style directly to CSS's font-variant: small-caps.

pirolen commented 3 years ago

Just asking: would it be useful for the tool to account for font sizes as well? If not, I would customize the style sheet for my own use as you suggest, outside of LaMachine then? (In any case, not before it is decided how FoLiA-abby will format the style information.)

proycon commented 3 years ago

The font sizes, as implemented in FoLiA-abby, are too non-standard to really make it into the tool, I'm afraid. It's already a bit of a violation that the tool is interpreting t-styles at all (as it's up to the user to define the vocabulary, not FoLiA).

What the tool does do, as you saw, is assign a "style_$class" label to anything it can't interpret (which formally should be everything). This gives the option of defining the styling in an associated CSS rather than the HTML itself. We could expand the tool to take a user-provided stylesheet and associate that with the HTML, and we could expand the behaviour that it also writes out CSS classes for things like the features under t-style. That would give an elegant solution where you don't have to modify folia2html nor the XSL and have maximum flexibility in how things are displayed. What do you think?

kosloot commented 3 years ago

Well, the font size is added as a feature to the <t-style> as suggested by @proycon. In hindsight I wonder if a new <t-style>-like member in <t> nodes might be an idea, like: <t-font style="bla" size="6" family="Times New Roman"> etc.

pirolen commented 3 years ago

Both of the last two suggestions by you guys sound very practical.

proycon commented 3 years ago

The ability to associate an external CSS stylesheet is now implemented (pending release still); use the -s option to folia2html.

proycon commented 3 years ago

> Well, the font size is added as a feature to the <t-style> as suggested by @proycon. In hindsight I wonder if a new <t-style>-like member in <t> nodes might be an idea, like: <t-font style="bla" size="6" family="Times New Roman"> etc.

I'm not in favour of adding a new element. Font information is style information as far as I'm concerned, and we have all the facilities for that, no need to complicate it further.

pirolen commented 3 years ago

> The ability to associate an external CSS stylesheet is now implemented (pending release still), use the -s option to folia2html

That's great, thanks! It would be great if one could reuse what was implemented so far, I liked the FLAT-like design :-) Would it be possible that you share it as a template or similar?

proycon commented 3 years ago

The custom CSS is applied on top of the default one, so by default you get the current FLAT-like design, and you only need to specify the things you want to change.
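As a minimal sketch of such an override stylesheet, the Python snippet below writes a CSS file containing only the small-caps rule discussed earlier in this thread. The class name `style_smallcaps` comes from the thread; the file name `custom.css` and the helper `write_stylesheet` are assumptions made for illustration and are not part of folia2html.

```python
from pathlib import Path

# Only the rule we want to change; everything else is assumed to come from
# the default (FLAT-like) stylesheet that the custom CSS is applied on top of.
OVERRIDE_CSS = """\
span.style_smallcaps {
    font-variant: small-caps;
}
"""


def write_stylesheet(path, css=OVERRIDE_CSS):
    """Write the CSS text to `path` (hypothetical helper) and return the Path."""
    target = Path(path)
    target.write_text(css, encoding="utf-8")
    return target


# Example use (file name is an assumption):
# write_stylesheet("custom.css")
```

The resulting file would then be the one handed to folia2html through the -s option mentioned above.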

pirolen commented 3 years ago

I see. Just that by default none of the font styles get rendered now it seems, and some of them used to. Now there are only <span class="style_"> elements if I see it well.

proycon commented 3 years ago

Do you have an example document I can test this with? I can't reproduce it yet with one of my own.

pirolen commented 3 years ago

Yes, please find two files attached. (These are the same files I sent yesterday, when discussing how to use foliapy to access style elements.)

FA-b1_2_mwtext_pp97_352_1.png.folia.xml.txt FA-Prototyp_MWG-I-23_147-215_1.png.folia.xml.txt

proycon commented 3 years ago

Thanks. I see the issue. Everything moved to features rather than t-style itself, so the stylesheet can't handle that by default. I'll make an adjustment to folia2html so that these features are also reflected in the CSS class; then you can handle it in your custom stylesheet.

proycon commented 3 years ago

@pirolen Ok, I have implemented that now. You should get multiple classes, representing all of the features. The CSS classes for the features take the form style_$subset_$class. Note that things like spaces and other special characters are purged because they won't be valid in CSS class names. So you get output like:

<span class="style_none style_font_family_TimesNewRoman style_font_size_9 style_font_style_1C065687-49EE-47E6-95F8-B3607B0A5B23">Vorbemerkung. </span>
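To illustrate the naming scheme, here is a rough Python sketch of how a `style_$subset_$class` name could be derived. It is not the actual folia2html code: the function name, the subset names `font_family` and `font_size`, and the exact set of characters that are kept (letters, digits, underscore, hyphen) are assumptions inferred from the example output above.

```python
import re


def css_class_for_feature(subset, cls):
    """Build a 'style_<subset>_<class>' name (hypothetical helper).

    Characters other than ASCII letters, digits, '_' and '-' are dropped,
    which is an assumption about what "purged" means in the thread.
    """
    def purge(value):
        return re.sub(r"[^A-Za-z0-9_-]", "", value)

    return f"style_{purge(subset)}_{purge(cls)}"


print(css_class_for_feature("font_family", "Times New Roman"))
print(css_class_for_feature("font_size", "9"))
```

Under these assumptions, "Times New Roman" loses its spaces, so the original font name can no longer be recovered from the class name alone.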

proycon commented 3 years ago

In this situation, though, the limited default text styling that folia2html did does not apply (because this is too specific), so you'll have to define it all in your own CSS stylesheet.
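One way to avoid writing every rule by hand is to generate the custom stylesheet from the classes that appear in the HTML. The Python sketch below does this for font sizes and font families only. It is an illustration under stated assumptions: the sample markup is shortened from the output above, the sizes are interpreted as points, and the `FONT_FAMILIES` table is a hand-made mapping that is needed because spaces are purged from the class names.

```python
import re

# Shortened from the example output in this thread.
SAMPLE = ('<span class="style_none style_font_family_TimesNewRoman '
          'style_font_size_9">Vorbemerkung. </span>')

# Hand-made mapping (assumption): purged class suffix -> real CSS font stack.
FONT_FAMILIES = {"TimesNewRoman": '"Times New Roman", serif'}


def feature_classes(html):
    """Collect all 'style_*' class names used in class attributes."""
    classes = set()
    for attr in re.findall(r'class="([^"]*)"', html):
        classes.update(c for c in attr.split() if c.startswith("style_"))
    return classes


def rules_for(classes):
    """Turn known feature classes into CSS rules; unknown ones are skipped."""
    rules = []
    for name in sorted(classes):
        size = re.fullmatch(r"style_font_size_(\d+)", name)
        family = re.fullmatch(r"style_font_family_(\w+)", name)
        if size:
            # Treating the size as points is an assumption.
            rules.append(f".{name} {{ font-size: {size.group(1)}pt; }}")
        elif family and family.group(1) in FONT_FAMILIES:
            stack = FONT_FAMILIES[family.group(1)]
            rules.append(f".{name} {{ font-family: {stack}; }}")
    return rules


print("\n".join(rules_for(feature_classes(SAMPLE))))
```

The printed rules can be saved to a file and used as the custom stylesheet; classes such as `style_none` or the GUID-like `style_font_style_...` are deliberately left alone here because their intended rendering is not specified in the thread.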

pirolen commented 3 years ago

OK, sounds good, thanks very much!