Open skalee opened 3 years ago
...
... For... this gem? I don't care much about this gem, although offering lots of compilation options for end users appears bizarre to me.
... For the Metanorma stack? That is a ton of refactoring with no strong motivation, on a massive stack. I am unwilling to do it.
The backstory is that iev-data is annoyingly slow. I wanted to get some easy performance gains by processing concepts in parallel (https://github.com/glossarist/iev-data/issues/139), but it seems to make almost no difference, and I blame Nokogiri for that.
Don't worry about this feature request, I'll implement it myself or close it. When it comes to multithreading in iev-data, Nokogiri isn't the only obstacle, unfortunately.
... For the Metanorma stack? That is a ton of refactoring with no strong motivation, on a massive stack. I am unwilling to do it.
Would be useful for https://github.com/metanorma/metanorma-cli/issues/228. But again, multithreading isn't straightforward in iev-data, and in Metanorma it may be even more difficult. And as you have said, there may be too much work in so many parts of the Metanorma stack. At the moment I'm rather sceptical about https://github.com/metanorma/metanorma-cli/issues/228, but we'll see.
The CLI issue does not require any shared state: it's as easy as starting separate, independent threads for the document compilation step.
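A minimal sketch of that idea, assuming each document can be compiled without touching shared state. `compile_document` and `compile_all` are hypothetical names used only for illustration; they are not part of the Metanorma CLI.

```ruby
# Hypothetical stand-in for the real compilation step; not a Metanorma API.
def compile_document(path)
  "compiled:#{path}"
end

# Start one independent thread per document and collect the results.
# Nothing is shared between threads; each thread returns its own value.
def compile_all(paths)
  threads = paths.map do |path|
    Thread.new(path) { |p| [p, compile_document(p)] }
  end
  # Thread#value waits for the thread to finish and returns the block's result.
  threads.map(&:value).to_h
end

results = compile_all(%w[a.adoc b.adoc])
```

On CRuby, only one thread executes Ruby code at a time because of the global VM lock, so how much this helps depends on how much of the compilation step is spent in I/O or in native code that releases the lock.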
Keep in mind that Ox has complicated rules for whitespace cleanup, so it doesn't return a fully correct representation. In particular, Plurimath does depend on those rules. I have ported it to be able to use Oga as well, and I had to recreate those rules (see https://github.com/plurimath/plurimath/blob/main/lib/plurimath/xml_engine/oga.rb#L57). In this case, we would have to do the reverse, which would be... perhaps even impossible, or it may not really matter for this use case.
Regarding Oga, when I interacted with upstream, they suggested that the library is in the maintenance stage, but even so, they haven't updated the gem for a year.
Nokogiri is basically a wrapper over libxml2 and libxslt (or some Java libraries in case of JRuby). This has consequences.
Benefits:
Trade-offs:
Oga is another popular XML/HTML library. It is also mature and performant, although it has not received frequent updates recently. Furthermore, it is written almost exclusively in Ruby, which makes it suitable for multithreading. Sadly, Oga fails to parse malformed HTML documents, for example a<!-->b, whereas Nokogiri recovers according to the rules defined in the standard. That means there are situations in which we have to stick to Nokogiri.
Ox is another popular library. It's meant to be fast, faster than Nokogiri, but it is written mostly in C and is also incapable of dealing with malformed documents.
The preferred solution is to support all of these gems so that users can choose.
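One way this could look is a small selection layer that picks a backend by name or falls back to the first one that loads. This is only a sketch under assumptions: the `XmlEngine` module, its `select` method, and the `loader:` parameter are hypothetical and do not exist in this gem. Only the library names passed to `require` (`nokogiri`, `oga`, `ox`) are real.

```ruby
# Hypothetical engine selector; module and method names are illustrative only.
module XmlEngine
  # Candidate backends in order of preference: [engine name, library to require].
  CANDIDATES = [
    [:nokogiri, "nokogiri"],
    [:oga, "oga"],
    [:ox, "ox"],
  ].freeze

  # Returns the name of the first backend whose library loads.
  # If `preferred` is given, only that backend is tried.
  # `loader` is injectable so the selection logic can be exercised
  # without the gems being installed.
  def self.select(preferred = nil, loader: ->(lib) { require lib })
    candidates = CANDIDATES
    candidates = candidates.select { |name, _| name == preferred } if preferred

    candidates.each do |name, lib|
      begin
        loader.call(lib)
        return name
      rescue LoadError
        next # this backend is not installed; try the next one
      end
    end

    raise LoadError, "no supported XML engine available"
  end
end
```

A caller that needs recovery from malformed input could request `XmlEngine.select(:nokogiri)` explicitly, while one that only cares about multithreading could request `:oga`. The actual parsing adapters behind each name are left out here.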