molybdenum-99 / infoboxer

Wikipedia information extraction library
MIT License

Infoboxer intro includes single math formulas #68

phlegx closed this issue 8 years ago

phlegx commented 8 years ago

Hi!

If I fetch, for example, this page:

paragraphs = Infoboxer.wikipedia('de').get('Energiedosis').intro

I get the math formula as the second paragraph. Why?

Same here:

paragraphs = Infoboxer.wikipedia('es').get('Potencia_(física)').intro[2].to_s
 => "La potencia instantánea es el valor límite de la potencia media cuando el intervalo de tiempo Δt se aproxima a cero. En el caso de un cuerpo de pequeñas dimensiones: {\\Delta t} = \\mathbf{F}\\cdot \\mathbf{v} ||left Donde"

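If template parsing starts at {{ and reads all variables until @context.eat_matched?('}}') matches as the end, then it stops in the middle of the formula, right after \Delta\mathbf{r, because the two closing braces that follow (in \mathbf{r}}) look like the end of the template.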

{{ecuación|
<math>P(t) = \lim_{\Delta t\rightarrow 0} \frac{\ W}{\Delta t}\ =
\lim_{\Delta t\rightarrow 0} \mathbf{F}\cdot\frac{\Delta\mathbf{r}}{\Delta t} =
\mathbf{F}\cdot \mathbf{v}</math>
||left}}
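To illustrate the suspected cause, here is a minimal sketch using Ruby's standard `StringScanner`. It is not Infoboxer's actual parser: `naive_template_body` and `math_aware_template_body` are hypothetical helpers written only for this example, and nested templates are deliberately ignored. The naive version ends the template at the first `}}`, while the second treats `<math>...</math>` as an opaque block.

```ruby
require 'strscan'

# The template from the Spanish article, kept verbatim (raw heredoc, no escapes).
WIKITEXT = <<~'WIKI'
  {{ecuación|
  <math>P(t) = \lim_{\Delta t\rightarrow 0} \frac{\ W}{\Delta t}\ =
  \lim_{\Delta t\rightarrow 0} \mathbf{F}\cdot\frac{\Delta\mathbf{r}}{\Delta t} =
  \mathbf{F}\cdot \mathbf{v}</math>
  ||left}}
WIKI

# Hypothetical helper: ends the template at the first "}}" it sees.
def naive_template_body(source)
  scanner = StringScanner.new(source)
  scanner.skip(/\{\{/) or return nil
  body = scanner.scan_until(/\}\}/) or return nil
  body.delete_suffix('}}')
end

# Hypothetical helper: copies <math>...</math> through untouched, so braces
# inside the formula can never terminate the template.
# Assumption: nested templates are not handled in this sketch.
def math_aware_template_body(source)
  scanner = StringScanner.new(source)
  scanner.skip(/\{\{/) or return nil
  body = +''
  until scanner.eos?
    if (math = scanner.scan(%r{<math>.*?</math>}m))
      body << math
    elsif scanner.scan(/\}\}/)
      return body
    else
      body << scanner.getch
    end
  end
  nil # no closing "}}" found
end

p naive_template_body(WIKITEXT).end_with?('\Delta\mathbf{r')  # => true
p math_aware_template_body(WIKITEXT).end_with?('||left')      # => true
```

The first call reproduces the truncation described above; the second reads the whole template, including the trailing `||left` variables.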

My opinion: the <math> tag should remain in the result so that it can be handled in HTML!

zverok commented 8 years ago

Definitely a bug. Looking into it.

zverok commented 8 years ago

Hey. There are in fact two problems here: the first is how <math> renders to text, and the second is that a mix of templates and <math> is parsed erroneously. The second one is fixed by #70, and I've fixed the first too: the Math node is now surrounded by a <math> tag when rendering to text.

Everything is in the develop branch; please confirm it works for you as expected.
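As a rough illustration of the text-rendering behaviour described in this comment, the sketch below wraps a formula in `<math>` tags when a paragraph is flattened to text. The `TextNode` and `MathNode` classes and the `to_text` method are hypothetical stand-ins invented for this example, not Infoboxer's real node classes or API.

```ruby
# Hypothetical plain-text node: renders its content unchanged.
class TextNode
  def initialize(text)
    @text = text
  end

  def to_text
    @text
  end
end

# Hypothetical math node: keeps the <math> tag around the formula so that
# downstream code (for example an HTML renderer) can still recognise it.
class MathNode
  def initialize(formula)
    @formula = formula
  end

  def to_text
    "<math>#{@formula}</math>"
  end
end

paragraph = [
  TextNode.new('En el caso de un cuerpo de pequeñas dimensiones: '),
  MathNode.new('\mathbf{F}\cdot \mathbf{v}')
]

# Flatten the paragraph to text; the formula stays delimited by its tag.
text = paragraph.map(&:to_text).join
puts text
```

With the tag preserved, a consumer can locate the formula in the text output and hand it to a math renderer instead of receiving bare LaTeX mixed into the prose.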