ndmitchell / hoogle

Haskell API search engine
http://hoogle.haskell.org/

Invalid Html from Hoogle #275

Closed meck closed 5 years ago

meck commented 5 years ago

Currently the targetItem field of the Target record uses <SomeInteger> tags to index the contents. I couldn't parse it with TagSoup. After a bit of reading, it turns out this is not valid HTML, as a tag name must begin with a letter (the rest can be digits). I opened an issue at TagSoup before I realised that...
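
The naming rule above can be sketched as a small predicate. This is only a minimal sketch of the rule as stated here (an ASCII letter first, then ASCII letters or digits); isValidTagName is a hypothetical helper for illustration, not part of TagSoup or Hoogle.

```haskell
import Data.Char (isAlphaNum, isAscii, isLetter)

-- Hypothetical helper: checks the rule described in this issue,
-- i.e. a tag name starts with an ASCII letter and continues with
-- ASCII letters or digits. The empty name is rejected.
isValidTagName :: String -> Bool
isValidTagName []       = False
isValidTagName (c : cs) = isAsciiLetter c && all isAsciiAlphaNum cs
  where
    isAsciiLetter x   = isAscii x && isLetter x
    isAsciiAlphaNum x = isAscii x && isAlphaNum x
```

Under this check, "0" and "1x" are rejected, while "x", "x1" and "s0" are accepted.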

While trying to parse the output from Hoogle in order to extract function names:

fmapFromHogle ="<span class=name><0>fmap</0></span> :: Functor f =&gt; (a -&gt; b) -&gt; f a -&gt; f b"

*Main λ> TS.parseTags fmapFromHogle
[TagOpen "span" [("class","name")],TagText "<0>fmap",TagComment "0",TagClose "span",TagText " :: Functor f => (a -> b) -> f a -> f b"]

The <0> isn't parsed as a tag. If I replace it with <x>, for example, it works; <x1> works as well, but <1x> doesn't:

[TagOpen "span" [("class","name")],TagOpen "x" [],TagText "fmap",TagClose "x",TagClose "span",TagText " :: Functor f => (a -> b) -> f a -> f b"]
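
Until the output changes, one possible consumer-side workaround is to rename the numeric tags before handing the string to a parser. The following is a rough sketch using only base; renameNumericTags and numericTagBody are hypothetical names, and the s prefix is just one valid choice of letter. It assumes the numeric tags carry no attributes, as in the example above.

```haskell
import Data.Char (isDigit)

-- Hypothetical workaround: rewrite <N> and </N> (N all digits)
-- to <sN> and </sN>, leaving everything else untouched.
renameNumericTags :: String -> String
renameNumericTags [] = []
renameNumericTags ('<' : '/' : rest)
  | Just (ds, rest') <- numericTagBody rest =
      "</s" ++ ds ++ ">" ++ renameNumericTags rest'
renameNumericTags ('<' : rest)
  | Just (ds, rest') <- numericTagBody rest =
      "<s" ++ ds ++ ">" ++ renameNumericTags rest'
renameNumericTags (c : cs) = c : renameNumericTags cs

-- Matches one or more digits followed by '>' and returns the
-- digits together with the input remaining after the '>'.
numericTagBody :: String -> Maybe (String, String)
numericTagBody s = case span isDigit s of
  (ds@(_ : _), '>' : rest) -> Just (ds, rest)
  _                        -> Nothing
```

Applied to the string above, this turns <0>fmap</0> into <s0>fmap</s0> and leaves the span tag and the escaped type signature unchanged, so the result can then be passed to TS.parseTags.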

ndmitchell commented 5 years ago

The reason for picking <0> etc. was to slightly reduce the amount of space it takes up. However, I can't imagine it's a huge amount of space, and I don't think we're space constrained anymore, so I'm happy enough to make it s0 and onwards (patch welcome!), where the s stands for either style or span (not sure which makes more sense, but both abbreviate to s, happily).
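
For anyone picking this up, the proposed naming could look roughly like the sketch below. It is not taken from Hoogle's source: styleTagName and wrapStyled are hypothetical names, the index is assumed to be non-negative, and the body is assumed to be already HTML-escaped.

```haskell
-- Hypothetical helper: builds the tag name s0, s1, ... for a
-- non-negative index.
styleTagName :: Int -> String
styleTagName n = 's' : show n

-- Hypothetical helper: wraps already-escaped text in the matching
-- open and close tags.
wrapStyled :: Int -> String -> String
wrapStyled n body = "<" ++ name ++ ">" ++ body ++ "</" ++ name ++ ">"
  where
    name = styleTagName n
```

With this, wrapStyled 0 "fmap" gives <s0>fmap</s0>, which costs one extra character per tag compared with the current numeric form.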