snowleopard / alga

Algebraic graphs
MIT License
716 stars 68 forks source link

dot syntax escaping required for defaultStyleViaShow #44

Open jmtd opened 6 years ago

jmtd commented 6 years ago

A Graph type's show implementation might generate strings that result in invalid dot output. For example

> putStrLn (export defaultStyleViaShow (path ["test \"1\"", "2"]) :: String)
digraph
{
  ""2""
  ""test \"1\"""
  ""test \"1\""" -> ""2""
}

or even

> putStrLn (export defaultStyleViaShow (path ["1", "2"]) :: String)
digraph
{
  ""1""
  ""2""
  ""1"" -> ""2""
}

Coincidentally the latter is legal Dot, but unexpected:

out5

I think the legal characters for identifiers are [['a'..'z'],['A'..'Z'],['\200'..'\377'],"_"]

snowleopard commented 6 years ago

@jmtd Thanks! Indeed, this looks problematic.

Presumably, this is the right output?

digraph
{
  "\"1\""
  "\"2\""
  "\"1\"" -> "\"2\""
}
snowleopard commented 6 years ago

While this is being resolved, the workaround is to use exportAsIs after converting all vertex labels to strings using the right escaping. For example:

> putStrLn (exportAsIs (path ["1", "2"]) :: String)

digraph
{
  "1"
  "2"
  "1" -> "2"
}
jmtd commented 6 years ago

Hi,

Presumably, this is the right output?

Yep that's right.

While this is being resolved, the workaround is to use exportAsIs after converting all vertex labels to strings using the right escaping.

Very useful, thanks!

jmtd commented 6 years ago

This seems like a suitable piece of work for a beginner PR, so I am going to take a stab at this, however if anyone else wants to try it please don't let me stop you.

snowleopard commented 6 years ago

@jmtd Sure, please give it a try!

The main difficulty seems to be finding a way to escape characters without sacrificing the polymorphic interface. At the moment users can export into String, Text or any other similar datatype. How do we implement the escaping logic generically?

Perhaps, we need to define a type class of 'escapable strings' with a couple of standard instances:

class Escape s where
    escape :: s -> s

instance Escape String where
    escape [] = []
    escape (c:cs) = escapeChar c ++ escape cs
      where
        escapeChar :: Char -> String
        escapeChar c = ...

instance Escape Text where ...

For example, I found exactly the same solution here.

Then we'll need to strengthen the constraint on exportViaShow from Show s to (Show s, Escape s).

Does this make sense? Or is there a better approach?

jmtd commented 6 years ago

The main difficulty seems to be finding a way to escape characters without sacrificing the polymorphic interface.

You're right ,that's a detail I overlooked entirely when I first reported this (I subsequently came up with concatMap (\x-> if elem x badChars then ["\\",x] else [x]), felt pleased with myself, and thought all that was needed was plugging it into the right place).

Text.JSON.Escape

I don't know of a better approach; this looks reasonable but I'll do some looking around (probably Tomorrow) and see what else comes up.

jmtd commented 6 years ago

Sorry for the radio silence here, slow progress, my WIP (not PR-worthy yet) is here https://github.com/jmtd/alga/tree/escape-typeclass

snowleopard commented 6 years ago

@jmtd No problem, take you time :) Don't hesitate to send a draft PR -- happy to review and iterate.

jmtd commented 6 years ago

OK I'll open one (now :)) as I have something that works for the simple case above, but there's clearly more work to be done.

jmtd commented 6 years ago

A brief update. I thought it wise to look at what others had done already, and so I was reading through Data.GraphViz.Printing, in particular addEscapes (anchor in that link). The de-facto rules appear to be considerably different to the theoretical ones (I was surprised for example to see escLetters = Set.fromList ['N', 'G', 'E', 'T', 'H', 'L', 'n', 'l', 'r']. I wonder why they are escaped?)

I imagine that Data.GraphViz.Printing reflects the truth of the Dot format, but the real test would be whether dot accepts it. So I was considering writing a simple test harness that would attempt an export of a given escaped string and see whether dot complained or not, as part of developing a proper escape routine (and/or verifying the rules in Data.GraphViz.Printing)

snowleopard commented 6 years ago

@jmtd Good idea to check what others are doing!

I was surprised for example to see escLetters = Set.fromList ['N', 'G', 'E', 'T', 'H', 'L', 'n', 'l', 'r']. I wonder why they are escaped?

I guess if we have a string Hello\nWorld we don't want to mess with the \ but leave it as is? Not sure, this all is getting quite complicated ;)

Yes, we need to collect a dozen of examples/tests and make sure our translation agrees with dot.