idekerlab / dot-app

Cytoscape application for exporting to .dot file format
GNU General Public License v3.0

dot-app does not import UTF-8 files correctly #19

Open StoltHD opened 4 years ago

StoltHD commented 4 years ago

Double-byte characters, such as the Scandinavian letters, are not imported correctly by dot-app ...

Can you please update the libraries/parser to something that supports UTF-8 ...?

bdtfitts commented 4 years ago

Hi there, Thanks for reaching out about your problem. I've taken a crack at this, and before I publish it to the Cytoscape app store I wanted to run it by you. I've attached a ZIP file that contains the updated version of dot-app that should fix the problem. I made some changes so that when importing and exporting DOT files, it should read them as UTF-8. dot-app-0.9.6.jar.zip
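As an illustration of the kind of change described above, here is a minimal Python sketch (not the actual dot-app code, which is a Java Cytoscape app) of reading and writing a DOT file with an explicit UTF-8 encoding instead of the platform default. The helper names `read_dot` and `write_dot`, the file name, and the sample graph are hypothetical.

```python
import tempfile
from pathlib import Path


def read_dot(path: Path) -> str:
    """Read a DOT file as UTF-8, ignoring the platform default encoding."""
    return path.read_text(encoding="utf-8")


def write_dot(path: Path, text: str) -> None:
    """Write a DOT file as UTF-8, ignoring the platform default encoding."""
    path.write_text(text, encoding="utf-8")


# Hypothetical sample graph with Scandinavian letters in node names and labels.
sample = 'graph G { "Bjørn" -- "Åse" [label="søsken"]; }'

with tempfile.TemporaryDirectory() as tmp:
    dot_file = Path(tmp) / "family.gv"
    write_dot(dot_file, sample)
    round_trip = read_dot(dot_file)
    # Decoding the same bytes with a single-byte codec garbles the names,
    # which is the kind of symptom reported in this issue.
    mojibake = dot_file.read_bytes().decode("latin-1")
```

With an explicit encoding on both sides the text survives the round trip, while the `latin-1` decoding turns each two-byte character into two unrelated characters.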

StoltHD commented 4 years ago

The UTF-8 characters seem to work in this version. I had to convert one or two files from UTF-8 to UTF-8 with BOM in Notepad, but I'm not sure whether that was because those files were "corrupt" or because of dot-app. It takes some time to generate new gv files, so I will test some more and report back if I find anything.

At the moment it seems to be working (it's not a big problem to run a file through Notepad++ to enable the BOM if I have to).
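For readers hitting the same BOM question, the following Python sketch shows one way a reader could accept UTF-8 files both with and without a byte order mark. It relies on Python's standard `utf-8-sig` codec; the function name `decode_dot_bytes` and the sample data are hypothetical, and nothing here describes how dot-app itself handles the BOM.

```python
import codecs


def decode_dot_bytes(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    # "utf-8-sig" strips the BOM when it exists and otherwise behaves
    # like plain "utf-8".
    return data.decode("utf-8-sig")


plain = 'graph G { "Ærø" }'.encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain  # what "UTF-8 with BOM" adds to the file

text_plain = decode_dot_bytes(plain)
text_bom = decode_dot_bytes(with_bom)
```

Decoding the BOM variant with plain `utf-8` would instead leave a `\ufeff` character at the start of the text, which a strict parser may reject.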

... But I have another problem: I can't open GV files generated by a genealogy program named Gramps (these are reports written using Graphviz). I get an error saying that the file can't be parsed and that I should try Neato. I have tried running Neato, and the file opens just fine in Tulip, but not in dot-app. It might be because there are some newline (\n) characters in the labels, but I'm not sure yet. I need to investigate those files a little more, and also create a few other reports, to see the differences and to compare them with gv files from other programs that can be imported. The strange thing is that the files run without problems in gv.edit and can be imported into Tulip. I will create a new issue when I figure out what is causing the problem, or when I have anonymized the file so that I can give you a copy.

Thank you for the fast response.

Jaran

On Sun, 2 Aug 2020 at 06:48, Braxton Fitts notifications@github.com wrote:

Hi there, Thanks for reaching out about your problem. I've taken a crack at this, and before I publish it to the Cytoscape app store I wanted to run it by you. I've attached a ZIP file that contains the updated version of dot-app that should fix the problem. I made some changes so that when importing and exporting DOT files, it should read them as UTF-8. dot-app-0.9.6.jar.zip https://github.com/idekerlab/dot-app/files/5011654/dot-app-0.9.6.jar.zip


bdtfitts commented 4 years ago

Regarding the newlines in the labels: do the files from Gramps write them as an escape sequence, as in label="this is a\nmultiline label", or as a physical line break inside the quoted string, as in

label="this is a
multiline label"?

I ask because this seems to be a case of undefined behavior that depends on how the developer of the parser implemented the DOT grammar. I've noticed that some Graphviz viewers allow input graphs to have physical line breaks in label attributes and others do not.

On The DOT Language page they specify that HTML-like strings can use newlines for formatting, but they don't specify the same ability for double-quoted strings. On that page they specify that a double-quoted string can physically span multiple lines by putting \ before each newline, but that has no effect on rendering the label as multiple lines. In the escString section of the Node, Edge, Graph Attributes page, they mention that line breaks can be added by using \n, \l, and \r for centered, left-justified, and right-justified lines, with no mention of being able to add breaks to labels through physical line breaks.
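The distinction above can be sketched in Python as a small pre-processing step applied to the text between the double quotes. This is only an illustration under stated assumptions: the function name `normalize_label` is hypothetical, turning a physical line break into the `\n` escape is a workaround choice rather than documented DOT behavior, and the sketch ignores corner cases such as an escaped backslash directly before a line break.

```python
def normalize_label(raw: str) -> str:
    """Normalize the raw text between the double quotes of a DOT string.

    - A backslash followed by a physical line break is a line continuation:
      the two pieces are joined and no break is rendered.
    - Any remaining physical line break is rewritten as the two-character
      escape \\n (an assumption of this sketch, not a rule from the DOT docs).
    - Existing escapes such as \\n, \\l and \\r are left untouched.
    """
    raw = raw.replace("\r\n", "\n")      # treat Windows line endings alike
    joined = raw.replace("\\\n", "")     # drop backslash + line break
    return joined.replace("\n", "\\n")   # physical break -> escape sequence


continued = normalize_label("this is a \\\nsingle-line label")
physical = normalize_label("this is a\nmultiline label")
escaped = normalize_label("left\\lright\\r")
```

A file normalized this way only contains the escape form, which is the form the escString documentation describes.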

StoltHD commented 4 years ago

The newlines are written as \n, and the labels are double-quoted strings, not HTML.

It would be great if you found a solution, but if not, I will be able to work around it, maybe by splitting the label into multiple columns in Excel or OpenRefine. I need to read a little to figure out whether it's possible to have multiple data fields in Graphviz, i.e. dates, places, and so on.

I really appreciate you looking at this and trying to find a solution.

Jaran

On Sun, 2 Aug 2020 at 21:57, Braxton Fitts notifications@github.com wrote:

Regarding the newlines in the labels: do the files from Gramps write them as an escape sequence, as in label="this is a\nmultiline label", or as a physical line break inside the quoted string, as in

label="this is a
multiline label"?

I ask because this seems to be a case of undefined behavior that depends on how the developer of the parser implemented the DOT grammar. I've noticed that some Graphviz viewers allow input graphs to have physical line breaks in label attributes and others do not. On The DOT Language page http://graphviz.org/doc/info/lang.html they specify that HTML-like strings can use newlines for formatting, but they don't specify the same ability for double-quoted strings. On that page they specify that a double-quoted string can physically span multiple lines by putting \ before each newline, but that has no effect on rendering the label as multiple lines. In the escString section of the Node, Edge, Graph Attributes page http://graphviz.org/doc/info/attrs.html#k:escString, they mention that line breaks can be added by using \n, \l, and \r for centered, left-justified, and right-justified lines, with no mention of being able to add breaks to labels through physical line breaks.


StoltHD commented 4 years ago

I have looked a little more at the problem with "unsupported" Graphviz files. It looks like the problem is graphs that contain "subgraph cluster" blocks.

I created two files with an add-on to Twine called dotgraph, where I can define the same network graph with and without subgraph clusters. The file without subgraph clusters opens without problems; the file with subgraph clusters will not open.

The files from Gramps also have subgraph cluster, so I think that is the issue here.

I don't know if it's because you use an outdated library, or what it is.

But a subgraph cluster in Graphviz would be something like a node group, or maybe a subgraph, in Cytoscape.

The way it's used in the export from Twine is most like a group based on tags (those tags also set a color that is imported correctly into Cytoscape).

I don't know if it's possible for you, but maybe you could create a select box:

Convert subgraph clusters to:
[ ] A group
[ ] A nested network
[ ] Ignore all subgraph clusters

Or just create groups directly (and add an info box about it, maybe?). The clusters in Graphviz can use the attribute style=invis. I don't know what's possible with groups in Cytoscape, but if it's possible to set the group attributes "Double-Click Action" and "Visualization for group", that would be enough for most people. Any other settings and attributes for groups can be set manually.

But the important thing is just to be able to import subgraph clusters from any Graphviz file.
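To make the cluster-to-group idea concrete, here is a rough Python sketch that collects the node names mentioned inside each `subgraph cluster...` block, which is the information an importer would need to build groups. It is a simplification and not a DOT parser: it assumes flat (non-nested) clusters, one simple statement per line or per semicolon, and no semicolons, braces, line breaks, or edge operators inside quoted strings. The function name `cluster_groups` and the example graph are hypothetical.

```python
import re

# Matches "subgraph cluster_x { ... }" as long as the body has no nested braces.
CLUSTER_RE = re.compile(r"subgraph\s+(cluster\w*)\s*\{([^{}]*)\}")
# Default-attribute statements such as "node [shape=box]" are not nodes.
KEYWORDS = {"node", "edge", "graph"}


def cluster_groups(dot_text: str) -> dict[str, list[str]]:
    """Map each cluster name to the node names mentioned inside it."""
    groups: dict[str, list[str]] = {}
    for name, body in CLUSTER_RE.findall(dot_text):
        members: list[str] = []
        for stmt in re.split(r"[;\n]", body):
            stmt = re.sub(r"\[[^\]]*\]", "", stmt).strip()  # drop [attr lists]
            if not stmt or "=" in stmt:  # skip blanks and label=..., style=...
                continue
            for node in re.split(r"->|--", stmt):  # node or edge statement
                node = node.strip().strip('"')
                if node and node.lower() not in KEYWORDS and node not in members:
                    members.append(node)
        groups[name] = members
    return groups


example = """
digraph G {
  subgraph cluster_family {
    label="Family";
    style=invis;
    node [shape=box];
    anna -> "Bjørn";
    carl;
  }
  subgraph cluster_friends { dora; carl }
  anna -> dora;
}
"""
groups = cluster_groups(example)
```

In this example `carl` ends up in both clusters, so whatever group structure an importer creates would have to tolerate a node belonging to more than one group.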

One more thing, just for information: if a node belongs to multiple subgraph clusters, the FDP layout does not seem to support that, but the other layouts seem to.

Hopefully this was of some help. If you want me to create a new issue for the subgraph cluster problem, just let me know.

Thanks again for looking into this.

StoltHD commented 4 years ago

Here are some links to examples explaining the subgraph and subgraph cluster feature in Graphviz:
https://renenyffenegger.ch/notes/tools/Graphviz/elems/subgraph/index
https://graphviz.org/Gallery/directed/cluster.html

StoltHD commented 4 years ago

I hope you will be able to find a fix for the problem with files containing subgraphs not being supported. Do you want me to open a new issue for it?

bdtfitts commented 4 years ago

Yes, go ahead and open a separate issue for the subgraph problem and I'll take a look into it.