jfeliu007 / goplantuml

PlantUML Class Diagram Generator for golang projects
MIT License
1.84k stars 172 forks source link

Repeatable output (idempotent output) #61

Closed Justin-W closed 4 years ago

Justin-W commented 4 years ago

goplantuml should generate identical output when repeatedly executed against identical inputs/targets.

For example, when running the following 3 commands, the diff should detect no differences.

goplantuml -recursive . > goplantuml.a.puml
goplantuml -recursive . > goplantuml.b.puml
diff goplantuml.a.puml goplantuml.b.puml

However, currently the diff detects many differences between the 2 files.

Such non-idempotent output makes the output less usable for a variety of purposes. For example:

And if there is some reason or benefit from the current behavior (i.e. randomized output), then a new CLI flag could be added to give the user a choice between idempotent vs randomized output. However, I think that idempotent output should be the default behavior.

jfeliu007 commented 4 years ago

@Justin-W In order to speed up the process of creating the diagram, the project uses maps. Due to the nature of the randomness of maps in Golang, it is very difficult to maintain an order. Maps are great for O(1) operations, but they have that particularity in Golang. Other languages mimic the same patterns when using maps, but I believe Golang intentionally makes them random. There are so many pieces in a big project that I rather keep this using maps for the sake of speed. I might consider this in the future, but I would have to come up with a way of consistently changing a map into an array (Maybe alphabetically, before rendering. But it will definitely be for a much later update :).

@Justin-W , you are giving me great feedback. I am glad this tool is useful to some people.

Justin-W commented 4 years ago

the project uses maps. Due to the nature of the randomness of maps in Golang, it is very difficult to maintain an order. ... I would have to come up with a way of consistently changing a map into an array (Maybe alphabetically, before rendering. ...

I don't think so. I think you would just need to modify a few funcs slightly. For example, in the renderStructures func (goplantuml/parser/class_parser.go:263), instead of doing for name, structure := range structures { ... }, you would simply modify the for loop to iterate over a slice of the keys (sorted by name), and then call the renderStructure func as you currently do. Code example

I suspect you would only need to do this in a handful of places to get fully idempotent output.

jfeliu007 commented 4 years ago

Hmmm. That could work. And We only need to implement it on the render functions.

Good tip. Thanks. I’ll try that.

jfeliu007 commented 4 years ago

Brilliant. I am reducing the amount of testing support files I need for this project.

Justin-W commented 4 years ago

FYI: #61 isn't completely fixed yet, per latest tests of master.

The following functions still aren't producing repeatable output:

jfeliu007 commented 4 years ago

renderStuctMethofs, and renderStructFields will render the methods and fields in order of appearance. They are already handled by arrays. I like that the fields and methods appear in the same order they appear in the code. At least that's what I have generated consistently with the code I have ran the parser with. As for render aliases, I do see that I am missing the sort function. I will reopen this issue for that one. But for the other two, I am not so sure.

Justin-W commented 4 years ago

renderStuctMethofs, and renderStructFields will render the methods and fields in order of appearance

I'm still seeing the order of class methods vary between consecutive runs. I just assumed fields was also a problem.

I like that the fields and methods appear in the same order they appear in the code.

I think that is a perfectly acceptable default order for fields and methods. But it's not the behavior I'm currently seeing.

jfeliu007 commented 4 years ago

I'm still seeing the order of class methods vary between consecutive runs. I just assumed fields was also a problem.

That's is very interesting. I can't see anything like that in what I am running :( .

You can see the example in the Generated Diagram on the read me. You will see the parser's order of methods is the same as then ones in the class itself. Notice that the Functions and Fields in the Struct struct are actually arrays and they are filled in order of appearance. Maybe I am running and outdated version of the ast library, and now they do not guarantee the order of the fields and methods?

Justin-W commented 4 years ago

Getting closer. Alias edges seem to be fixed now, but I'm still seeing 2 issues RE: method order and type values.

Non-repeatable method order

For consecutive tests against the same pacakge, I am seeing the order of certain public and private methods vary each run. For example, these 2 goplantuml output excerpts are from the puml class for the same struct:

    - consumeResourceCacheEvents() 
    - mqSetup() error

    + StartMQ() 
    ' ... (all public methods except StartMQ)
    - mqSetup() error
    - consumeResourceCacheEvents() 

    + StartMQ() 
    ' ... (all public methods except StartMQ)

Note how the 2 private methods get swapped, and how the StartMQ method swaps between being the first vs last public method listed (despite all of the other 10+ public methods seeming to be in the same order every time).

I noticed that the StartMQ and consumeResourceCacheEvents funcs are both defined in a different .go file from the rest of the method funcs.

So I suspect the non-repeatable output is because you are processing the go files within a package in random order each run.

Type value inconsistency

This same project uses protobufs, and I am seeing the 'type' of many fields and method parameters vary between runs.

For example, here are some excerpts from one of our .go files:

import "github.com/golang/protobuf/ptypes"
import "aaa.com/bbb/ccc/pb" //a custom proto package
//...
type FakeProvider struct {
//...
    GetFileResponse *pb.File
//...
}

And here are corresponding puml output excerpts from 2 consecutive runs:

class FakeProvider << (S,Aquamarine) >> {
'...
    + GetFileResponse *proto.File
class FakeProvider << (S,Aquamarine) >> {
'...
    + GetFileResponse *pb.File

Note the same Field has a different 'type' in each run. I'm seeing this flip-flopping in the type's prefix (always between *pb. and *proto.) for many different field and parameter types. However, while separate blocks of the puml have different types between runs, consecutive lines in the same 'block' of methods usually seem to have the same prefix in a given run. For example, it seems like all of the methods in a single class will consistently use either one or the other of the 2 prefixes in a single run, but some other classes may use the other prefix in that same run.

I don't know if the root cause is in goplantuml vs ast vs other, but it's definitely causing non-repeatable output. And I couldn't find any evidence of different import aliases of the same package in the code I was scanning, so I don't think that the issue is caused by an inconsistency within the scanned package's code.

Justin-W commented 4 years ago

More info:

func (st *Struct) AddMethod(method *ast.Field, aliases map[string]string) is the func which appends items to the array which determines the order in which class methods get rendered. That func is called from func (p *ClassParser) parseFileDeclarations(node ast.Decl) within func (p *ClassParser) parsePackage(node ast.Node), which iterates over an unsorted map of the package's files.

Therefore, any package with a struct that has methods defined in more than 1 go file will render as a puml class with methods in a non-repeatable order.

Since you are want to render the class methods in the order they are defined in the code, and since rendered output is affected by the order in which a package's go files are parsed, you therefore need to parse the go files in a repeatable order within func (p *ClassParser) parsePackage(node ast.Node).

Justin-W commented 4 years ago

you therefore need to parse the go files in a repeatable order within func (p *ClassParser) parsePackage(node ast.Node)

Note that this will also partially fix #84.

Justin-W commented 4 years ago

@jfeliu007 Could you re-open this issue until it's fully fixed? Thanks.

jfeliu007 commented 4 years ago

@Justin-W it's been a while since I have looked at these issues. I will work on this one now. This has been bugging me for some time but I have been busy with other projects. I am going to go with the suggestion of parsing the files in a specific order within each package. That is a very good catch.

jfeliu007 commented 4 years ago

I also think there might be other inconsistencies in the project regarding this very issue. I'll keep an eye on anything that pops up.

Justin-W commented 4 years ago

Thanks @jfeliu007 :)

jfeliu007 commented 4 years ago

I think I have fixed this for good. I used the elasticsearch library which would usually provide different results and now it seems to be very consistent. I Discovered that some classes we generated with names that did not conflict with the plantuml specifications (due to having dots in the names) where also being rendered based on a map. I also added the solution to the functions that you mentioned. I never thought of that :) .