unidoc / unipdf

Golang PDF library for creating and processing PDF files (pure go)
https://unidoc.io
Other
2.62k stars, 254 forks

[Feature] Expose type contents/fields to help in testing/development (StyledParagraph, Table, TableCell, etc) #434

Open lilith-writes-code opened 3 years ago

lilith-writes-code commented 3 years ago

In trying to test logic related to building PDF content, we ran into issues verifying that content. Additionally, the API for some renderable types doesn't seem consistent. For instance, creator.Paragraph has a public func Text() for inspecting the contents of the paragraph. However, if I want to inspect the contents of creator.Table, creator.StyledParagraph, etc., the content-related data is private. If possible, it would be wonderful to have access to, say, the Cells in a Table, with further access to each cell's content or basic styling.

The main argument for this is easing development. If we want to verify that we are building the right data structures before rendering them or writing them to disk, for instance in a unit test, it would be much easier to interrogate these types directly than to write an integration-style test for each branch of logic we might need. For example, if I have code that should build a table with different content depending on some boolean logic, the only things I can inspect are the column count and the row count; I have no way to inspect the cells that were added, let alone the content of each one. Even if I hold a reference to the cell I use in my test scenario, I still can't get at its content.

We can work around some of this by testing content generation before adding it to a cell, or by avoiding types like creator.StyledParagraph, but it would be a lot simpler to have access to the fields of the type. In some cases our only option is to write the PDF to disk and do a visual diff, which means we need generated baseline PDFs/images for every if statement in our code. The other option we've been exploring is to write to memory or disk, read the result back in, and use the extractor to inspect the table's content or other generated content. This approach is really an arduous band-aid around the fact that we can't inspect the contents/fields of many of the types we test against.
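A minimal sketch of the first workaround (testing content generation before it reaches a cell), assuming the branching logic can be moved into a plain function that returns ordinary Go values. It does not import unipdf; `lineItem`, `buildCellTexts`, and the `includeNotes` flag are hypothetical names for illustration, and the step that copies the strings into a creator.Table is left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// lineItem is a hypothetical input record for one table row.
type lineItem struct {
	Name  string
	Qty   int
	Notes string
}

// buildCellTexts decides what each cell should contain, row by row, and
// returns plain strings. A unit test can inspect every cell without
// rendering a PDF; a separate rendering step (not shown) would copy these
// strings into paragraphs inside a creator.Table.
func buildCellTexts(items []lineItem, includeNotes bool) [][]string {
	rows := make([][]string, 0, len(items))
	for _, it := range items {
		row := []string{it.Name, strconv.Itoa(it.Qty)}
		if includeNotes {
			// Keep a (possibly blank) notes cell so every row has the same
			// number of columns.
			row = append(row, strings.TrimSpace(it.Notes))
		}
		rows = append(rows, row)
	}
	return rows
}

func main() {
	items := []lineItem{{"Widget", 2, " fragile "}, {"Gadget", 1, ""}}
	for _, row := range buildCellTexts(items, true) {
		fmt.Printf("%q\n", row)
	}
}
```

The catch, as described above, is that this only covers the logic up to the hand-off; what actually lands in the table is still untested.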

Basically, it would be a huge help to just have a public 'getter' for many of the fields in these types. I know it might take a bit to try and add something like this to everything, so if you are interested in starting with a smaller subset, I could provide a list of types we're actively working with that would help us out immensely.
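To make the request concrete, here is a minimal sketch of the getter pattern on stand-in types. This is not unipdf code: `table`, `cell`, `AddCell`, `Cells`, and `Text` are hypothetical names. Returning a copy of the slice lets tests inspect the cells without being able to mutate the table's internal state.

```go
package main

import "fmt"

// cell is a stand-in for a table cell whose content is normally private.
type cell struct {
	text string
}

// Text exposes the cell's content read-only, similar in spirit to
// creator.Paragraph's Text().
func (c cell) Text() string { return c.text }

// table is a stand-in for a table type that keeps its cells unexported.
type table struct {
	cols  int
	cells []cell
}

func newTable(cols int) *table { return &table{cols: cols} }

// Cols reports the column count, the kind of accessor already available.
func (t *table) Cols() int { return t.cols }

// AddCell appends a cell holding the given text and returns it.
func (t *table) AddCell(text string) cell {
	c := cell{text: text}
	t.cells = append(t.cells, c)
	return c
}

// Cells returns a copy of the cells so callers can inspect, but not modify,
// the table's internal slice.
func (t *table) Cells() []cell {
	out := make([]cell, len(t.cells))
	copy(out, t.cells)
	return out
}

func main() {
	tbl := newTable(2)
	tbl.AddCell("")
	tbl.AddCell("Total")
	for i, c := range tbl.Cells() {
		fmt.Printf("cell %d: %q\n", i, c.Text())
	}
}
```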

If I'm not making sense, I'd love to discuss further or show code examples for anything that might not be clear.

gunnsth commented 3 years ago

@venison It would be helpful if you could break this up into more specific parts and include some concrete examples (with code and images) so we can get a better feel for the actual need here. Specific information would be good.

lilith-writes-code commented 3 years ago

Absolutely, I'll work on getting a few of the challenges we ran into and if we're lucky, you can just tell me where we need to look in case we just missed something. Will update when I have them ready!

lilith-writes-code commented 3 years ago

The easiest scenario I can quickly describe is verifying logic we have around building up content. A simple analogy: we have an if statement that, when true, builds paragraphs from the input and puts them in a table. At first we wanted to write a test verifying that the resulting table had the proper rows/columns (easy, supported) and that it was built with the proper content (not possible?).

We've since moved on to intercepting the input at different points, say, just before any table-creation calls, because nothing we found lets us inspect the contents after they are built. Essentially we end up having to wrap and mock out the implementation, and can only verify that the mock method received an expected call with the expected input.
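The wrap-and-mock approach usually amounts to putting an interface between the content logic and unipdf, then substituting a recording fake in tests. A minimal sketch, assuming such a seam; `cellWriter`, `recordingWriter`, and `fillTable` are hypothetical names, and a production implementation of `cellWriter` would forward the calls to a real creator.Table.

```go
package main

import "fmt"

// cellWriter is a hypothetical seam between our content logic and unipdf.
// The production implementation would wrap a creator.Table; tests use a fake.
type cellWriter interface {
	WriteText(text string)
	WriteBlank()
}

// recordingWriter records every call so a test can assert on them afterwards.
type recordingWriter struct {
	calls []string
}

func (r *recordingWriter) WriteText(text string) { r.calls = append(r.calls, "text:"+text) }
func (r *recordingWriter) WriteBlank()           { r.calls = append(r.calls, "blank") }

// fillTable is the logic under test: it leaves the first cell blank when
// there is no title, then writes one cell per value.
func fillTable(w cellWriter, title string, values []string) {
	if title == "" {
		w.WriteBlank()
	} else {
		w.WriteText(title)
	}
	for _, v := range values {
		w.WriteText(v)
	}
}

func main() {
	rec := &recordingWriter{}
	fillTable(rec, "", []string{"a", "b"})
	fmt.Println(rec.calls)
}
```

This verifies the calls made through the seam, not what the real table ends up containing, which is exactly the limitation described above.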

Ideally, we would be able to specify input and then inspect the table's contents afterwards: verify that the first cell is blank, that the third cell holds these styled paragraphs with this text, etc. That looks challenging, though, considering that TableCell's contents are typed only as the VectorDrawable interface.
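Even if cell contents were exposed only as an interface value, a test could recover the concrete type with a type switch. A self-contained sketch with stand-in types: `drawable`, `paragraph`, `styledParagraph`, `image`, and `describe` are hypothetical and only mimic the shape of the problem; they are not unipdf's API.

```go
package main

import (
	"fmt"
	"strings"
)

// drawable stands in for an interface such as VectorDrawable: it describes
// size, but says nothing about content. Width values below are placeholders.
type drawable interface {
	Width() float64
}

type paragraph struct{ text string }

func (p *paragraph) Width() float64 { return float64(len(p.text)) }
func (p *paragraph) Text() string   { return p.text }

// styledParagraph stands in for a paragraph built from several styled chunks.
type styledParagraph struct{ chunks []string }

func (s *styledParagraph) Width() float64 { return float64(len(strings.Join(s.chunks, ""))) }

type image struct{ w float64 }

func (i *image) Width() float64 { return i.w }

// describe recovers the concrete type behind the interface so a test can
// check what a cell holds.
func describe(content drawable) string {
	switch c := content.(type) {
	case nil:
		return "blank"
	case *paragraph:
		return "paragraph:" + c.Text()
	case *styledParagraph:
		return "styled:" + strings.Join(c.chunks, "")
	case *image:
		return fmt.Sprintf("image:%.0f", c.w)
	default:
		return fmt.Sprintf("unknown:%T", c)
	}
}

func main() {
	cells := []drawable{nil, &paragraph{text: "Total"}, &styledParagraph{chunks: []string{"Bold ", "plain"}}, &image{w: 40}}
	for _, c := range cells {
		fmt.Println(describe(c))
	}
}
```

The switch only helps if the concrete types themselves expose their content; here `describe` reads `styledParagraph.chunks` directly because it lives in the same package, which an outside test of a library type could not do.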

Here's a rough smattering of example pseudocode:

```go
import (
	"testing"

	"github.com/stretchr/testify/require"
)

func TestNewTable(t *testing.T) {
	testObj := typeResponsibleForBuildingContent()

	result := testObj.CreateTableWith("some awesome content, maybe images, maybe text, dynamically build table using it")

	// Supported today: only the table's dimensions can be checked.
	require.Equal(t, 10, result.Cols(), "Expected 10 columns")
}

func TestNewTableBetter(t *testing.T) {
	anyContent := "some awesome content, maybe images, maybe text, dynamically build table using it"
	testObj := typeResponsibleForBuildingContent()

	resultTable := testObj.CreateTableWith(anyContent)

	// Wished-for API: GetContent() and GetText() do not exist; they illustrate
	// the kind of getters being requested. Much better.
	require.Equal(t, anyContent, resultTable.GetContent()[0].GetText())
}
```

For what it's worth, we've moved toward a content 'buildup' approach similar to how your Invoice type holds onto and references its content, so that we can write our logic tests against those steps. There will still be gaps in areas like these, where our only option is to wrap unipdf methods in an interface and simply mock/assert that we passed in the expected content, but we have a bit of a path forward.

Case in point: The nice part about your basic Paragraph type is that we can create one, append to it and also later inspect the text in the instance. We didn't see a similar way to achieve this with StyledParagraph or Table.