Breaking Changes Version 0.60.0 release

vsch commented 5 years ago

:warning: Release of 0.60.0 has breaking changes due to re-organization, renaming and clean up of some implementation classes.

Please give feedback on the changes if you are not able to adapt your code to them.

Break: split out generic AST utilities from the flexmark-util module into separate smaller modules. An IntelliJ IDEA migration to help with migrating from 0.50.40 will be provided where a package or class has changed. com.vladsch.flexmark.util will no longer contain any files but will contain the separate utilities modules, with the flexmark-utils module being an aggregate of all utilities modules, similar to flexmark-all:
- ast/ classes to flexmark-util-ast
- builder/ classes to flexmark-util-builder
- collection/ classes to flexmark-util-collection
- data/ classes to flexmark-util-data
- dependency/ classes to flexmark-util-dependency
- format/ classes to flexmark-util-format
- html/ classes to flexmark-util-html
- mappers/ classes to flexmark-util-sequence
- options/ classes to flexmark-util-options
- sequence/ classes to flexmark-util-sequence
- visitor/ classes to flexmark-util-visitor
Convert anonymous classes to lambda where possible.
Refactor flexmark-util to eliminate dependency cycles between classes in different subdirectories.
Break: delete deprecated properties, methods and classes
Add: org.jetbrains:annotations:15.0 dependency to have @Nullable/@NotNull annotations added for all parameters. I use IntelliJ IDEA for development and it helps to have these annotations for analysis of potential problems and use with Kotlin.
Break: refactor and cleanup tests to eliminate duplicated code and allow easier reuse of test cases with spec example data.
Break: move formatter tests to flexmark-core-test module to allow sharing of formatter base classes in extensions without causing dependency cycles in formatter module.
Break: move formatter module into flexmark core. This module is almost always included anyway because most extensions have a dependency on formatter for their custom formatting implementations. Having it as part of the core allows relying on its functionality in all modules.
Break: move com.vladsch.flexmark.spec and com.vladsch.flexmark.util in flexmark-test-util to com.vladsch.flexmark.test.spec and com.vladsch.flexmark.test.util respectively to respect the naming convention between modules and their packages.
Break: NodeVisitor implementation details have changed. If you were overriding NodeVisitor.visit(Node) in the previous version, it is now final to ensure a compile-time error is generated. You will need to change your implementation. See the comment in the class for instructions.

:information_source: com.vladsch.flexmark.util.ast.Visitor is only needed for implementation of NodeVisitor and VisitHandler. If you convert all anonymous implementations of VisitHandler to lambdas you can remove all imports for Visitor.
- Fix: remove old visitor-like adapters and implement ones based on generic classes not linked to flexmark AST node.
- Deprecate old base classes:
- com.vladsch.flexmark.util.ast.NodeAdaptedVisitor see javadoc for class
- com.vladsch.flexmark.util.ast.NodeAdaptingVisitHandler
- com.vladsch.flexmark.util.ast.NodeAdaptingVisitor
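
A minimal sketch of the handler-based style that the NodeVisitor note above points to: instead of overriding NodeVisitor.visit(Node), a VisitHandler is registered per node class as a method reference, so no import of Visitor is needed. The class name TextCollector and its collect method are made up for illustration, and the snippet assumes the flexmark core artifact for 0.60 or later is on the classpath. The comment in the NodeVisitor class remains the authoritative migration guidance.

```java
import com.vladsch.flexmark.ast.Text;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.util.ast.Node;
import com.vladsch.flexmark.util.ast.NodeVisitor;
import com.vladsch.flexmark.util.ast.VisitHandler;

// Hypothetical example class: gathers the characters of all Text nodes in a document.
public class TextCollector {
    private final StringBuilder collected = new StringBuilder();

    // Handlers are supplied as method references (or lambdas) instead of
    // overriding NodeVisitor.visit(Node), which is final in 0.60.
    private final NodeVisitor visitor = new NodeVisitor(
            new VisitHandler<>(Text.class, this::visit)
    );

    private void visit(Text text) {
        collected.append(text.getChars());
        // Descend explicitly so nested nodes under a handled node are still visited.
        visitor.visitChildren(text);
    }

    public String collect(Node root) {
        collected.setLength(0);
        visitor.visit(root);
        return collected.toString();
    }

    public static void main(String[] args) {
        Node document = Parser.builder().build().parse("Hello *world*");
        System.out.println(new TextCollector().collect(document));
    }
}
```

Node types without a registered handler fall through to their children, which is why only Text needs a handler here.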

vsch commented 4 years ago

Major improvements in performance and memory requirements for SegmentedSequence, the workhorse of the library.

Here is a brief summary of progress:

Major reorganization and code cleanup of implementation for next version 0.60.0

Formatter implementation is now part of core implementation in flexmark module
Formatter improved with more options including wrapping text to margins.
Added the ability to track and map source offset(s) to their index in the formatted sequence. This feature allows editor caret position preservation across formatting operations.
Offset tracking unified using TrackedOffset. It is used by MarkdownParagraph for text wrapping and MarkdownTable for table formatting, and is able to handle caret position during typing and backspace editing operations that are immediately followed by formatting of the edited source.
Tests cleaned up to eliminate duplication and hacks
flexmark-test-util made reusable for other projects. Having markdown as the source code for tests is too convenient to have it only used for flexmark-java tests.
Optimized SegmentedSequence implementation using binary trees for searching segments and byte-efficient segment packing. Parser performance is either slightly improved or not affected, but the new implementation allows using SegmentedSequences for collecting Formatter and HtmlRenderer output to track the source location of all text with minimal overhead and double the performance of the old implementation.
new implementation of LineAppendable used for text generation in rendering:
can use SequenceBuilder to generate BasedSequence result with original source offsets for those character segments which come from the source. This allows round trip source tracking from Source -> AST -> Formatted Source -> Source throughout the library.
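
To make the idea of searching segments to recover source offsets concrete, here is a small illustrative sketch. It is not the library's SegmentedSequence code: it swaps in a plain sorted array with binary search for the binary tree layout mentioned above, and the class SegmentLookup, its fields, and the convention of -1 for text that does not come from the source are all assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical illustration: maps an index in generated text back to a source offset.
public class SegmentLookup {
    private final int[] resultStart; // start index of each segment in the generated text, ascending, first is 0
    private final int[] sourceStart; // source offset where the segment begins, or -1 if not from the source
    private final int length;        // total length of the generated text

    public SegmentLookup(int[] resultStart, int[] sourceStart, int length) {
        if (resultStart.length == 0 || resultStart[0] != 0 || resultStart.length != sourceStart.length) {
            throw new IllegalArgumentException("segments must start at 0 and arrays must match");
        }
        this.resultStart = resultStart.clone();
        this.sourceStart = sourceStart.clone();
        this.length = length;
    }

    /** Returns the source offset for a character of the generated text, or -1 for inserted text. */
    public int sourceOffset(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int pos = Arrays.binarySearch(resultStart, index);
        // When not found, binarySearch returns -(insertionPoint) - 1; the containing
        // segment is the one just before the insertion point.
        int segment = pos >= 0 ? pos : -pos - 2;
        int base = sourceStart[segment];
        return base < 0 ? -1 : base + (index - resultStart[segment]);
    }

    public static void main(String[] args) {
        // Source "Hello    world" formatted as "* Hello world": an inserted "* " prefix,
        // then "Hello " from source offset 0, then "world" from source offset 9.
        SegmentLookup lookup = new SegmentLookup(new int[]{0, 2, 8}, new int[]{-1, 0, 9}, 13);
        System.out.println(lookup.sourceOffset(1));
        System.out.println(lookup.sourceOffset(8));
    }
}
```

Each lookup costs a logarithmic number of comparisons in the number of segments, which is the property that makes tracking source locations for all output text affordable.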

As an added bonus, using the appendable makes formatting 40% faster than the previous implementation and 160 times (yes, times) more efficient in memory use. For the test below, the old implementation allocated 6GB worth of segmented sequences, the new implementation 37MB. The % overhead is four times greater, but that is after a 43-fold reduction in total overhead bytes: the old implementation allocated 342MB of overhead, the new implementation only 8MB.

As a result of the increased efficiency, two additional files of about 600KB each can be included in the test run and only add 0.6 sec to the total formatter execution time and only 7.5MB of additional memory.

Tests were run on 1141 markdown files from GitHub projects and some other user samples. The largest was 256k bytes. The two new files of 600KB were not included in the results, to allow comparing them to the previous implementation.

| Description | Old SegmentedSequence | New SegmentedSequence | New LineAppendable |
|---|---:|---:|---:|
| Total wall clock time | 13.896 sec | 9.672 sec | 8.805 sec |
| Parse time | 2.402 sec | 2.335 sec | 2.352 sec |
| Formatter appendable | 0.603 sec | 0.602 sec | 0.798 sec |
| Formatter sequence builder | 7.264 sec | 3.109 sec | 1.948 sec |

The overhead difference is significant. The totals are for all segmented sequences created during the test run of 1141 files. Parser statistics show the requirements during parsing; formatter statistics cover only the formatting of those files while accumulating the text as a segmented sequence.

| Description | Old Formatter | New Formatter | New LineAppendable | Old Parser | New Parser |
|---|---:|---:|---:|---:|---:|
| Bytes for characters of all segmented sequences | 6,029,774,526 | 6,029,774,526 | 37,253,492 | 917,016 | 917,016 |
| Bytes for overhead of all segmented sequences | 12,060,276,408 | 342,351,155 | 8,021,677 | 1,845,048 | 93,628 |
| Overhead % | 200.0% | 5.7% | 21.5% | 201.2% | 10.2% |
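
The overhead percentages and the "160 times" and "43 fold" figures quoted earlier can be re-derived from the byte counts in this table. The short worked check below does only that arithmetic; the class and method names are invented for the example, and the byte counts are copied from the New Formatter and New LineAppendable columns.

```java
// Hypothetical helper that re-derives the ratios from the byte counts in the table.
public class OverheadCheck {
    /** Overhead bytes expressed as a percentage of character bytes. */
    static double overheadPercent(long overheadBytes, long charBytes) {
        return 100.0 * overheadBytes / charBytes;
    }

    /** How many times larger the first value is than the second. */
    static double ratio(long larger, long smaller) {
        return (double) larger / smaller;
    }

    public static void main(String[] args) {
        long formatterChars = 6_029_774_526L;   // New Formatter, bytes for characters
        long formatterOverhead = 342_351_155L;  // New Formatter, bytes for overhead
        long appendableChars = 37_253_492L;     // New LineAppendable, bytes for characters
        long appendableOverhead = 8_021_677L;   // New LineAppendable, bytes for overhead

        System.out.printf("New Formatter overhead: %.1f%%%n",
                overheadPercent(formatterOverhead, formatterChars));
        System.out.printf("New LineAppendable overhead: %.1f%%%n",
                overheadPercent(appendableOverhead, appendableChars));
        System.out.printf("Character bytes ratio: %.1f%n",
                ratio(formatterChars, appendableChars));
        System.out.printf("Overhead bytes ratio: %.1f%n",
                ratio(formatterOverhead, appendableOverhead));
    }
}
```

The two percentages round to the 5.7% and 21.5% shown in the table, and the two ratios come out at roughly 162 and 43, in line with the rounded figures quoted in the text.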

vsch commented 4 years ago

Version 0.60 released.

cjbrooks12 commented 4 years ago

When updating to 0.60, it looks like the flexmark-ext-gfm-tables artifact has not been released for this version. It seems like the table functionality still works, however; was that feature rolled into the core library? I didn't see anything about that in the 0.60 release notes or migration guide.

vsch commented 4 years ago

@cjbrooks12, gfm-tables extension has been deprecated for a long time and was not being updated. The flexmark-ext-tables module is a superset of the gfm module and will perform table parsing compatible with GFM by setting the module options:

                .set(TablesExtension.COLUMN_SPANS, false)
                .set(TablesExtension.APPEND_MISSING_COLUMNS, true)
                .set(TablesExtension.DISCARD_EXTRA_COLUMNS, true)
                .set(TablesExtension.HEADER_SEPARATOR_COLUMN_MATCH, true)
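
For context, a sketch of how those option lines might sit in a complete setup. It assumes the flexmark core and flexmark-ext-tables artifacts (0.60 or later) are on the classpath; the class name GfmTables and its render method are hypothetical and exist only for this example.

```java
import com.vladsch.flexmark.ext.tables.TablesExtension;
import com.vladsch.flexmark.html.HtmlRenderer;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.util.ast.Node;
import com.vladsch.flexmark.util.data.MutableDataSet;

import java.util.Collections;

// Hypothetical wrapper: renders markdown with flexmark-ext-tables configured as described above.
public class GfmTables {
    public static String render(String markdown) {
        MutableDataSet options = new MutableDataSet();
        // Register the tables extension, then apply the GFM-compatible settings.
        options.set(Parser.EXTENSIONS, Collections.singletonList(TablesExtension.create()));
        options.set(TablesExtension.COLUMN_SPANS, false)
                .set(TablesExtension.APPEND_MISSING_COLUMNS, true)
                .set(TablesExtension.DISCARD_EXTRA_COLUMNS, true)
                .set(TablesExtension.HEADER_SEPARATOR_COLUMN_MATCH, true);

        // The same options object is shared by the parser and the renderer.
        Parser parser = Parser.builder(options).build();
        HtmlRenderer renderer = HtmlRenderer.builder(options).build();
        Node document = parser.parse(markdown);
        return renderer.render(document);
    }

    public static void main(String[] args) {
        System.out.println(render("| a | b |\n|---|---|\n| 1 | 2 |\n"));
    }
}
```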

lread commented 2 years ago

@vsch I found your note above on gfm-tables very helpful, thanks!

To match GitHub, I am also including the settings I found here, is that appropriate? This adds:

                .set(TablesExtension.WITH_CAPTION, false)
                .set(TablesExtension.MIN_HEADER_ROWS, 1)
                .set(TablesExtension.MAX_HEADER_ROWS, 1)

garretwilson commented 1 year ago

It appears that SuperscriptExtension has been moved from com.vladsch.flexmark.superscript.SuperscriptExtension to com.vladsch.flexmark.ext.superscript.SuperscriptExtension. Should this be mentioned in the list of breaking changes? (It broke my build anyway.) And is there an explanation as to why it was moved? Does .ext. mean this is something outside of the CommonMark specification? (I'm only guessing. An official explanation would be helpful.)

vsch / flexmark-java

Breaking Changes Version 0.60.0 release #370