miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License
818 stars 118 forks source link

Add line numbers on all block tokens during parsing (#144) #188

Closed anderskaplan closed 10 months ago

anderskaplan commented 1 year ago

This is a partial solution for #144 which records line numbers on all block tokens. The span tokens are still left unlabeled, but at least it's a starting point, and I think the functionality added here is useful by itself.

Open question: should the line_number be a added to repr_attributes, so that it gets included in the output from ASTRenderer?

coveralls commented 1 year ago

Coverage Status

coverage: 94.15% (+0.03%) from 94.118% when pulling 6070d7e670dd29e7dc6e1b14ad732439b8c3ffe3 on anderskaplan:line-numbers-2 into ee7ce94466966d083d02379181c3e4877988df60 on miyuchina:master.

pbodnar commented 1 year ago

@anderskaplan, thank you, I will try to look at this for the next release.

pbodnar commented 1 year ago

Finally, I have found time for mistletoe once again. :)

@anderskaplan, I must say "good job" 👍 . Basically, I like it, including the clever way of the new unit test.

Probably just 3 more things be resolved:

  1. Performance - did you run the benchmark test? I guess there should be little to zero difference to the version without line numbers, right?
  2. Backwards compatibility - shouldn't we better pass the line_number parameter as the last AND optional (with default value of None) to the constructors of TableRow etc. (even Table?)? This way, we could achieve backwards compatibility for those users who might create (some) tokens without actually parsing a markdown text. Yes, there probably won't be many, but what if there are some out there? :)
  3. Documentation - probably add some notes on this cool feature as a sub-chapter to https://github.com/miyuchina/mistletoe/blob/v1.2.1/dev-guide.md#understanding-the-ast?

Open question: should the line_number be a added to repr_attributes, so that it gets included in the output from ASTRenderer?

Not sure here, but probably yes, it could? :)

anderskaplan commented 11 months ago

Hey @pbodnar, just checking in to say hi. The last weeks have been quite busy with other things but I'll get back to this as soon as I can.

anderskaplan commented 10 months ago

I have updated the documentation, made the line_number parameter optional, and run the benchmark as requested in the review.

The benchmark results have a fairly large variance (5 %CV) from run to run on my machine. But I'm fairly confident that the performance penalty of the line numbers is negligible, because if I compare the mean benchmark with and without line numbers from five runs of each type, then the relative difference is about 0.1%. I.e., far less than the run-to-run variance.

I also added the line_number attribute to repr_attributes, so that the line numbers are included in the output from the ASTRenderer.