miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License
818 stars 118 forks source link

Change FileWrapper anchor/reset to get_pos/set_pos #186

Closed anderskaplan closed 1 year ago

anderskaplan commented 1 year ago

Issue found during implementation of #166, when checking for the start of Table tokens.

When checking if a table starts on a line, one needs to scan multiple lines without consuming any content. The way to do this is with the anchor/reset methods on the FileWrapper class. However, this breaks the parsing of List tokens, because these also use anchor/reset, and the anchor set by List gets overwritten by the one set in Table.

The proposed solution is to not store the anchor position on the FileWrapper, but instead move that responsibility to the parsing method. This way, recursive parsing can hold multiple anchors at the same time, without any interference.

coveralls commented 1 year ago

Coverage Status

coverage: 94.044% (+3.0%) from 91.086% when pulling e058e78979c7357cf1afa00c8042c7bdb976cde6 on anderskaplan:filewrapper-anchoring into 86f3a2f2d6d6289114403bd637fffdd7b543cd62 on miyuchina:master.

anderskaplan commented 1 year ago

@pbodnar Please disregard the test coverage check, it's a false alarm caused by deleting one line of "tested" code.

pbodnar commented 1 year ago

Another thought: What about instead of replacing the existing anchor/reset methods making them stacked (a simple list would probably suffice for the stored anchors)? Would that make sense to you?

Also, we should expect that some could use the existing methods in their custom tokens, so that is another reason for not removing them right away.

anderskaplan commented 1 year ago

Another thought: What about instead of replacing the existing anchor/reset methods making them stacked (a simple list would probably suffice for the stored anchors)? Would that make sense to you?

Yes, I did consider that option, I even tried implementing it :) However, that option assumes that a push is always followed by a pop, and that is not the case here. Sure, one could force the calling code to clean up (i.e. pop) after itself, but that becomes more complex and error prone than just letting the calling code keep track of the stored position.

Also, we should expect that some could use the existing methods in their custom tokens, so that is another reason for not removing them right away.

It could happen, but I'd guess that the risk is rather small. Imho this is one of those band-aids that just need to be pulled. We could keep the anchor/reset methods for now, though, and mark them as deprecated. Which do you prefer?

pbodnar commented 1 year ago

Another thought: What about instead of replacing the existing anchor/reset methods making them stacked (a simple list would probably suffice for the stored anchors)? Would that make sense to you?

Yes, I did consider that option, I even tried implementing it :) However, that option assumes that a push is always followed by a pop, and that is not the case here. Sure, one could force the calling code to clean up (i.e. pop) after itself, but that becomes more complex and error prone than just letting the calling code keep track of the stored position.

OK, I would expect that push and pop would by always stacked (nested). But anyway yes, it is definitely easier this way, even if we need to introduce a new variable (anchor) in every such block.

Also, we should expect that some could use the existing methods in their custom tokens, so that is another reason for not removing them right away.

It could happen, but I'd guess that the risk is rather small. Imho this is one of those band-aids that just need to be pulled. We could keep the anchor/reset methods for now, though, and mark them as deprecated. Which do you prefer?

I prefer to keep them and mark them as deprecated, please. :)