watermarkhu closed this issue 1 month ago.
Hi @watermarkhu
This looks really interesting. I have just started tackling #44 and #222, and the Pygments token output is a mess to parse. I'll give it a shot and see if it can replace Pygments, and then improve the functionality.
Regarding starting a new auto-documenter, I can only tell you how this domain was started. The original author built the documenter directly on top of autodoc for Python. This gave them a good start and basis. However, in my opinion the code is not the easiest to work with. We still run into features that need to be reimplemented, for instance #180. Even after maintaining the package for many years now, I still struggle with the Sphinx internals of Documenter and Directives 😵.
A different approach to autodoc is taken in https://github.com/mozilla/sphinx-js. I hope this helps.
Good to hear!
I'm currently mostly struggling with setting up roles in a new domain, so that cross-referencing becomes possible eventually. Could we set up a call?
I tried textmate-grammar-python and the tokenization looks way nicer (see https://github.com/sphinx-contrib/matlabdomain/issues/222#issuecomment-1991587364). It definitely makes it easier to deduce whether a method definition is in an abstract class. Further, I really like the nested dictionaries, where I can just skip the body of a function once I have collected what I need.
It will require a lot of re-writing, but I'm quite tempted by it.
We can set up a call, but be warned: I am by no means an expert in cross-referencing. You can contact me at jorgen at cederberg dot be.
@watermarkhu Two comments on https://github.com/watermarkhu/textmate-grammar-python:

* sphinxcontrib-matlabdomain: I still support version 3.9.
* conan version requirement of PyYAML>=6.0, <7.0.

Do you want me to add them as issues?
Good to see! Adding the issues would be great.
Let's discuss 3.9 support on the PR that you submitted.
Hello, not to step on any toes here, but I would like to know whether this effort has stalled (understandably, time is always a valuable commodity). The MATLAB library I maintain is currently going through a major documentation pass, and to that end I have allocated some time to working on tooling. I think this would be a good place to start, as it will help in closing #52, #54, #212, and #222.
Those four issues are currently my target (perhaps in one fell swoop along with this one), as they would be very useful for our documentation. I have started an attempt at getting classdef parsing working here, and it seems doable to replace the current parsing code with something (at least marginally) better. Let me know if I have misread the situation.
Hi. It was definitely stalled. I have had zero time to work on this project unfortunately. This week, I'll give it a shot. The most difficult issue to solve is still #222.
If you want a starting point re: classes, I have now gotten most of a classdef parser written here https://github.com/apozharski/matlabdomain/blob/only-enums/sphinxcontrib/textmate_parser.py including argument blocks.
I am happy to continue work on it and submit a pr or you can pull out whatever is useful.
As an aside, there are definitely some bugs in the textmate parser (https://github.com/watermarkhu/textmate-grammar-python/issues/66#issue-2405029475 for example), though after digging I suspect they are in the underlying grammar maintained by MathWorks. I am currently looking at a possible fix, though @watermarkhu may have the inside track on understanding the grammar format.
Thanks. I will take this as a starting point; it already looks very useful. If you have any PRs, I'll be working on the development branch dev-textmate-grammar-for-parsing.
@apozharski I ran into a problem with class attributes and opened https://github.com/watermarkhu/textmate-grammar-python/issues/67.
In the current parsing of classdef / method / property attributes I reuse the same method: https://github.com/sphinx-contrib/matlabdomain/blob/4d890d7e434b3b18541a7227c9329d7cd9e02184/sphinxcontrib/mat_types.py#L1474
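Since classdef, methods, properties and events blocks all share the same attribute-list syntax, one helper can serve all of them. Below is a minimal sketch of such a shared parser, assuming the attribute list has already been extracted as raw text; `parse_attributes` is a hypothetical name, not the function linked above. It splits on top-level commas only (so a value like `{?ClassA, ?ClassB}` stays intact), treats a bare name as true and a `~`-prefixed name as false.

```python
def parse_attributes(attr_text: str) -> dict:
    """Parse a MATLAB attribute list such as '(Abstract, Access = protected)'.

    Hypothetical helper: meant to be shared by classdef, methods,
    properties and events blocks, which use the same attribute syntax.
    """
    text = attr_text.strip()
    # Drop the enclosing parentheses if present.
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]

    # Split on commas that are not nested inside (), {} or [].
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "({[":
            depth += 1
        elif ch in ")}]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))

    attrs = {}
    for part in (p.strip() for p in parts):
        if not part:
            continue
        if "=" in part:
            name, value = (s.strip() for s in part.split("=", 1))
            # Map literal true/false to bools; keep any other value as raw text.
            if value.lower() in ("true", "false"):
                attrs[name] = value.lower() == "true"
            else:
                attrs[name] = value
        elif part.startswith("~"):
            # '~Hidden' is MATLAB shorthand for Hidden = false.
            attrs[part[1:].strip()] = False
        else:
            # A bare name such as 'Abstract' means the attribute is true.
            attrs[part] = True
    return attrs


print(parse_attributes("(Abstract, Access = protected)"))
```

Keeping the raw text for non-boolean values defers interpretation (for example, resolving `?ClassA` meta-class references) to the caller.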
@apozharski regarding the priority of docstrings, it is as follows:

* properties, enums, (events): Comments before the property have higher precedence than a trailing comment. However, there cannot be empty lines between the comment and the property.
* functions and classes: always after the function or classdef line.
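The precedence rules above can be sketched as a small function over raw source lines. This is a hypothetical helper (`find_docstring` is not part of matlabdomain), and it deliberately ignores `%{ ... %}` block comments and `%` characters inside string literals, so it illustrates the rules rather than being a drop-in implementation.

```python
def find_docstring(lines: list, index: int, kind: str) -> str:
    """Pick the docstring for the definition on lines[index].

    kind: 'property', 'enum' or 'event' use the "comment above, else
    trailing comment" rule; anything else (function, classdef) uses the
    "comment block right after the definition line" rule.
    Simplification: '%' inside strings and %{ %} blocks are not handled.
    """
    def comment_text(line):
        stripped = line.strip()
        # Full-line comments only; returns None for code or empty lines.
        return stripped.lstrip("%").strip() if stripped.startswith("%") else None

    if kind in ("property", "enum", "event"):
        # 1) Consecutive comment lines directly above; an empty line breaks the run.
        above = []
        i = index - 1
        while i >= 0:
            text = comment_text(lines[i])
            if text is None:
                break
            above.append(text)
            i -= 1
        if above:
            return "\n".join(reversed(above))
        # 2) Otherwise fall back to a trailing comment on the same line.
        _, sep, trailing = lines[index].partition("%")
        return trailing.strip() if sep else ""

    # Functions and classes: the comment block right after the definition line.
    below = []
    for line in lines[index + 1:]:
        text = comment_text(line)
        if text is None:
            break
        below.append(text)
    return "\n".join(below)


props = [
    "properties",
    "    % Gain of the filter",
    "    Gain = 1  % trailing gain",
    "    Offset = 0  % offset in volts",
    "end",
]
print(find_docstring(props, 2, "property"))
```

For `Gain`, the comment above wins over its trailing comment; for `Offset`, the preceding line is code, so the trailing comment is used.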
Yep, that is what I thought was the case. Thanks for clarifying. I will do some cleanup, get the routines to check for non-consecutive comments, and submit a PR to your dev branch.
@joeced After spending a few too many hours trying to fix the MathWorks-provided TextMate grammar, I am convinced that it is not worth continuing to force a square peg (a parsing system designed primarily for syntax highlighting) into the round hole of extracting structure. After some research, I found a better alternative: https://github.com/acristoffers/tree-sitter-matlab, a MATLAB grammar for tree-sitter, which uses a "proper" LR parser and produces a much more usable AST. It also does not appear to have the performance downsides: https://github.com/watermarkhu/textmate-grammar-python/issues/68.
Over the last couple of days I quickly threw together a working prototype with support for, I believe, the full suite of MATLAB syntax (argument blocks, enumeration blocks, events blocks, etc.): https://github.com/apozharski/matlabdomain/blob/tree-sitter-dev/sphinxcontrib/mat_tree_sitter_parser.py
I think this is the direction this project should go in, as it does not require us to fix yet more bugs in MATLAB-textmate-grammar, and it already supports the full feature set we need. The one concern is that while tree-sitter is available on PyPI, tree-sitter-matlab is not yet. I have reached out to the developer via https://github.com/acristoffers/tree-sitter-matlab/issues/12 and they seem receptive to packaging it for PyPI.
Hi @apozharski, thank you very much for looking into this and for taking the time and effort. Much appreciated! I'm away from a computer at the moment, but will get back to you in 2 weeks.
hi @apozharski - I'm back now!
Did you have any time to work on using tree-sitter-matlab, and would you be willing to make a pull request? The most important thing for me is not which parsing library is used, but that it is better than the existing one, where better means:
Again - thanks for looking into this!
Hello @joeced, yes, I was sidelined on this last week due to some urgent work. I have a branch with a 90-95% working parser, and I will open a separate PR with it. In general I think it has simplified the code significantly; however, I would love to hear your feedback.
My latest work on this has been slowly fixing things to get the tests back in working order (and I found a bug in tree-sitter-matlab, for which I now have a PR open there).
Absolutely no problem!
This is resolved by the move to the tree-sitter backend #261.
Hi @joeced, great work on maintaining this repo.
A year ago, I wanted to contribute support for argument blocks. However, I found the logic in mat_types.py, which is based on the Pygments tokens, very hard to work with and a bit unstable.
Following MathWorks' support for VSCode, I started working on a Python parser based on TextMate grammars, which VSCode uses for syntax highlighting. MathWorks now also maintains the MATLAB grammar.
The package is available at https://github.com/watermarkhu/textmate-grammar-python. If you are interested, I think it could be a good replacement for the current in-house parsing in matlabdomain. The benefits of using a TextMate grammar are that 1) due to its nested nature, the output is already a syntax tree, and 2) parsing is now backed by a grammar officially maintained by MathWorks and the contributors of the VSCode extension.
On a different topic, due to some requirements, I will need an auto-documenter that is compatible with Markdown docstrings. To this end, I've already started work on a new extension that depends on myst-parser and is based on autodoc2. I would love to get in touch with you to understand matlabdomain better and see what I can re-use.