Closed PekaryGergelyR closed 5 years ago
Found the problem, a regular expression in MatObject._rename_function_variables
seems to be the issue.
Actually the original does not even work as if I add the following docstring % Constructor functionVar functionVar
, then it will be rendered as Constructor functionVar funcVar
.
I think there is no point creating a regex for this kind of renaming as it could waaaay hard and complicated.
Instead there is a hack solution, rename every occureance of function[^\s]+
with a unique id and rename it again after the pygments lexer is finished. This way only the actual function properties will be left for the lexer. What do you think?
Okay I hacked together a solution, so for this Matlab code:
function obj = ClassWithFunctionVariable(the_functions)
% Constructor functionVar functionVar
% function[
% function(
% function;
% function
this = ['%', 'functionVar'] % functionVar
% Determine the name and M-file location of the function handle.
functionHandleInfo = functions(testFcn);
self.Name = functionHandleInfo.function;
if strcmp(functionHandleInfo.type, 'anonymous')
% Anonymous function handles don't have an M-file location.
self.Location = '';
else
self.Location = functionHandleInfo.file;
end
end
I created this:
@staticmethod
def _rename_function_variables(code):
"""
Rename variables in code starting with function*, as Pygments parses
this incorrectly.
:param code:
:type code: str
:return: Code string without function-variable names.
"""
key = 'SPHINX_MATLAB_FUNCTION_TAG'
# Find any function occurence where either:
# - there is something ahead which is not \s or \t
# - or there are only \s \t ahead but the following character is not \s or \t
matcher = re.compile('^(?P<p>(?P<s1>.*?[^\s\t].*?)function|(?P<s2>[\s\t]*)function(?P<e2>[^\s\t]))', flags=re.MULTILINE)
match = matcher.search(code)
while match:
[start, end] = [match.group('s1'), ''] if match.group('s1') else [match.group('s2'), match.group('e2')]
code = re.sub(re.escape(match.group('p')), '{}{}{}'.format(start, key, end), code)
match = matcher.search(code)
return code
And after the tokenization the keys must be replaces again:
# The word "function*" cannot be used anywhere
# encrypt function variables
code = MatObject._rename_function_variables(code)
tks = list(MatlabLexer().get_tokens(code)) # tokenenize code
# decrypt function variables
tks = [(token, value.replace('SPHINX_MATLAB_FUNCTION_TAG', 'function')) for token, value in tks]
Might not be the best solution, but it is one. This also solves my original issue.
Thanks you very much. It looks better than the original solution. I will look into it tomorrow.
I've now investigated a bit more. It looks like I can actually delete the function _rename_function_variables
. With the 1a50005ab492415ca99028e24235986111a7f84a commit I changed how the parsing works (in general less likely to crash on invalid MATLAB code). It apparently solved the problem with variables named function*
. But I'll test some more, before committing.
That would be nice! However, I do not understand I am originated from that commit and it does not work for me.
I looked into the pygments lexer and just one line has to be modified:
(r'^\s*function', Keyword, 'deffunc'),
to (r'^\s*function\s', Keyword, 'deffunc'),
I created an issue for it on the bitbucket page: https://bitbucket.org/birkenfeld/pygments-main/issues/1491/matlablexer-cannot-handle-literal-function
Thank you so much for filing the issue, let us cross fingers its taken care of. I'll will still look more into this, to make the file parsing more robust.
Aaaaaand I was wrong again sorry.
The "correct" way to fix the pygments is to use the following (r'^\s*function(?=[\s[])', Keyword, 'deffunc')
As a function could be in the format of function[a,b,c]=f_no_spaces(x,y,z)
and the character must not be consumed.
Eventually this will not solve the problem as the function
token will be marked as a Token.Keyword
nonetheless.
The solution might be to check whether it is followed by a Token.Name.Function
or not.
What do you think?
I think an easy solution might be, to check that the function keyword must be preceded by a new line and it must be followed by either \s or [:
@staticmethod
def _fix_function_variables(tokens):
for idx, token in enumerate(tokens):
temp_token = (token[0], token[1].lstrip())
if temp_token == (Token.Keyword, 'function'):
previous_token = tokens[idx-1] if idx > 0 else None
if previous_token and not previous_token == (Token.Text, '\n'):
# This is not a real function as it is not after a new line
tokens[idx] = (Token.Name, token[1])
continue
next_token = tokens[idx+1] if idx < len(tokens)-1 else None
if next_token and not re.match('[\s[]', next_token[1]):
# This is not a real function as it is not followed by '\s' or '['
tokens[idx] = (Token.Name, token[1])
continue
return tokens
For example:
First I thought it is an endless cycle, but eventually it will finish, maybe a warning for this ill formatted code?