Closed giuliohome closed 4 months ago
Notice that a rudimentary approach to implement this check on Windows is to open the file without using UTF-8, thus triggering the exception mentioned here.
While it seems unlikely we'd ever add the ability to fail on valid documents for arbitrary reasons (that's arguably the job of a linter), it's not hard to implement most any kind of rules like that you want by pre-inspecting or intercepting the token stream yourself:
import re
import yaml
from yaml.tokens import ScalarToken
doc = """
double: "mar"
single: 'mar'
none: mar
leftright: “mar”
"""
# inspect pre-scan
curly_quoted_scalars: list[ScalarToken] = [tok for tok in yaml.scan(doc) if isinstance(tok, ScalarToken) and re.search('[“”]', tok.value)]
for tok in curly_quoted_scalars:
print(f'found curly-quote scalar at {tok.start_mark}')
# bake it into a custom loader
class NoCurlyQuotesSafeLoader(yaml.SafeLoader):
def get_token(self):
tok = super().get_token()
if isinstance(tok, ScalarToken) and re.search('[“”]', tok.value):
raise ScannerError(problem="this loader does not allow curly-quoted scalars", problem_mark=tok.start_mark)
return tok
print(yaml.load(doc, Loader=NoCurlyQuotesSafeLoader))
@nitzmahone Excellent! Thank you very much!!!
Is it possible to raise an exception when parsing a yaml file that is using left double quotation mark (
“
) instead of straight double quotation marks ("
) as string delimiters? (also somehow related)Thanks