Closed simonw closed 4 years ago
I just hit this trying to run against https://fivethirtyeight.datasettes.com/fivethirtyeight.db
:
File "/Users/simon/.local/share/virtualenvs/datasette-graphql-n1OSJCS8/lib/python3.8/site-packages/graphql/type/definition.py", line 180, in __init__
assert_valid_name(name)
File "/Users/simon/.local/share/virtualenvs/datasette-graphql-n1OSJCS8/lib/python3.8/site-packages/graphql/utils/assert_valid_name.py", line 10, in assert_valid_name
assert COMPILED_NAME_PATTERN.match(
AssertionError: Names must match /^[_a-zA-Z][_a-zA-Z0-9]*$/ but "./index" does not.
That's a complaint about https://fivethirtyeight.datasettes.com/fivethirtyeight/.%2Findex
So regex is ^[_a-zA-Z][_a-zA-Z0-9]*$
This happens when you try to use GraphQL against the Datasette fixtures.db
database:
Names must match /^[_a-zA-Z][_a-zA-Z0-9]*$/ but "123_starts_with_digitsCollection" does not.
So the rules are a little different from those handled by the to_css_class
method in Datasette utils, since that doesn't have a rule about names starting with a non-digit: https://github.com/simonw/datasette/blob/7702ea602188899ee9b0446a874a6a9b546b564d/datasette/utils/__init__.py#L575-L599
css_class_re = re.compile(r"^[a-zA-Z]+[_a-zA-Z0-9-]*$")
css_invalid_chars_re = re.compile(r"[^a-zA-Z0-9_\-]")
def to_css_class(s):
"""
Given a string (e.g. a table name) returns a valid unique CSS class.
For simple cases, just returns the string again. If the string is not a
valid CSS class (we disallow - and _ prefixes even though they are valid
as they may be confused with browser prefixes) we strip invalid characters
and add a 6 char md5 sum suffix, to make sure two tables with identical
names after stripping characters don't end up with the same CSS class.
"""
if css_class_re.match(s):
return s
md5_suffix = hashlib.md5(s.encode("utf8")).hexdigest()[:6]
# Strip leading _, -
s = s.lstrip("_").lstrip("-")
# Replace any whitespace with hyphens
s = "-".join(s.split())
# Remove any remaining invalid characters
s = css_invalid_chars_re.sub("", s)
# Attach the md5 suffix
bits = [b for b in (s, md5_suffix) if b]
return "-".join(bits)
I need something a bit different here.
Easy option: strip out any characters not in [_a-zA-Z0-9]
and, if it doesn't start with a letter or underscore, prefix it with an underscore.
Need to also check that it doesn't collide with the name of another table that has undergone the same transformation. For that, can add a suffix of _
or _2
or similar.
I need to apply the same rules to column names AND to those tablename_list
nested fields too.
I think a Namer
class which I instantiate and then feed raw values to get back valid distinct names that haven't been used in that context yet.
class Namer:
def __init__(self):
self.names= set()
def name(self, value):
value = strip_invalid_chars(value)
if not value:
value = "_"
if value[0].isdigit():
value = "_" + value
suffix = 2
orig = value
while value in self.names:
value = orig += "_" + str(suffix)
suffix += 1
self.names.add(value)
return value
From a code point of view the column names will be the fiddliest bit, since they're used in a bunch of different ways. They only need to be unique with respect to their table though - no need to generate id
and then id_2
etc for uniqueness across multiple different tables.
What makes for a valid GraphQL identifier?