Open pickettj opened 4 years ago
It seems impossible that you would end up trying to access group if line_match or num_match is None, but just in case, I think you should make those if statements if num_match is not None and if line_match is not None.
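A minimal sketch of that guard, in case it helps. The function name leading_number is made up for illustration, and the pattern is the number regex quoted elsewhere in this thread:

```python
import re

# Number regex quoted elsewhere in this thread.
NUM_PATTERN = re.compile(r"^[0-9]{0,3}\.[0-9]{0,3}\.?[0-9]{0,3}")

def leading_number(paragraph):
    """Hypothetical helper: return the leading line number, or None if absent."""
    num_match = NUM_PATTERN.match(paragraph)
    # re.match returns None when nothing matches, so check before calling group().
    if num_match is not None:
        return num_match.group(0)
    return None

print(leading_number("5.23.5ud čē rāy pēš nē āmad"))
print(leading_number("ud čē rāy"))
```

The second call has no leading number, so the guard returns None instead of raising AttributeError.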
Also for the second regex, I think you want ^[(0-9|\.)]* instead.
@iamlemec
That doesn't seem to quite do the trick, but the following might help isolate the problem:
test_string = "5.23.5ud čē rāy pēš nē āmad"
num_pattern = re.compile(r"ud")
result = re.match(num_pattern, test_string)
print(result.group(0))
That code results in the following error message:
AttributeError Traceback (most recent call last)
in ()
      2 num_pattern = re.compile(r"ud")
      3 result = re.match(num_pattern, test_string)
----> 4 print(result.group(0))
AttributeError: 'NoneType' object has no attribute 'group'
However, the regex code I used for the number works just fine:
test_string = "5.23.5ud čē rāy pēš nē āmad"
num_pattern = re.compile(r"^[0-9]{0,3}\.[0-9]{0,3}\.?[0-9]{0,3}")
result = re.match(num_pattern, test_string)
print(result.group(0))
Returns the number 5.23.5, as intended. But the weird part is that ud isn't even a special character, it's just a simple string match, so I don't get it.
Doesn't shifting the caret the way you suggested (^[(0-9|\.)]*) turn that into a front anchor? I want to use it to exclude. When I tested it, [^(0-9|\.)]* resulted in a match for everything but the initial line number, e.g. for 5.23.5ud čē rāy pēš nē āmad the match was ud čē rāy pēš nē āmad.
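For anyone comparing the two caret positions, here is a rough sketch of the difference. It uses simplified classes ([0-9.] instead of [(0-9|\.)]), since inside square brackets the characters (, | and ) are treated as literals rather than as grouping or alternation:

```python
import re

test_string = "5.23.5ud čē rāy pēš nē āmad"

# Caret outside the brackets: anchor at the start, then digits and dots.
anchored = re.match(r"^[0-9.]*", test_string).group(0)

# Caret as the first character inside the brackets: negated class.
# "+" is used here because "*" would happily match the empty string at position 0.
negated = re.search(r"[^0-9.]+", test_string).group(0)

# Parentheses and the pipe are ordinary characters inside a class.
literals = re.findall(r"[(0-9|.)]", "(a|b)")

print(anchored)
print(negated)
print(literals)
```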
For the first part, keep in mind that match only matches things from the start of the string. You need to use search if you want it to look anywhere in the string. So that would explain the behavior with "ud".
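A quick sketch of the difference, reusing the test string from the earlier comment (the variable names match_result and search_result are just for illustration):

```python
import re

test_string = "5.23.5ud čē rāy pēš nē āmad"
num_pattern = re.compile(r"ud")

# match only tries at position 0, where the string has "5", so it returns None.
match_result = num_pattern.match(test_string)

# search scans forward until it finds the pattern.
search_result = num_pattern.search(test_string)

print(match_result)
print(search_result.group(0))
print(search_result.start())
```

So the None has nothing to do with the non-ASCII characters; the pattern simply does not occur at the very start of the string.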
Yeah, I was confused about the ^. You're right there. If you want to get the rest of the string though, just use the match object returned from the first regex. You can call end() on that match object to get the end position of the numeric match and index the string from that point onward.
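Something along these lines should work as a sketch. The names number and line are illustrative, and the fallback for a paragraph without a number is an assumption about what you would want:

```python
import re

test_string = "5.23.5ud čē rāy pēš nē āmad"
num_pattern = re.compile(r"^[0-9]{0,3}\.[0-9]{0,3}\.?[0-9]{0,3}")

num_match = num_pattern.match(test_string)
if num_match is not None:
    number = num_match.group(0)
    # end() is the index just past the numeric match; slice from there.
    line = test_string[num_match.end():]
else:
    # Assumed fallback: no number found, keep the whole paragraph as the line.
    number, line = None, test_string

print([number, line])
```

This avoids needing a second regex for the text at all.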
@iamlemec
I have a dictionary of texts consisting of a list of paragraphs. Those "paragraphs" all begin with a manually entered line number, e.g.:
I am attempting to separate the number and the line into separate list elements. I have the first part of that working already:
However, extracting the lines themselves is not working (error message: "AttributeError: 'NoneType' object has no attribute 'group'"), even though I'm pretty sure my regex is fine. I believe the issue is my irregular characters (e.g. ud čē rāy pēš nē āmad), which require some kind of special unicode instructions. But the solutions I'm finding do not seem to work. E.g. adding a unicode flag does not seem to work (re.compile(r"hanger", re.UNICODE)), and I think the u flag may only be for Python2 (?) (re.compile(ur"hanger")). Help?