vezinaca / Banq_Achat

Various scripts that interact with the BANQ

Why is this Regex not working? #39

Closed vezinaca closed 4 years ago

vezinaca commented 4 years ago

str_html = "<td>85. Portishead’s <i>Dummy</i><br/>by RJ Wheaton<br/>Buy from Bloomsbury: <a href="
import re

my_id = re.search(r'<td>(.*)\. ', str_html)
print("id:_" + my_id.group(1))

Traceback (most recent call last):
  File "test_regex.py", line 12, in <module>
    print("id:_" + my_id.group(1))
AttributeError: 'NoneType' object has no attribute 'group'

vezinaca commented 4 years ago

str_html_working = "<td>142. The Wild Tchoupitoulas’ <em>The Wild Tchoupitoulas</em><br/>by Bryan Wagner"
str_html_not_working = "<td>85. Portishead’s <i>Dummy</i><br/>by RJ Wheaton<br/>Buy from Bloomsbury: <a href="

import re

# Same pattern applied to both strings; re.search returns None when nothing matches.
my_id_85 = re.search(r'<td>(.*)\. ', str_html_not_working)
if my_id_85 is not None:
    print("id:_" + my_id_85.group(1))
else:
    print("no regex found")

my_id_142 = re.search(r'<td>(.*)\. ', str_html_working)
if my_id_142 is not None:
    print("id:_" + my_id_142.group(1))
else:
    print("no regex found")

vezinaca commented 4 years ago

[image: little_dots]

vezinaca commented 4 years ago

I see a little white dot between '142.' and 'The' that I don't see between '85.' and 'Portishead' on the other line. Could that be it?
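One way to check what that character actually is: print the `repr()` of the string, or look up the Unicode name of the character right after the period. The sketch below uses two made-up sample strings (`with_space` and `with_nbsp`, not the real page content) and assumes, purely for illustration, that the invisible character is a non-breaking space (U+00A0). It could just as well be a tab or something else.

```python
import unicodedata

# Hypothetical samples: one has a regular space after the period,
# the other a non-breaking space (an assumption for illustration).
with_space = "<td>142. The Wild"
with_nbsp = "<td>85.\u00a0Portishead"

def describe_after_period(text):
    """Return the Unicode name of the character right after the first period."""
    idx = text.index(".")
    ch = text[idx + 1]
    # Control characters such as a tab have no Unicode name, so fall back to repr().
    return unicodedata.name(ch, repr(ch))

print(repr(with_nbsp))                    # non-printable characters appear as escapes
print(describe_after_period(with_space))  # SPACE
print(describe_after_period(with_nbsp))   # NO-BREAK SPACE
```

If the two lines report different characters, a pattern that ends in a literal space will match one line and not the other.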

gregsadetsky commented 4 years ago

The dot means “space”. Maybe there’s a tab on the line with 85 and a space on the other line.

But that’s not the problem. The problem is that .* is too greedy a pattern: you need a pattern that matches only digits.

Start by matching only <td> to make sure it works, then move on to <td> and digits.
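A minimal sketch of that step-by-step approach, using a made-up sample string (`sample`, not the real page content): first match only the tag, then the tag plus digits. The last two searches compare the original `.*` group with a digits-only group. One caveat worth keeping in mind: greediness makes `.*` capture too much when a later ". " exists, but it does not by itself make `re.search` return `None`. That happens only when the pattern matches nowhere in the string.

```python
import re

# Hypothetical sample with more than one ". " in the text.
sample = "<td>85. Dummy<br/>by R. J. Wheaton"

# Step 1: match only the tag.
step1 = re.search(r"<td>", sample)

# Step 2: the tag followed by one or more digits, captured as group 1.
step2 = re.search(r"<td>(\d+)", sample)

# Comparison: greedy .* runs up to the LAST ". " in the string,
# while \d+ stops at the first non-digit.
greedy = re.search(r"<td>(.*)\. ", sample)
digits = re.search(r"<td>(\d+)\.", sample)

print(step1.group(0))   # <td>
print(step2.group(1))   # 85
print(greedy.group(1))  # 85. Dummy<br/>by R. J
print(digits.group(1))  # 85
```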

gregsadetsky commented 4 years ago

there are many sites like this, but you can try https://www.regexpal.com/ to see how your regex is working in real time

there are also a million cheatsheets... like this one https://www.rexegg.com/regex-quickstart.html

not to repeat myself, but.......... definitely change the .* to something that matches digits :-)
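As a sketch of what "something that matches digits" could look like: the hypothetical helper `extract_id` below captures one or more digits right after `<td>` and stops at the period, so it does not depend on which whitespace character follows. The sample strings are shortened stand-ins for the ones in this thread, and the `\u00a0` in the second one is an assumption, not something confirmed here.

```python
import re

# Digits only, anchored on the opening tag and the trailing period.
ID_PATTERN = re.compile(r"<td>(\d+)\.")

def extract_id(html):
    """Return the leading number as a string, or None if there is no match."""
    match = ID_PATTERN.search(html)
    return match.group(1) if match else None

print(extract_id("<td>142. The Wild Tchoupitoulas"))  # 142
print(extract_id("<td>85.\u00a0Portishead"))          # 85
print(extract_id("<td>no number here"))               # None
```

Returning `None` instead of calling `.group(1)` unconditionally avoids the `AttributeError` from the first comment when a row has no number.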

cheers & let me know if I can help.