Closed by gtoffoli 13 years ago
Oops! I realized that special characters aren't automatically escaped on this HTML page. The first solution I proposed puts in the regexp, right after the opening `<a`, the code for "any whitespace character" (`\s`), followed by dot, star, question mark (`.*?`, the syntax matching "as few repetitions as possible" of "any character except newline").
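To illustrate why "as few repetitions as possible" matters here, the following is a minimal sketch. The greedy pattern is a hypothetical stand-in (an assumption, not necessarily the project's original code), and the sample line is made up for the demonstration.

```python
import re

# Made-up sample line with two links on it.
line = '<a href="one.html">One</a> and <a href="two.html">Two</a>'

# Hypothetical greedy variant (assumption): ".*" grabs as much as it can,
# so the search backtracks only as far as the LAST "href=" on the line.
greedy = re.compile(r'<a\s.*href=[\'"](.*?)[\'"].*>')

# Lazy variant as described above: "\s" followed by ".*?".
lazy = re.compile(r'<a\s.*?href=[\'"](.*?)[\'"].*?>')

greedy_links = greedy.findall(line)
lazy_links = lazy.findall(line)

print(greedy_links)  # ['two.html']
print(lazy_links)    # ['one.html', 'two.html']
```

With the lazy quantifier, each match stops at the first `>` it can, so `findall` gets a chance to match the next link on the same line.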
That's pretty interesting that it would do that. I changed it to the regex pattern you suggested for now. Now that I have some free time I think I'm going to sit down and rewrite the whole thing, so we'll see how it turns out!
Hi! I noticed that when multiple links are present in a line, only the last one is matched. I found that `linkregex = re.compile('<a\s.*?href=[\'"](.*?)[\'"].*?>')` is often OK. But perhaps `linkregex = re.compile('<a\s(?:.*?\s)*?href=[\'"](.*?)[\'"].*?>')` is better. Regards, Giovanni
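A small sketch of where the two patterns can differ, assuming the asterisks swallowed by the page are restored as `.*?` and `(?:.*?\s)*?`. The `data-href` attribute and the file names are invented for illustration only.

```python
import re

# Both patterns are reconstructions (assumption: the lost characters were asterisks).
first = re.compile(r'<a\s.*?href=[\'"](.*?)[\'"].*?>')
second = re.compile(r'<a\s(?:.*?\s)*?href=[\'"](.*?)[\'"].*?>')

# Hypothetical tag where another attribute name happens to end in "href".
tag = '<a data-href="wrong.html" href="right.html">link</a>'

# The first pattern accepts "href=" anywhere after "<a ", even inside "data-href=".
first_result = first.findall(tag)
# The second requires "href=" to follow whitespace, so it skips "data-href=".
second_result = second.findall(tag)

print(first_result)   # ['wrong.html']
print(second_result)  # ['right.html']

# The second pattern still finds every link when several share a line.
line = '<a class="x" href="one.html">One</a> <a href="two.html">Two</a>'
both_links = second.findall(line)
print(both_links)     # ['one.html', 'two.html']
```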