As James explained in class on Mon, Jan 27, there are several ways to match a line with a regex, which you might want to do in order to, for example, wrap <p> tags around it. One strategy that is beautifully minimal, but not self-evident, is the following (make sure “Dot matches all” is unchecked):
.*
This says “match zero or more consecutive characters”. With “Dot matches all” unchecked, the dot matches any character except a newline. Matching begins at the start of the entire text by default and continues until the first character that the dot doesn’t match, which is the first newline character. That’s your first match, and it gets replaced with the replacement expression (see below). The matching then starts again at the next character that the dot does match, which is the first character on the second line, and continues until the next character that the dot doesn’t match; in other words, it matches all of the characters on the second line up to, but not including, the newline that ends it. This continues, line by line, until the end of the text. So you can match:
.*
and replace it with:
<p>\0</p>
The \0 in the replacement refers to the entire match, that is, the line. You do not need parentheses to capture it. The replacement pattern writes the line back into the output, but adds tags around it.
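For anyone who wants to try the same idea outside the editor, here is a minimal sketch in Python. It rests on two assumptions that are not part of the recipe above. First, Python writes the whole-match reference as \g<0> rather than \0. Second, the sketch swaps in .+ for .* because Python 3.7 and later also replaces the zero-length match that .* finds at the end of each line, which would add empty <p></p> pairs. The function name wrap_lines is made up for illustration.

```python
import re

# Assumption: ".+" stands in for ".*" so that Python does not also wrap
# the zero-length match it finds at the end of every line.
# Without re.DOTALL, the dot does not match a newline, which corresponds
# to leaving "Dot matches all" unchecked.
LINE = re.compile(r".+")


def wrap_lines(text: str) -> str:
    """Wrap every non-empty line of `text` in <p> tags."""
    # \g<0> is Python's way of writing "the entire match".
    return LINE.sub(r"<p>\g<0></p>", text)


sample = "first line\nsecond line"
print(wrap_lines(sample))
# <p>first line</p>
# <p>second line</p>
```

Because the newline is never part of a match, this Python sketch leaves the line breaks in place; an editor's Replace All may treat them differently.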
The process above removes all newlines from the text, so you may wind up with one continuous line, with one <p> immediately after another. Never fear! If you then wrap your entire text in a root element (so that it will be well-formed; recall that one of the well-formedness rules is that there must be a single root element wrapped around everything), you can pretty-print it, and the pretty-printing will insert newlines between paragraphs and indent them to make everything more legible.
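The wrap-and-pretty-print step can also be sketched in Python, as a rough illustration rather than a replacement for the editor's own pretty-printer. It assumes the standard-library xml.dom.minidom module; the function name pretty_print and the element name root are made up for the example, and the input is assumed to contain no stray & or < characters that would need escaping.

```python
from xml.dom import minidom


def pretty_print(paragraphs: str, root: str = "root") -> str:
    """Wrap a run of <p> elements in a single root element and indent it."""
    # A well-formed document needs exactly one root element, so add one.
    wrapped = f"<{root}>{paragraphs}</{root}>"
    # parseString raises an error if the result is not well-formed.
    return minidom.parseString(wrapped).toprettyxml(indent="  ")


one_line = "<p>first line</p><p>second line</p>"
print(pretty_print(one_line))
```

The printed result starts with an XML declaration, followed by the root element with each <p> on its own indented line.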