walshbr / woolf


Weird Broken Matches #1

Closed walshbr closed 9 years ago

walshbr commented 9 years ago

Noticed that some matches were breaking off in the middle of words. Here is an example from the first chapter of Night and Day, which I have loaded as text.txt:

<_sre.SRE_Match object; span=(5900, 6041), match='"i had just written to say how i envied her! i wa>,
<_sre.SRE_Match object; span=(6051, 6203), match='" and snuff the candles. have they all disappeare>,

Something in the regex search might be causing this. I think it is probably also related to the unusually high percentage of quoted text reported for that text.

Passage this comes from:

"Oh, Mr. Fortescue," exclaimed Mrs. Hilbery, as he finished, "I had just written to say how I envied her! I was thinking of the big gardens and the dear old ladies in mittens, who read nothing but the "Spectator," and snuff the candles. Have they ALL disappeared? I told her she would find the nice things of London without the horrid streets that depress one so."

walshbr commented 9 years ago

It looks like this is occurring in the find_quoted_quotes function. There are also a fair number of matches (like these) that aren't well-formed, meaning that they start with an opening quotation mark but don't end with a closing one. The first two matches below show the difference between a match sandwiched by quotes and one that is not.

<_sre.SRE_Match object; span=(1325, 1369), match='"what an extremely nice house to come into!"'>,
<_sre.SRE_Match object; span=(1712, 1768), match='"now, do you think we\'re enjoying ourselves enor>

walshbr commented 9 years ago

Never mind! It looks like the matches are fine and are just cut off when printed in the terminal. SILLY ME.
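For anyone who runs into the same thing: as far as I can tell, the printed form of a Python match object only previews the start of the matched string (roughly the first 50 characters), while the span and the match itself are complete. The sketch below shows one way to check this. The pattern `QUOTE_RE` is a hypothetical stand-in, not the actual regex used in find_quoted_quotes, and the sample text is a lowercased, shortened copy of the passage above.

```python
import re

# Lowercased, shortened copy of the passage quoted in this issue.
text = (
    '"oh, mr. fortescue," exclaimed mrs. hilbery, as he finished, '
    '"i had just written to say how i envied her! i was thinking of the '
    'big gardens and the dear old ladies in mittens, who read nothing but the '
    '"spectator," and snuff the candles."'
)

# Hypothetical pattern (an assumption, not the project's real regex):
# any run of non-quote characters sandwiched between two double quotes.
QUOTE_RE = re.compile(r'"[^"]+"')

matches = list(QUOTE_RE.finditer(text))

for m in matches:
    full = m.group(0)  # the complete matched string, never shortened
    well_formed = full.startswith('"') and full.endswith('"')
    print(repr(m))     # short preview; long matches are cut off here
    print(m.span(), len(full), well_formed)
    print(full)
```

Checking `m.group(0)` (or slicing the text with `m.span()`) rather than the printed match object is the reliable way to tell whether a match really ends on a closing quotation mark.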