nischayn22 / PageQuality

A MediaWiki extension to monitor and improve Page Quality
MIT License
0 stars 1 forks source link

blocked_expressions highlighting sometimes messed up #17

Closed drorsnir closed 2 years ago

drorsnir commented 2 years ago

See the following two screenshots, from this page: https://www.kolzchut.org.il/he/הפסקת_הריון

It starts out OK, but then we suddently have whole examples in bold. It looks like maybe you add the <b> tags and then cut the string, causing the tags to flow over.

Screenshot 2021-12-13 at 17-48-41 הפסקת הריון Screenshot 2021-12-13 at 17-48-24 הפסקת הריון

nischayn22 commented 2 years ago

I tried to see this live in action, but on this view it doesn't show me anything: https://www.kolzchut.org.il/he/Special:PageQuality/report?page_id=4497

Could you please share the link where I can find the HTML for the results?

drorsnir commented 2 years ago

So, I can recreate it using this revision: www.kolzchut.org.il/w/he/index.php?title=הפסקת_היריון&action=edit&oldid=454915

Now, it's actually very simple (but took me over an hour to debug):

  1. Let's use the word "הריון" as the blocked expression
  2. The DB field is for examples 100 characters long
  3. The offending sentence is "להפסיק את ההריון. סעיף 316 לחוק העונשין מפרט את המקרים שבהם יש לוועדה סמכות לאשר הפסקת הריון"
  4. The sentence you take is 92 characters long, and then you add <b></b> - another 8 characters. This should work perfectly, only the expression shows twice - 8 characters too many.
  5. so you end up with this:

    • להפסיק את ה<b>הריון</b>. סעיף 316 לחוק העונשין מפרט את המקרים שבהם יש לוועדה סמכות לאשר הפסקת <b>הריון</b>
    • which gets cut down to להפסיק את ה<b>הריון</b>. סעיף 316 לחוק העונשין מפרט את המקרים שבהם יש לוועדה סמכות לאשר הפסקת <b>הר
      1. And we have a runaway <b> tag...

    The simplest solution is to use substr_replace() instead of str_replace(), and replace only the 1st occurrence of the expression.

    (And while you're at it, could you remove the trim( $blocked_expression )? Many times we need to add a space before and/or after the expression, so that it won't be considered a match in the middle of another word.)

nischayn22 commented 2 years ago

I don't fully understand this. But do you think just adding str_replace with count 1 should solve the problem?

Here's the function definition:

str_replace( array|string $search, array|string $replace, string|array $subject, int &$count = null ): string|array

drorsnir commented 2 years ago

Count doesn't do what you think it does: If passed, this will be set to the number of replacements performed

So it doesn't limit the number of replacements. Otherwise, this would be perfect. Use whatever you need to make sure only a single replacement is performed.

nischayn22 commented 2 years ago

Fixed in https://github.com/nischayn22/PageQuality/commit/00fed7378fcebf97572dc292612838c39513315c

drorsnir commented 2 years ago

Seems to work well :-)