msaari / relevanssi

Relevanssi, a WordPress plugin to improve the search
GNU General Public License v3.0
49 stars 21 forks source link

is_numeric validation not leading to expected behaviour #15

Closed tedyw closed 5 years ago

tedyw commented 5 years ago

Recent changes in commit https://github.com/msaari/relevanssi/commit/76b87b7cfc66e7b2f0a713e3983a4bd748eb421e potentially generates faulty SQL queries. Check changes in line 324 to 327.

I've discovered that if a previously deleted term contains a number in its slug, then the validation in line 321 still passes and the string slug is used as a numeric id, leading to a SQL statement where the term is parsed as a non-existing column.

What if is_numeric is to be replaced with ctype_digit instead? This should correctly determine if the slug only contains numeric characters.

msaari commented 5 years ago

Since the numbers are all positive integers, isn't is_numeric() practically the same as ctype_digit()? Can you please elaborate on an example here?

tedyw commented 5 years ago

Consider the string 1e0, if this would be the slug on a term then passing the slug into the is_numeric function would result in a valid numeric because the letter e is considered as the exponential symbol, while a test with ctype_digit would fail as the string contains the alphabetic letter e. In earlier versions of PHP strings in hexadecimal, e.g. 0x000, would also pass when testing with is_numeric. More of this in the docs https://www.php.net/manual/en/function.is-numeric.php.

While the provided example would certainly lead to a non-expected result, this wasn't exactly the case for me as the slug that lead to producing an error was a normal slug with a number suffix. Maybe something else is going on.

In the full error message below, this line tells us that the slug somehow is considered as an ID: AND t.term_id IN (fox-umbrellas-2)

It's a deleted term that is still being recognized by search engine bots when they try to crawl a search result URL.

Full error message: 2019/07/23 20:50:08 [error] 7#7: *16 FastCGI sent in stderr: "PHP message: WordPress database error Unknown column 'fox' in 'where clause' for query SELECT tt.term_taxonomy_id FROM wp_term_taxonomy AS tt LEFT JOIN wp_terms AS t ON (tt.term_id=t.term_id) WHERE tt.taxonomy = 'product_brand' AND t.term_id IN (fox-umbrellas-2) made by require('wp-blog-header.php'), wp, WP->main, WP->query_posts, WP_Query->query, WP_Query->get_posts, apply_filters_ref_array('the_posts'), WP_Hook->apply_filters, relevanssi_query, relevanssi_do_query, relevanssi_search, relevanssi_process_query_args, relevanssi_process_tax_query, relevanssi_process_tax_query_row, relevanssi_term_tax_id_from_row, wpdb->get_col, wpdb->print_error" while reading response header from upstream, client: 66.249.75.178, server: _, request: "GET /shop?s=Denim&post_type=product&filtering=1&filter_product_brand=1379,1305,2254,2759,canali,100-hands,borsalino,william-lockie,berwich,gloverall,tagliatore,maurizio-baldassari,fox-umbrellas-2,castaner,lbm,inabo,heschung,san-siro,wigens,ring-jacket,eduard-dressler,cqp,stenstroms,the-bespoke-dudes,fumagalli,kirk-originals,zaremba,albert-thurston,silvio-fiorello,tramarossa,bagutta,caruso,ortigni,luciano-barbera,bow-tie&v=f003c44deab6&orderby=price HTTP/1.1", upstream: "fastcgi://172.31.132.82:9000", host: "baltzar.com"

msaari commented 5 years ago

Looks like the is_numeric() test isn't a problem, because even if I use a slug that returns true for is_numeric() like 1e0, $term = get_term_by( $field_name, $name, $taxonomy ); will return the term so the is_numeric() test is not even done.

Your query looks like it might have a mixture of term IDs (1379, 1305, 2254, 2579) and slugs (the rest). That seems like something that might be a problem. Can it be your filtering feature is constructing a tax query that specifies term ID as the field, even though you are using also slugs? If the tax_query says the field is "term_ID", Relevanssi won't really question that, Relevanssi will just filter the IDs with is_numeric() – though that should filter out fox-umbrellas-2 as well.

Anyway, I can replace is_numeric() with ctype_digit() in both cases where Relevanssi uses it for numeric taxonomy terms, but I don't think it is going to fix your problem here.

msaari commented 5 years ago

Funny fact: ctype_digit(3); returns false. So no, it's actually not as simple as replacing is_numeric() with ctype_digit()...

tedyw commented 5 years ago

It seems that replacing is_numeric with ctype_digit brings other issues such as numbers being recognized as ASCII characters (0-255) like you observed. As you said the problem might be invoked by mixing slugs and ids. I'll have to look into it. Thanks for the hint.