Closed q2apro closed 9 years ago
I found something about Collation and some explanations here / mysql bug from 2010. Maybe this is the problem?!
The solution for the guy over there was: "I solved my problem with creating a lowercase column where I can do a utf8_bin collation search."
PS: I tried to change the collation of table qa_words to utf8_lithuanian_ci (where both e letters exist), however, q2a just throws the error message "A Question2Answer database query failed when generating this page. A full description of the failure is available in the web server's error log file."
I found a solution, using the query:
SELECT wordid FROM qa_words WHERE word= _utf8 "teisė" COLLATE utf8_bin LIMIT 1
which gives me the correct wordid 11294.
I tried to implement this in qa-db-selects.php but it seems not to work (I still get the wrong id in the favorite form / favorite button):
Instead:
$selectspec['source'].=' AND entityid=(SELECT wordid FROM ^words WHERE word=$ LIMIT 1)';
I used:
$selectspec['source'].=' AND entityid=(SELECT wordid FROM ^words WHERE word=_utf8 $ LIMIT 1)';
?
The simple solution is to remove one of those rows, since only one should be there if both of them match the query. I do not know how they both got in there since Q2A is careful in function qa_db_word_mapto_ids_add() to prevent this happening. It could be due to some kind of collation inconsistency in MySQL.
Thanks for your answer. But is this really a consistent "simple" solution?
As far as I understand, qa_words holds all words from posted content, titles, tags. As soon as a new post arrives, I guess the qa_words are recalculated. Plus the new words are added to the qa_words table. So the false "teisę" would appear again.
I am still wondering why my manual mysql query works, but implemented in q2a does not?
I tried to remove the word from the database, it does not work:
SQL query:
DELETE FROM `qa_words` WHERE `qa_words`.`wordid` =1900 LIMIT 1
MySQL said: Documentation
#1451 - Cannot delete or update a parent row: a foreign key constraint fails (`db348277_xy`.`qa_contentwords`, CONSTRAINT `qa_contentwords_ibfk_2` FOREIGN KEY (`wordid`) REFERENCES `qa_words` (`wordid`))
OK, so you have to update the rows in qa_contentwords to use the other wordid.
I just deleted all wordids=1900 from qa_contentwords but still cannot delete the wordid=1900 from qa_words.
Nevertheless, is there no other solution? Why is my mysql query working and the q2a one does not?
@gidgreen Can you explain what you mean by your comment above ("only one should be there if both of them match the query")? Are you saying that teisė
and teisę
should not both be in the qa_words
table?
I just tried tagging questions with each of the words, and both end up in there. And a query such as SELECT * FROM qa_words WHERE word='teisé'
(with a third different e) shows both.
Yes, exactly, both words should not be in qa_words
if they both match a query like SELECT * FROM qa_words WHERE word='teise'
- the logic for adding words to qa_words
is in function qa_db_word_mapto_ids_add()
and it is meant to prevent adding words which are considered equal under the collation for that column.
I still have no solution. @gidgreen To check if I understand you right. If I remove teisę
from the database, then teisė
will be returned?
But as I wrote already, I cannot hunt and remove each word that is coming in. Especially because it is not my mother tongue and I am not the administrator.
I think this is quite a big, still undiscovered problem of the q2a core (v1.6.3). Each favoriting could lead to wrong words.
For the newsletter plugin that I am developing currently, it seems that I have to create an extra table for all favorited tags of each users. Otherwise the wrong wordid is picked up and the wrong word showing up frontend.
And again, I don't understand why the manual mysql query SELECT wordid FROM qa_words WHERE word= _utf8 "teisė" COLLATE utf8_bin LIMIT 1
gives me the correct result, but not when I implement the same addition in the q2a core with $selectspec['source'].=' AND entityid=(SELECT wordid FROM ^words WHERE word=_utf8 $ LIMIT 1)';
.
Really confusing to me. Greetings, Kai
PS: I deleted the false wordid of teisę
from qa_contentwords, then qa_titlewords, then I deleted the word itself from qa_words. Favoriting the tag teisė
now, shows me teise
as favorited. Sorry to say that, but it feels like something is really wrong here, plus I cannot rely on the core favorite tags system :(
If you're seeing this difference why not just add COLLATE utf8_bin
to the query in the Q2A core?
I tried $selectspec['source'].=' AND entityid=(SELECT wordid FROM ^words WHERE word=_utf8 $ COLLATE utf8_bin LIMIT 1)';
as well, it was still the same problem. That's why I am also confused :-)
For my tags newsletter plugin I ended up creating a separate table qa_favtags that holds userid and tagwords. Now I am using this to store the correct UTF8-tagwords.
But well, as I mentioned, the favoriting bug still exists in core...
Just to note, I fixed the specific favoriting problem by adding BINARY
to the column selected. IIRC it was the line inside qa_db_user_favorite_tags_selectspec
:
'columns' => array('word', 'tagcount'),
// replaced with...
'columns' => array('BINARY word', 'tagcount'),
That's not been committed anywhere. I will be checking into the issue Gideon mentioned above so theoretically they shouldn't both get into the words table.
@Scott: I have tried this, but when I favorite the one, the other one is favorited as well (frontend).
Backend/Frontend: The same wordid is still outputted for the favorite button.
If the intention was to make sure only unique words were inserted in ^words table. Then why isn't the column unique? That would speed up many queries.
You mentioned the idea was to make sure papä
and papá
are stored only once. Wouldn't it be better to make sure the table itself won't allow those characters? (I'm not sure if this is possible by playing with charset and collation... most likely not but I need a second opinion here).
Changing to solving this by PHP, I still need to confirm exactly what should happen when saving papä
and papá
. Would it be correct to say that these words should be processed with qa_string_remove_accents
function, removed duplicates and then checked its ("its" instead of "their" because they will turn into papa
) presence in the DB?
Now if we do this, then we will be sure the word papá
won't be on the ^words table. But this makes me wonder:
papä
and papá
)?http://q2a.com/tag/papa
or to their respective url tag? In case of the last one then the processing should also normalize the tag in the url.papa
?Absolutely unrelated: I've seen users (I guess they were chinese), that have tags in their respective languages. This would absolutely change the way they tag and will only allow normalized tags. This means new users will only be able to user normalized tags and old users will have to radically change their tagging system. IMO this is a huge no-go, at least for 1.7, if this was 2.0 then it could be acceptable. I'd say let's stick to binary searching over that table and making the column unique (unless there is a reason not to do so which I'm not getting).
Not sure if I shed some light or complicated things even more :)
Just for your information, listing questions with the tag teisė
was also not working correctly. I have 204 questions tagged exactly with teisė
, but q2a lists only 20 questions.
I first tested the mysql query:
SELECT postid FROM qa_posttags WHERE wordid=(SELECT wordid FROM qa_words WHERE word="teisė" AND word="teisė" COLLATE utf8_bin LIMIT 1)
and it shows correctly 202 posts in phpmyadmin. I found the query in qa-db-selects.php with function qa_db_tag_recent_qs_selectspec(...)
I could solve this issue finally by using Scott's suggestion above, and changing qa-db-selects.php in function qa_db_tag_word_selectspec($tag)
the line:
'source' => '^words WHERE word=$',
to:
'source' => '^words WHERE BINARY word=$',
@svivian Thanks mate!
While developping another plugin that deals with tags I stumbled over the following bug:
When a user favorites the tag "teisė" or the tag "teisę" both are favorited as entity "teisę".
Already in the frontend favorite form you can see that both tags have the same T id which is for the example favoriteT
1900
_0. However, teisė should have 11294 according to the database.For seeing the bug: http://www.klaustukai.lt/tag/teis%C4%97 http://www.klaustukai.lt/tag/teis%C4%99
The problem lies in
qa-db-selects.php
with the mysql query:Querying my database with
SELECT wordid FROM qa_words WHERE word="teisė" LIMIT 1
incorrectly returns wordid=1900 instead of wordid=11294. Obviously the special character ė is interpreted as ę.Any way to fix this? Kai