splitbrain / dokuwiki-plugin-data

Add and query structured data in your DokuWiki
http://www.dokuwiki.org/plugin:data
GNU General Public License v2.0
50 stars 34 forks source link

Double pid for same dataentry #187

Open tmo26 opened 8 years ago

tmo26 commented 8 years ago

Release 2015-08-10a "Detritus" Data plugin: 2015-09-14

Problem

Root cause: No clear indication of the root cause for this behaviour. Guessing: Pagemoves by pagemove plugin and/or deletion of dataentry pages corrupts the sqlite database.

Further investigation via the SQlite interface showed, that there is a difference in the pagename of the doubles: They have an additional '0A' (=CHAR(10)=LF=\n) at the end of the pagename.

Fixing the sqlite db The problem can be solved by deleting the doubles from the 'data' and 'pages' tables.

DELETE FROM data WHERE pid IN (SELECT pid FROM pages WHERE page LIKE '%' || X'0A' || '%');
DELETE FROM pages WHERE page LIKE '%' || X'0A' || '%';

-> Datatables show only a single pid for a dataentry, BUT -> Problem reoccurs after a while :(

Questions

  1. Where does the double dataentry with the additional '0A' at the end of the pagename come from? Can it be avoided that this double is created?
  2. If the double can't be avoided: Can the above sqlite statements be included in the data_clean procedure? This would at least make the fixing of the sqlite db much easier.

Thanks in advance!

splitbrain commented 8 years ago

This is very weird, especially since you seem to be the only one having this problem. A couple of questions:

virk commented 8 years ago

Perhaps tmo26 makes use of the "include-plugin" which conditionally causes data appear twice; see #168.

tmo26 commented 8 years ago

@virk #168 seems different. In our wiki, we use the include plugin, but only to include other content on pages with dataentries. We do not include dataentries in other pages.

tmo26 commented 8 years ago

@splitbrain

Last column shows

Further investigation via the SQlite interface showed, that there is a difference in the pagename of the doubles: They have an additional '0A' (=CHAR(10)=LF=\n) at the end of the pagename.

I can find those via the SQlite interface:

SELECT * FROM data WHERE pid IN (SELECT pid FROM pages WHERE page LIKE '%' || X'0A' || '%');
SELECT * FROM pages WHERE page LIKE '%' || X'0A' || '%';

I'm afraid, I can't change entry.php myself (sitting only at the front end, with no console access), but I know someone who can ;-) Will take a while though.

Additional thoughts: Can this be related to #183 and #184 ?

tmo26 commented 8 years ago

Posted 3 minutes ago that it is working.... and suddenly, the problem reoccured: Currently 4 Doubles under TP-Link Brand, see https://wiki.openwrt.org/toh/start

splitbrain commented 8 years ago

Okay, I saw the double entries. What's interesting is that the pages containing the data entries did not get edited (at least the last modified date points back a few days). Very weird indeed.

tmo26 commented 8 years ago

Example shows identical pagename but different pids. 1232 is the faulty one with the 0A at the end of the pagename.

SELECT * FROM pages WHERE title LIKE 'Techdata: TP-Link Archer C7R WDR7500 v4.0'

pid page    title   class   lastmod
903 toh:hwdata:tp-link:tp-link_archer_c7r_wdr7500_v4.0  Techdata: TP-Link Archer C7R WDR7500 v4.0   techdata    1448358257
1232 toh:hwdata:tp-link:tp-link_archer_c7r_wdr7500_v4.0     Techdata: TP-Link Archer C7R WDR7500 v4.0   techdata    1448358257
tmo26 commented 8 years ago

Number of affected dataentries is growing: Now we have 8.

splitbrain commented 8 years ago

It would be helpful to see the full list of installed plugins (not only syntax plugins). I assume the DokuWiki core has not been modified?

tmo26 commented 8 years ago

Dokuwiki core mods: None that I know of.

Full plugin list (from the popularity feedback): plugin vshare plugin authmysql plugin medialist plugin comment plugin cellbg plugin upgrade plugin prettytables plugin redirect plugin datatemplate plugin color plugin data plugin plugin plugin searchstats plugin extension plugin clearhistory plugin logstats plugin hidepages plugin addnewpage plugin searchindex plugin multilingual plugin divalign2 plugin sqlite plugin pageindex plugin info plugin discussion plugin pagemove plugin captcha plugin bureaucracy plugin ditaa plugin folded plugin acl plugin tag plugin include plugin cleanup plugin config plugin pagelist plugin revert plugin statdisplay plugin authorstats plugin authad plugin ipban plugin authplain plugin badbehaviour plugin columns plugin multiorphan plugin move plugin custombuttons plugin sortablejs plugin code plugin usermanager plugin batchedit plugin wrap plugin tabbox plugin authpgsql plugin safefnrecode plugin orphanswanted plugin popularity plugin pagequery plugin authldap plugin styling

splitbrain commented 8 years ago

Also what PHP version are you running and is there something special about it? Eg. any opcaches, optimizers or similar things? I'm asking because to me it looks like the general whitespace cleaning is somehow broken in your wiki and I wonder why.

tmo26 commented 8 years ago

Not knowing if there's anything critical in the popularity feedback, I post only excerpts. I can certainly post the complete list, if appropriate.

pcre_version 8.31 2012-07-06 os Linux webserver nginx/1.4.6 php_version 5.5.9-1ubuntu4.14 php_sapi fpm-fcgi

php_extension Zend OPcache

splitbrain commented 8 years ago

Going through the source code, it seems to me that the problem occurs when entries are resaved on metadata rendering. It's the only way data can get saved with out the page itself not being modified. It would also explain why it happens in waves (roughly when the cachetime expires).

I'm still not clear why this happens for you (it seems not to happen for anyone else). It would be interesting to see the cached instructions of an affected file. If my suggested hack of adding a trim() helps, it would be a good point to add a debug log with a stack trace there.

Please let me know when the problem reoccurs or if it doesn't within the usual time.

tmo26 commented 8 years ago

Thanks so far for your quick action! Last cleanup of yesterday evenings errors (4 Doubles at 22:11 -> 16 Doubles at 02:50) has been done this morning 08:38. Now, 11h later, everything is looking good, no errors on https://wiki.openwrt.org/toh/start. However, there have been no dataentry edits the whole day.

I will keep an eye on this issue and come back to you as soon as I have new information.

tmo26 commented 8 years ago

Status after 4 days, 120 batch edits, 30 normal page edits and 40 edits via data editor:

tmo26 commented 8 years ago

Status after 10 days: Not a single double any more. -> Consider this issue as closed.

Thanks again for your help!

tmo26 commented 8 years ago

Closed too early: Today I got some doubles again... a lot of them :( Do you need the sqlite database again, or anything else?

tmo26 commented 8 years ago

Observation of today:

Slow filling of DB after searchindex rebuild has long finished: Time Number of devices in the DB 16:26 506 16:27 507 16:29 518 16:40 542 16:48 598 16:50 613 16:52 614 16:53 615 16:55 615

Further up you asked about optimizers and caches. I revisited the popularity feedback and found now also APC: php_extension apc php_extension apcu php_extension Zend OPcache

Could one of those be responsible for the doubles?

ReneDW commented 8 years ago

I have the same trouble with doubles. Currently only at one directory with about 20 pages in it. Tried many things like : Cleaned up sqlite entry's. Uninstalled / reinstalled plugins : sqlite, data, bureaucracy. Rebuild index... Still persists.

I use : Release 2015-08-10a "Detritus" + default Dokuwiki theme + latest plugins, just a few and most of Splitbrain cs Php 5.6 I have made no modifications.

I used DW for more than 2 years now, it worked fine. I was planning to upgrade dokuwiki from a rather personal wiki to the company's wiki. (Finaly got the management behind it) This is very disappointing.

splitbrain commented 8 years ago

@tmo26 we had reports of problems with APC in the past (unrelated to the data plugin). I think it would be worth a try running without them for a while, to see if that changes something. re. 502: we do not send this status anywhere from code, I guess it was shown by the webserver because of a failing connection to the PHP process (fpm or fcgi).

@ReneDW when you say it worked fine for years, you mean that the doubling only started to happen recently while it did not happen in the two years before?

ReneDW commented 8 years ago

@splitbrain i used the data plugin cs very sparsely over the last 2 years, most tables where the easy ones. .. (didn't fully understand it) . Now I am using them more and more.

Last night i cleaned up all the doubles , no other person used it after that. When I came in this morning there where no doubles !

I went to the data table witch had the problem before and there was no double. I opened the first record ,with the nice editor , changed nothing and pressed save again.

A double entry was created in my table ! Just for this one record (select * from data where key = "info.isbn nummer "):

6350 477 info.isbn nummer 109050441386 
6916 296 info.isbn nummer 109050441386 

When I opened the record a third time and saved it again NO new entry was created.

I have 30 entries in this name space.

I tried one of my other lager tables but no double entry was made !.

This is my template :

---- dataentry infotheek ----
Soort                     : @@Soort@@ # 
INFO.isbn nummer          : @@ISBN nummer@@ # 
INFO.actueel_jn           :  #   => type alias
INFO.titel                :  # 
INFO.type_info_type       :  #  => type alias
INFO.metadata             :  # 
INFO.uitgever             :  # 
INFO.auteur               :  # 
INFO.jaar uitgifte        :  # 
INFO.druk                 :  # 
INFO.samenvatting         :  # 
INFO.opmerking            :  # 
INFO.voorbeeld_info_thumb : @@Bestandsnaam@@ #   => type alias : wiki
INFO.link_info_loc        : @@Bestandsnaam@@ # => type alias : wiki
----
this is my query :
==== Infotheek ====
<WRAP font80>
----datatable----
cols: %pageid%,Soort,INFO.isbn nummer,INFO.actueel_jn,INFO.titel,INFO.type_info_type,INFO.metadata,INFO.uitgever,INFO.auteur,INFO.jaar uitgifte,INFO.druk,INFO.samenvatting,INFO.opmerking,INFO.voorbeeld_info_thumb,INFO.link_info_loc

filter : Soort=Info
sort : INFO.isbn nummer
dynfilters : 1
----
</WRAP>
splitbrain commented 8 years ago

@ReneDW your result set shows two page ids (477 and 296) can you check the entries in the page table for those with SELECT *, LENGTH(page) FROM pages WHERE pid IN (477,296)?

ReneDW commented 8 years ago

pid

page

title

class

lastmod

LENGTH(page)

296 wiki:intranet:infotheek:109050441386 wiki:intranet:infotheek:109050441386 infotheek 1465640988 36 477 wiki:Intranet:infotheek:109050441386 wiki:Intranet:infotheek:109050441386 infotheek 1465289612 36

splitbrain commented 8 years ago

@ReneDW woah. something is completely wrong there. The pageID has an uppercase letter. This should never happen unless you modified the DokuWiki sources somehow.

ReneDW commented 8 years ago

Me ? Changing source code ? Well I am 57 years old with 35 years of computing experience. I learned NOT to change anything some one else made. I learned it many times the hard way .... ;-)

splitbrain commented 8 years ago

@ReneDW okay, then I'm at a loss here. I have no idea why and how this happens. But it seems to be a different problem from what @tmo26 describes.

Just for completeness can you list your plugins and used PHP extensions?

ReneDW commented 8 years ago

anon_id f5ebcdb1f6b3dee929fcb65b10032d45 version Release 2015-08-10a "Detritus" popversion 2015-07-15 language nl now 1465643028 popauto 0 conf_useacl 1 conf_authtype authad conf_template dokuwiki page_count 614 page_size 1049926 page_biggest 78255 page_smallest 8 page_nscount 83 page_nsnest 4 page_avg 1709.9771986971 page_oldest 1389190549 media_count 1572 media_size 1208595828 media_biggest 154076264 media_smallest 40 media_nscount 38 media_nsnest 4 media_avg 768826.86259542 cache_count 8241 cache_size 332927759 cache_biggest 6732865 cache_smallest 0 cache_avg 40398.951462201 index_count 84 index_size 805244 index_biggest 286021 index_smallest 6 index_avg 9586.2380952381 meta_count 2007 meta_size 4559905 meta_biggest 2488905 meta_smallest 0 meta_avg 2272.0004982561 attic_count 4421 attic_size 7260451 attic_biggest 28120 attic_smallest 25 attic_avg 1642.2644198145 attic_oldest 1389191022 user_count 99 edits_per_day 16.481462672773 plugin acl plugin authad plugin authplain plugin bookcreator plugin bureaucracy plugin changes plugin columns plugin config plugin csstimeline plugin data plugin davcal plugin directorylist plugin discussion plugin do plugin docsearch plugin dw2pdf plugin extension plugin filelist plugin filterrss plugin fontcolor plugin gallery plugin graphviz plugin hidden plugin highlight plugin iframe plugin include plugin indexmenu plugin info plugin logstats plugin move plugin nosidebar plugin orgchart plugin orphanswanted plugin pagelist plugin pagenav plugin pagetemplater plugin plantuml plugin plugin plugin popularity plugin redirect plugin revert plugin safefnrecode plugin schedule plugin searchform plugin searchindex plugin searchtablejs plugin sortablejs plugin sqlite plugin statdisplay plugin strreplace plugin styling plugin tabinclude plugin tablecalc plugin tablewidth plugin twitter plugin upgrade plugin usermanager plugin wrap plugin _notes pcre_version 8.34 2013-12-15 pcre_backtrack 20971520 pcre_recursion 100000 os WINNT webserver Microsoft-IIS/8.5 php_version 5.6.0 php_sapi cgi-fcgi php_memory 134217728 php_exectime 300 php_extension Core php_extension bcmath php_extension calendar php_extension ctype php_extension date php_extension ereg php_extension filter php_extension ftp php_extension hash php_extension iconv php_extension json php_extension mcrypt php_extension SPL php_extension odbc php_extension pcre php_extension Reflection php_extension session php_extension standard php_extension mysqlnd php_extension tokenizer php_extension zip php_extension zlib php_extension libxml php_extension dom php_extension PDO php_extension openssl php_extension SimpleXML php_extension wddx php_extension xml php_extension xmlreader php_extension xmlwriter php_extension cgi-fcgi php_extension mysql php_extension mysqli php_extension mbstring php_extension gd php_extension gettext php_extension curl php_extension exif php_extension xmlrpc php_extension Phar php_extension soap php_extension pdo_mysql php_extension pdo_sqlite php_extension imap php_extension tidy php_extension pdo_sqlsrv php_extension sqlsrv php_extension ldap php_extension mhash

ReneDW commented 8 years ago

@splitbrain What do you mean by : The pageID has an uppercase letter ?

ReneDW commented 8 years ago

@splitbrain : RSS of change of the affected record . Where did the # go ?

1465639552  huidige
Regel 1:    Regel 1:
    ---- dataentry infotheek ----       ---- dataentry infotheek ----
-   Soort                     : Info #      +   Soort                     : Info 
-   INFO.isbn nummer          : 109050441386 #      +   INFO.isbn nummer          : 109050441386 
-   INFO.actueel_jn           :  #      +   INFO.actueel_jn           :  
-   INFO.titel                : GIW / ISSO Publicatie 2007 #    +   INFO.titel                : GIW / ISSO Publicatie 2007 
-   INFO.type_info_type       : info #      +   INFO.type_info_type       : info 
-   INFO.metadata             : installatie #   +   INFO.metadata             : installatie 
-   INFO.uitgever             : ISSO #      +   INFO.uitgever             : ISSO 
-   INFO.auteur               :  #      +   INFO.auteur               :  
-   INFO.jaar uitgifte        : 2007 #      +   INFO.jaar uitgifte        : 2007 
-   INFO.druk                 :  #      +   INFO.druk                 :  
-   INFO.samenvatting         : GIW / ISSO Publicatie 2007 installatieeisen nieuwbouw eengezinswoningen en appartementen #      +   INFO.samenvatting         : GIW / ISSO Publicatie 2007 installatieeisen nieuwbouw eengezinswoningen en appartementen 
-   INFO.opmerking            :  #      +   INFO.opmerking            :  
-   INFO.voorbeeld_info_thumb : isso_publicatie_2007 #      +   INFO.voorbeeld_info_thumb : isso_publicatie_2007 
-   INFO.link_info_loc        : isso_publicatie_2007 #  +   INFO.link_info_loc        : isso_publicatie_2007
    ----        ----
    +   
ReneDW commented 8 years ago

I should be ashamed of myself.... (the hard way...duh)

In the beginning (about 3 years ago) I created a directory manually, and put in some test files. I use windows so a capital in the name is easily used . I now put my data in an other directory, created by Dokuwiki. That is why there are no troubles.

Well this is solved .... , I` am very sorry to have used your valuable time.

tmo26 commented 8 years ago

@splitbrain Problem re-occured today in a brandnew wiki: https://wiki.lede-project.org/toh/start (currently OK after cleanup via sqlite-interface)

@tmo26 we had reports of problems with APC in the past (unrelated to the data plugin). I think it would be worth a try running without them for a while, to see if that changes something.

The new wiki does not have APC, however, Zend OPcache is enabled.

can you try putting a trim($id) at the very first line of the _saveData() function and see if it fixes the problem? https://github.com/splitbrain/dokuwiki-plugin-data/blob/master/syntax/entry.php#L207

Done. I do now have console access. If any other modifications would be helpful, let me know.

Plugins + php-extensions of the new wiki:

plugin  prettytables
plugin  move
plugin  data
plugin  revert
plugin  extension
plugin  authplain
plugin  upgrade
plugin  wrap
plugin  todo
plugin  include
plugin  badbehaviour
plugin  hidepages
plugin  tabbox
plugin  pagequery
plugin  config
plugin  acl
plugin  bureaucracy
plugin  clearhistory
plugin  sqlite
plugin  feed
plugin  ditaa
plugin  orphanswanted
plugin  vshare
plugin  columns
plugin  info
plugin  tag
plugin  gh
plugin  translation
plugin  datatemplate
plugin  color
plugin  styling
plugin  safefnrecode
plugin  searchindex
plugin  oauth
plugin  authorstats
plugin  popularity
plugin  comment
plugin  loadskin
plugin  pagelist
plugin  usermanager
plugin  captcha
plugin  outliner
plugin  discussion
plugin  smtp
plugin  cellbg
pcre_version    8.35 2014-04-04
pcre_backtrack  20971520
pcre_recursion  100000
os  Linux
webserver   nginx/1.10.1
php_version 7.0.11-1~dotdeb+8.1
php_sapi    fpm-fcgi
php_memory  134217728
php_exectime    300
php_extension   Core
php_extension   date
php_extension   libxml
php_extension   openssl
php_extension   pcre
php_extension   zlib
php_extension   filter
php_extension   hash
php_extension   Reflection
php_extension   SPL
php_extension   session
php_extension   standard
php_extension   cgi-fcgi
php_extension   PDO
php_extension   xml
php_extension   calendar
php_extension   ctype
php_extension   dom
php_extension   exif
php_extension   fileinfo
php_extension   ftp
php_extension   gd
php_extension   gettext
php_extension   iconv
php_extension   json
php_extension   pdo_sqlite
php_extension   Phar
php_extension   posix
php_extension   readline
php_extension   shmop
php_extension   SimpleXML
php_extension   sockets
php_extension   sqlite3
php_extension   sysvmsg
php_extension   sysvsem
php_extension   sysvshm
php_extension   tokenizer
php_extension   wddx
php_extension   xmlreader
php_extension   xmlwriter
php_extension   xsl
php_extension   Zend OPcache

It would be interesting to see the cached instructions of an affected file. If my suggested hack of adding a trim() helps, it would be a good point to add a debug log with a stack trace there.

I would need your guidance to get the debug log + stack trace.

tmo26 commented 8 years ago

@splitbrain The trim($id) you proposed has no effect: Double dataentries are coming back after a while. Any other ideas? Anything else I can do?

tmo26 commented 8 years ago
opcache.enable=0
opcache.enable_cli=0

I'll report back if the problem persists with disabled Zend OPcache.

tmo26 commented 7 years ago

I found the culprit: It is the pagequery plugin that creates the double dataentries.

Ticket opened at the pagequery issues page (see above).

tmo26 commented 7 years ago

@splitbrain

Going through the source code, it seems to me that the problem occurs when entries are resaved on metadata rendering. It's the only way data can get saved with out the page itself not being modified.

Having now reproduced the conditions that make the issue occur in my demowiki, I get additional error messages not seen before, that IMHO support your above assumption:

Error: Page toh:hwdata:3com:3com_3ctest_delete needs to be rewritten because of page renames but is not writable.

Error: Page toh:hwdata:3com:3com_3ctest_3c35_delete needs to be rewritten because of page renames but is not writable.

I have checked that the access rights for the above pages are set correctly, therefore I can not explain why they should be non writable.

Anyway, I posted this just to put in one more piece to the puzzle. As written above, ticket against pagequery is already opened.