mblakele / taskbot

Get things done with the MarkLogic Task Server

Issue with deletion of documents. #3

Open peetkes opened 8 years ago

peetkes commented 8 years ago

Hi,

I am test-running taskbot to see whether I can use it to delete documents. I used the sample from the docs to create 1 million docs, and I can see this filling up the task queue.

Insert documents:

xquery version "1.0-ml";

import module namespace tb = "ns://blakeley.com/taskbot" at "/taskbot.xqy";

(: This will create 1 million docs in block sizes of 500, so it will create 2000 tasks on the taskserver queue :)
tb:list-segment-process(
  (: Total size of the job. :)
  1 to 1000 * 1000,
  (: Size of each segment of work. :)
  500,
  "/test/asset",
  (: This anonymous function will be called for each segment. :)
  function($list as item()+, $opts as map:map?) {
    (: Any chainsaw should have a safety. Check it here. :)
    tb:maybe-fatal(),
    for $i in $list
    return xdmp:document-insert(
      "/test/asset/"||$i,
      element asset {
        attribute id { 'asset'||$i },
        element asset-org { 1 + xdmp:random(99) },
        element asset-person { 1 + xdmp:random(999) },
        (1 to xdmp:random(9)) ! element asset-ref { xdmp:random(1000) } }),
    xdmp:commit() },
  (: options - not used in this example. :)
  map:new(map:entry('testing', '123...')),
  (: This is an update, so be sure to say so. :)

update

)

Delete documents:

xquery version "1.0-ml";

import module namespace tb = "ns://blakeley.com/taskbot" at "/taskbot.xqy";

let $uris := cts:uris()
return tb:list-segment-process(
  (: Total size of the job. :)
  $uris,
  (: Size of each segment of work. :)
  500,
  "Delete Documents",
  (: This anonymous function will be called for each segment. :)
  function($list as item()+, $opts as map:map?) {
    (: Any chainsaw should have a safety. Check it here. :)
    tb:maybe-fatal(),
    for $uri in $list
    return xdmp:document-delete($uri),
    xdmp:sleep(500),
    xdmp:commit() },
  (: options - not used in this example. :)
  map:new(map:entry('testing', '123...')),
  (: This is an update, so be sure to say so. :)
  $tb:OPTIONS-UPDATE )

When I execute the deletion, only about half of the documents get deleted. I also get errors stating:

2015-12-23 12:55:16.183 Notice: TaskServer: XDMP-AS: (err:XPTY0004) $list as item()+ -- Invalid coercion: () as item()+
2015-12-23 12:55:16.183 Notice: TaskServer: in /taskbot.xqy, at 294:37,
2015-12-23 12:55:16.183 Notice: TaskServer: in function() as item()*() [1.0-ml]
2015-12-23 12:55:16.183 Notice: TaskServer: in /taskbot.xqy [1.0-ml]
2015-12-23 12:55:16.279 Info: App-Services: ()

This would mean that during execution the list of URIs shrinks as documents get deleted, so later segments receive an empty list. Even wrapping the cts:uris() call in xdmp:eager does not resolve this.

I got this resolved with the help of Geert Josten, who suggested wrapping $uris (the first parameter of tb:list-segment-process) in json:array-values(json:to-array($uris)).
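For anyone hitting the same problem, here is a minimal sketch of what that workaround might look like when applied to the delete job above. It assumes the URI lexicon is enabled and that copying the sequence into an in-memory JSON array is what forces the list to be fully materialized before any segment runs; the xdmp:sleep call is left out for brevity.

```xquery
xquery version "1.0-ml";

import module namespace tb = "ns://blakeley.com/taskbot" at "/taskbot.xqy";

(: Copy the URI sequence into an in-memory array and read it back,
   so the list no longer depends on the state of the database. :)
let $uris := json:array-values(json:to-array(cts:uris()))
return tb:list-segment-process(
  $uris,
  500,
  "Delete Documents",
  function($list as item()+, $opts as map:map?) {
    tb:maybe-fatal(),
    for $uri in $list
    return xdmp:document-delete($uri),
    xdmp:commit() },
  map:new(map:entry('testing', '123...')),
  $tb:OPTIONS-UPDATE)
```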

I also noticed the following when I tried the $OPTIONS-SYNC-UPDATE parameter: the Task Server's queue never fills up the way it does with $OPTIONS-UPDATE, and it does not even make use of all available threads on the Task Server.

Why is that?

peetkes commented 8 years ago

The first part is resolved by adding the "eager" option to cts:uris. This changed between versions 7 and 8: in 8, "lazy" is the default if no other options are selected.
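As a rough sketch of that fix (assuming the URI lexicon is enabled), the option goes in the second argument of cts:uris, with an empty sequence as the start value. The resulting $uris would then be passed as the first argument to tb:list-segment-process in place of the plain cts:uris() call; the count here is only a stand-in so the snippet runs on its own.

```xquery
xquery version "1.0-ml";

(: Ask for the whole URI list up front instead of a lazily evaluated sequence. :)
let $uris := cts:uris((), "eager")
return fn:count($uris)
```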

The second part is probably caused by the try/catch surrounding the xdmp:spawn-function call. I don't have an explanation for that yet, but when I remove the try/catch, it utilizes all available threads.