icatproject / icat.server

The ICAT server offering both SOAP and "RESTlike" interfaces to a metadata catalog.
Other
1 stars 5 forks source link

duplicate results when using pagination with ordering by datafileCreateTime #190

Closed jodysalt closed 7 years ago

jodysalt commented 7 years ago

This is for icatapi-preprod.diamond.ac.uk (4.6.1)

When paginating the results I get the following:

(The only difference between these two queries is "ORDER BY datafile.datafileCreateTime desc")

query1: select distinct datafile from Datafile datafile , datafile.dataset as dataset , dataset.investigation as investigation , investigation.facility as facility where facility.id = 1 and investigation.name = 'NT11175' and investigation.id = 1010754770 and dataset.id = 1083896358 and UPPER(datafile.name) like concat('%', 'M8S8_2', '%') ORDER BY datafile.datafileCreateTime desc limit [offset], 50

query1: returned ids count: 1401 query1: unique returned ids count: 1347

query2: select distinct datafile from Datafile datafile , datafile.dataset as dataset , dataset.investigation as investigation , investigation.facility as facility where facility.id = 1 and investigation.name = 'NT11175' and investigation.id = 1010754770 and dataset.id = 1083896358 and UPPER(datafile.name) like concat('%', 'M8S8_2', '%') limit [offset], 50

query2: returned ids count: 1401 query2: unique returned ids count: 1401

ids with duplication: [1083931817, 1083931614, 1083930734, 1083930628, 1083930550, 1083931083, 1083931680, 1083931311, 1083931276, 1083930670, 1083931632, 1083931228, 1083930900, 1083930784, 1083931468, 1083930620, 1083931230, 1083930529, 1083930702, 1083930607, 1083930498, 1083931601, 1083931793, 1083930897, 1083931657, 1083930474, 1083931545, 1083931159, 1083931113, 1083931634, 1083930659, 1083931633, 1083930944, 1083931096, 1083931611, 1083930850, 1083931354, 1083931167, 1083931142, 1083930668, 1083930854, 1083930747, 1083931712, 1083930542, 1083930633, 1083930721, 1083931800, 1083931154, 1083931638, 1083931534, 1083931347, 1083931071, 1083931394, 1083931688]

sample entities with duplication {"Datafile"=> {"id"=>1083931817, "createId"=>"GUARDIAN", "createTime"=>"2015-07-20T22:22:57.298+01:00", "modId"=>"GUARDIAN", "modTime"=>"2015-07-20T22:22:57.298+01:00", "dataCollectionDatafiles"=>[], "datafileCreateTime"=>"2015-07-20T22:14:49.000+01:00", "datafileModTime"=>"2015-07-20T22:14:49.000+01:00", "description"=>"unknown", "destDatafiles"=>[], "fileSize"=>6230661, "location"=> "/dls/i04/data/2015/nt11175-56/pasdomain/pasdomain_M8S8_2_1350.cbf", "name"=>"pasdomain_M8S8_2_1350.cbf", "parameters"=>[], "sourceDatafiles"=>[]}}

duplication count: 2

{"Datafile"=> {"id"=>1083931614, "createId"=>"GUARDIAN", "createTime"=>"2015-07-20T22:22:56.716+01:00", "modId"=>"GUARDIAN", "modTime"=>"2015-07-20T22:22:56.716+01:00", "dataCollectionDatafiles"=>[], "datafileCreateTime"=>"2015-07-20T22:14:49.000+01:00", "datafileModTime"=>"2015-07-20T22:14:49.000+01:00", "description"=>"unknown", "destDatafiles"=>[], "fileSize"=>6230643, "location"=> "/dls/i04/data/2015/nt11175-56/pasdomain/pasdomain_M8S8_2_1349.cbf", "name"=>"pasdomain_M8S8_2_1349.cbf", "parameters"=>[], "sourceDatafiles"=>[]}}

duplication count: 2

jodysalt commented 7 years ago

This is still an issue in production.

https://helpdesk.gridpp.rl.ac.uk:443/Ticket/Display.html?id=174864

fisherab commented 7 years ago

I have reproduced the bug - now I need to fix it

fisherab commented 7 years ago

In fact it is not a bug in ICAT. The problem is that the internal query (i.e. without the limit clause) does not provide a fully defined ordering so when used with different limit clauses does not work as expected. The solution I adopted was to change the order by from "ORDER BY df.datafileCreateTime desc" to "ORDER BY df.datafileCreateTime desc, id". This is still able to use the index on the datafileCreateTime. Note that this field is not generally indexed so consideration should be given to just using "order by id desc" which also works fine.