Closed arieljlira closed 3 years ago
Se agregó la expresión "date math" +30DAYS en la consulta a SOLR cuando no se especifica un untilDate
Hice una prueba con esta version 0.5.5 y un tamaño de página de solr de 1000 rows y los resultados son muy buenos. Ejemplo
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T17%3A22%3A55.298Z%22+TO+2020-11-16T17%3A22%3A55.298Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 30.213 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T17%3A36%3A12.999Z%22+TO+2020-11-16T17%3A36%3A12.999Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 0.607 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T17%3A49%3A26.524Z%22+TO+2020-11-16T17%3A49%3A26.524Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 1.095 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T18%3A02%3A21.081Z%22+TO+2020-11-16T18%3A02%3A21.081Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 3.070 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T18%3A15%3A50.009Z%22+TO+2020-11-16T18%3A15%3A50.009Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 3.275 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T18%3A29%3A31.618Z%22+TO+2020-11-16T18%3A29%3A31.618Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 0.616 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T18%3A42%3A38.118Z%22+TO+2020-11-16T18%3A42%3A38.118Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 2.484 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T18%3A56%3A08.805Z%22+TO+2020-11-16T18%3A56%3A08.805Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 0.725 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T19%3A10%3A09.69Z%22+TO+2020-11-16T19%3A10%3A09.69Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 3.156 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T19%3A24%3A52.742Z%22+TO+2020-11-16T19%3A24%3A52.742Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 3.522 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T19%3A38%3A46.337Z%22+TO+2020-11-16T19%3A38%3A46.337Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=0' (get) with body '' in 0.583 seconds, with status 200
INFO: DSpace Stats Collector finished processing 9539 events from 2020-11-16T17:22:55.298Z to 2020-11-16T19:38:46.337Z. Breakdown: 9531 events sent succesfully, 8 events discarted as robot
Es decir, de un promedio de 13 segundos por request que tenía antes ahora bajamos a casi 1.9 (numeros calculados en el aire). Ahora creo otro ticket para subir el rows size de solr :D Muchas gracias!
Gracias a vos, gracias al caso de SEDICI y tu propuesta de mejora ahora el software es mucho más robusto. Estaremos atentos por el otro issue. Lautaro Matas
El lun, 18 oct 2021 a las 16:06, Ariel Lira @.***>) escribió:
Hice una prueba con esta version 0.5.5 y un tamaño de página de solr de 1000 rows y los resultados son muy buenos. Ejemplo
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T17%3A22%3A55.298Z%22+TO+2020-11-16T17%3A22%3A55.298Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 30.213 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T17%3A36%3A12.999Z%22+TO+2020-11-16T17%3A36%3A12.999Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 0.607 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T17%3A49%3A26.524Z%22+TO+2020-11-16T17%3A49%3A26.524Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 1.095 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T18%3A02%3A21.081Z%22+TO+2020-11-16T18%3A02%3A21.081Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 3.070 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T18%3A15%3A50.009Z%22+TO+2020-11-16T18%3A15%3A50.009Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 3.275 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T18%3A29%3A31.618Z%22+TO+2020-11-16T18%3A29%3A31.618Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 0.616 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T18%3A42%3A38.118Z%22+TO+2020-11-16T18%3A42%3A38.118Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 2.484 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T18%3A56%3A08.805Z%22+TO+2020-11-16T18%3A56%3A08.805Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 0.725 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T19%3A10%3A09.69Z%22+TO+2020-11-16T19%3A10%3A09.69Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 3.156 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T19%3A24%3A52.742Z%22+TO+2020-11-16T19%3A24%3A52.742Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=1000' (get) with body '' in 3.522 seconds, with status 200
INFO: Finished 'http://solr_url/statistics/select/?q=%2A+%2Btime%3A%7B%222020-11-16T19%3A38%3A46.337Z%22+TO+2020-11-16T19%3A38%3A46.337Z%2B30DAYS%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=0' (get) with body '' in 0.583 seconds, with status 200
INFO: DSpace Stats Collector finished processing 9539 events from 2020-11-16T17:22:55.298Z to 2020-11-16T19:38:46.337Z. Breakdown: 9531 events sent succesfully, 8 events discarted as robot
Es decir, de un promedio de 13 segundos por request que tenía antes ahora bajamos a casi 1.9 (numeros calculados en el aire). Ahora creo otro ticket para subir el rows size de solr :D Muchas gracias!
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/lareferencia/dspace-stats-collector/issues/14#issuecomment-945808755, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAH7G6TBUDGQLY62R4MHBDTUHQSURANCNFSM5FL2Y6DA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
Estimados, las consultas que se hacen a solr son del tipo
q=time:$INICIO TO *&sort:time asc&rows=$COUNT
lo que hace que solr antes de devolver los resultados deba ordenar todos los resultados que coincidan (que pueden ser muchos millones) y retornar los primeros $COUNT items. En nuestro caso el total son casi 18.000.000 que deben ser ordenados y de los cuales sólo usamos los primeros 1000.Creo que si en lugar de hacer un
q=time:$INICIO TO *:*
se hiciese unq=time:$INICIO TO $INICIO+30DAYS
o, si esto no funciona, unq=time:$INICIO TO $FIN
dónde $FIN=$INICIO +30 días. De esta forma la lógica de la aplicación seguiria igual pero las consultas serían mucho mas livianas, ya que ordenarían menos de 100.000 items (en nuestro caso).A continuación incluyo un log ejemplo de consultas consecutivas en una misma ejecución del script. La primera demora mucho más, y las que siguen tardan bastante pero mucho menos que la primera.