Open · paulhectork opened this issue 2 years ago
to send asynchronous responses from a flask server to a client: https://www.shanelynn.ie/asynchronous-updates-to-a-webpage-with-flask-and-socket-io/
When I (poorly) designed the function, the idea, if I recall, was to perform the reconciliation upstream (on a regular basis, for instance, or again whenever more items were added to the database), and to give access only to the results of that reconciliation.
In my mind it made no sense to trigger the function each time a request was sent to the server.
Hope it will be of help!
Best of luck with this project,
Matthias
Edit: this would only work if the authors are all pre-identified, of course (but I'm not sure that reconciling raw strings is really more interesting than reconciling identified entities). Another assumption was that the reconciliation had to be done on authors, because this type of data fitted the task best (it would exclude the right amount of entries).
hi! thanks for letting me know about the context in which the reconciliator was created! from what I see in the app's code, the reconciliation function is basically the website's search engine. it's a fine solution 99.9% of the time, and I've only encountered this problem once.
I've already tried to perform the reconciliation on all authors upstream, but it took ~50 hours, so I've abandoned that for now and am not sure I'll be able to do it at all.
so far, reconciliation on demand is a working solution, and redoing the whole search engine would be quite a hassle. I was just cleaning up some stuff on the website and opened this small issue so I can come back to it later. the bug should, I hope, be very easy to fix.
best, Paul
## issue description

what the title says: when a lot of candidates have been selected and need to be reconciliated, the reconciliation time grows quadratically with the number of matches. this is a problem because `reconciliation()` is used in the search engine of the human-readable website (in the `/Search` route). the problem seems to be in the `double_loop()` function. for example:

- for `Sévigné`, 52 items are matched for a reconciliation. `double_loop()` takes ~1 second and the waiting time for `/Search` is ~5 seconds (which is fine).
- for `Napoléon`, 112 items are matched. reconciliation takes 4'' and the whole search takes roughly 7-10 seconds (a bit long, but still fine).
- for `bonaparte`, there are 740 matches. `double_loop()` takes 5'19'' to be processed, and `/Search` as a whole takes roughly as much time.

although this problem occurs rarely (I noticed it after working on the website for months), it is unsuitable for a client-exposed function, because the client will virtually never wait 5+ minutes for a response. it also causes a server-side problem: the application will continue to run the search even after the client has quit the page, which could put strain on the app if several pending requests pile up.
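the three timings above scale roughly with the square of the number of matched items. a quick back-of-the-envelope check (a sketch; the figures are taken from the examples above):

```python
# baseline from the "Sévigné" example: 52 matched items, ~1 second in double_loop()
BASELINE_ITEMS = 52
BASELINE_SECONDS = 1.0

def predicted_seconds(n_items):
    """Scale the baseline quadratically: time ~ (n / baseline) ** 2."""
    return BASELINE_SECONDS * (n_items / BASELINE_ITEMS) ** 2

print(round(predicted_seconds(112), 1))  # → 4.6 (observed: ~4 s)
print(round(predicted_seconds(740)))     # → 203 (observed: 5'19'' = 319 s)
```

the 740-match prediction is off by a constant factor but in the same order of magnitude as the observed time, which fits quadratic (rather than truly exponential) growth plus per-item overhead.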
## technical problem

the `/Search` route calls `reconciliator()` to group different occurrences of the same manuscript together. this function works in two steps:

1. it filters the items (by author with `author_filtering()` and by date with `date_fitering()`) and matches a certain number of items that need to be reconciliated. the number of items matched doesn't impact the processing time of this step.
2. `double_loop()` is called to group the matched items; this is what takes so much time. `double_loop()` loops over all matched items and, for each of those items, loops once more over all matched items. so if `x` items are matched, the total number of iterations is `x^2`.
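as a sketch of the difference between that pairwise approach and a key-based grouping (hypothetical data and a deliberately crude normalization; the real matching logic in `author_filtering()` and `double_loop()` is surely more involved): if the grouping criterion can be expressed as an exact key, one pass over a dict replaces the `x^2` comparisons:

```python
from collections import defaultdict

# hypothetical matched items (the real records presumably carry more fields)
items = [
    {"id": 1, "author": "Sévigné", "date": 1671},
    {"id": 2, "author": "sevigne", "date": 1671},
    {"id": 3, "author": "Napoléon", "date": 1800},
]

def normalize(author):
    # deliberately crude normalization, for the sake of the example
    return author.lower().replace("é", "e")

def group_quadratic(items):
    """Pairwise grouping: every item is compared with every other item,
    so x matched items cost on the order of x**2 comparisons."""
    groups, seen = [], set()
    for a in items:
        if a["id"] in seen:
            continue
        seen.add(a["id"])
        group = [a]
        for b in items:
            if (b["id"] not in seen
                    and normalize(b["author"]) == normalize(a["author"])
                    and b["date"] == a["date"]):
                seen.add(b["id"])
                group.append(b)
        groups.append(group)
    return groups

def group_by_key(items):
    """Key-based grouping: one pass, each item is hashed into a bucket."""
    buckets = defaultdict(list)
    for item in items:
        buckets[(normalize(item["author"]), item["date"])].append(item)
    return list(buckets.values())

# both strategies produce the same two groups on this toy data
assert len(group_quadratic(items)) == len(group_by_key(items)) == 2
```

this only works if the matching rule is an equivalence that can be precomputed as a key; if the real rule is fuzzy (e.g. string-similarity thresholds), blocking or sorted-neighbourhood techniques are the usual way to avoid the full pairwise comparison.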
## solutions
i've thought of two possible solutions:
`View by manuscript` button).

to be continued...