peterwebster / henson

Master data store for the Hensley Henson Journals project, and issue tracker. The application code is kept elsewhere.
1 stars 1 forks source link

Logging during ingest #237

Open peterwebster opened 5 years ago

peterwebster commented 5 years ago

If ingesting more than a dozen or so XML files, the connection often times out or (more seriously) there is the general 500 message. At the minute, the logs show if a line is skipped, and also prints the counts of lines read etc once it has finished. If, however, it fails part way through, it doesn't log the fact, and so it isn't clear to the user if and when to start a retry. Think about fuller logging of ingest events, especially as (I think) resque doesn't show ingesting, just processes subsequent to it.

Also, when a line is skipped, it would be useful to know why in the logs.

nomoregrapes commented 5 years ago

@peterwebster I've added a 2-part feature which is now on test_server...

On the logs page there is now a "Live Detailed Logs" section. The intention is that you open this in a new tab/window (or ideally a 2nd physical monitor) and change the drop down. If you leave this running for a few hours then your computer might grind, just close or refresh the log page.

In another tab, do an ingest. Currently only setup for references. There is a new option "Logging level" which can be set to "Detailed". I don't think it will actually slow down annotation ingest as much as I fear. You still need to keep the page open until it finishes loading and tells you the ingest summary. But you can switch to the tab that is showing the log.

I haven't actually added loads of lines of logging. It will have a log per row in the CSV, which can be an indication of how much it's progressing, and it will give a reason if it skips a row. I'm not sure how well this will work. It might be more beneficial when I add logging to XML ingest.

peterwebster commented 5 years ago

A few notes on this after running it with data for volume 17:

(i) sometimes the application prints to the dashboard an outdated message: in one case, '60 read, 5 skipped' appeared on the dashboard but a check of the logs found another log a second later saying '5 rows read, o skipped'. I'm not quite clear in what order things are happening.,

(ii) for some reason, it sometimes logs 'skipping 12-34-56 because we're in insert-only mode' either after or before reading the same row. I assume it means that these lines are an indication of something else.

(iii) in general, in the live detailed logging, you sometimes get a bunch of entries (I think those that would be in the summary version' that appear out of chron order in the detailed flow. I saw what was happening but it wasn't clear straight away.