Issue 473 Add optional debug/info logging to extract_xri script

sillsdev / silnlp

A set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor/minority languages.

Other

35 stars 3 forks source link

Issue 473 Add optional debug/info logging to extract_xri script #514

Closed rminsil closed 2 months ago

rminsil commented 2 months ago

This PR adds optional logging to the script.

Currently the script makes certain assumptions about the data which can make it a bit brittle causing it to crash mid-processing.

Some example assumptions:

all columns exist with the expected schema
no embedded newlines (e.g. each row is on one line)
id's can all be parsed to int

Logging will help to quickly identify the row of data breaking it.

Currently the logging is off by default. Enabling with -log [PATH] sends the logs to the file specified.

Note this PR is built on top of another unmerged PR #473. I'll rebase this and target master when #473 is merged.

This change is

rminsil commented 2 months ago

In silnlp, we always have logging enabled and we output directly to stdout. I don't think there is a need for logging to a file.

Each line of the file produces some logs and the files are around 4K entires, so you're typically getting about 8K-12K lines of logging, that's why I thought it better to have it off by default and send it to a file if it's enabled. Otherwise it's like a snowstorm in the terminal and it will usually overrun the terminal's buffer. We can pipe it to a file I guess but that's an extra step.

I was going to reserve stdout or stderr for the more meaningful output like statistics and summaries of data quality issues noticed.

Let me know what you think, I'm happy to do whatever.

UPDATE: Sorry I think I misunderstood some of what you're saying and was missing some context.

rminsil commented 2 months ago

@ddaspit I found the logging config in the parent __init.py__ and understand what you mean now.

I've removed the option to log to a file.

I've changed the logger name to be specific to the module. I'll make an analogous change to the repair logger used in #515.

I've added an optional flag to allow setting the debug level from the command line.