grambank / pygrambank

Apache License 2.0
4 stars 1 forks source link

merge warnings and errors #117

Closed HedvigS closed 6 months ago

HedvigS commented 6 months ago

can we merge warnings and errors for pygrambank check?

value without source and "comment contains string ""check""" are warnings, the others are errors. Warnings spit out the Feature_ID separate from the message, errors merge them. For me keeping up with quality-control, these can all be treated the same.

I think we should treat these as the same, both as errors please.

For the CLDF-release, I'd prefer to suppress all data-points that trigger an error, save the general sheet error "Less than 25 datapoints have comments"

HedvigS commented 6 months ago

@johenglisch said this over here: https://github.com/glottobank/Grambank/issues/2722

I think the WARNING label still makes sense for making some of the lowest-priority checks sound appropriately low-priority (‘comment contains string "check"‘, ‘less than 25 comments‘).

But yeah, we can elevate some of the warnings to errors if you feel people should always address them.

EDIT: Here's a list of all warnings from the check script:

  • ‘value without source’
  • ‘source given, but no value’
  • ‘comment contains string "check"’
  • ‘comment given, but no value’
  • ‘non-empty cell with empty header’
  • ‘less than 25 datapoints have comments’

I think that all of should be resolved before merging new sheets and that the following should mean suppression for cldfbench:

johenglisch commented 6 months ago

‘comment contains string "check"’

Sure about that one? Because iirc this was just a hint that a data point might be worth checking (also, that hint is bound to have a few false positives, since there are numerous reasons why one would have the word check in a comment).

The other three were already used to filter data points and I only had to change the prefix in the logging.

HedvigS commented 6 months ago

‘comment contains string "check"’

Sure about that one? Because iirc this was just a hint that a data point might be worth checking (also, that hint is bound to have a few false positives, since there are numerous reasons why one would have the word check in a comment).

The other three were already used to filter data points and I only had to change the prefix in the logging.

I have been thinking about this. I've decided that when a comment with this warning is reviewed we should change the comment so that it reflects the situation and no longer contains the word "check". I think it's not good to have these type of comments, and there are only a handful of them left so I think we should surpress them.

johenglisch commented 6 months ago

okay, I changed the 'check' thing to an error

HedvigS commented 6 months ago

okay, I changed the 'check' thing to an error

🙏🙏🙏