grambank / pygrambank

Apache License 2.0

describe to report if content in contributed_datapoint fails to conform #53

Closed HedvigS closed 2 years ago

HedvigS commented 2 years ago

It would be great if the check segment of `pygrambank describe` verified that every entry in the `[C|c]ontributed_[D|d]atapoints` column matches a known coder abbreviation and, if more than one person is listed, that the abbreviations are separated by spaces and not commas or semicolons.

I sometimes found misaligned comments in this column, and inconsistent separators when more than one coder is listed. It would be neat to check this while we're checking everything else for new submissions.

xrotwang commented 2 years ago

Individual coders are determined by looking for chunks of uppercase letters - regardless of separator: https://github.com/grambank/pygrambank/blob/f3e82e10d5fc39c5c64737dbafc158d7c11af905/src/pygrambank/sheet.py#L29-L31
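
For illustration, the idea of pulling coders out as chunks of uppercase letters could look roughly like the sketch below. This is not the code at the link above: the pattern `[A-Z]+` and the helper name `extract_coders` are assumptions made for this example.

```python
import re

# Assumption: a coder abbreviation is any run of uppercase ASCII letters.
CODER_PATTERN = re.compile(r"[A-Z]+")


def extract_coders(cell):
    """Return the uppercase chunks found in a cell, ignoring separators."""
    return CODER_PATTERN.findall(cell or "")


print(extract_coders("HS, JLE;RF"))  # ['HS', 'JLE', 'RF']
print(extract_coders("HS JLE"))      # ['HS', 'JLE']
```

With this approach the separator (space, comma or semicolon) does not affect which coders are found.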

The check against known coders is done in grambank describe: https://github.com/grambank/pygrambank/blob/f3e82e10d5fc39c5c64737dbafc158d7c11af905/src/pygrambank/commands/describe.py#L94-L98
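
A minimal sketch of such a check, plus the separator check asked for above, might look like this. The function names, the `known_coders` set and the sample abbreviations are hypothetical and are not taken from `describe.py`.

```python
import re


def unknown_coders(cell, known_coders):
    """Return the uppercase chunks in a cell that are not known coders."""
    found = re.findall(r"[A-Z]+", cell or "")
    return [coder for coder in found if coder not in known_coders]


def has_bad_separator(cell):
    """Return True if the cell contains a comma or a semicolon."""
    return bool(re.search(r"[,;]", cell or ""))


known = {"HS", "JLE"}  # hypothetical set of known coder abbreviations
print(unknown_coders("HS XX", known))  # ['XX']
print(has_bad_separator("HS, JLE"))    # True
print(has_bad_separator("HS JLE"))     # False
```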