yulqen / bcompiler-engine

MOVED: A Python library to alleviate the pain of using Excel spreadsheets to collect data from your stakeholders.
https://git.sr.ht/~yulqen/bcompiler-engine
MIT License
4 stars 1 forks source link

Implementing cell value type checking #29

Closed yulqen closed 3 years ago

yulqen commented 3 years ago

https://github.com/banillie/bcompiler-engine/commit/26b63b6e630e2925175ffff6b48b42d70f7ba544

yulqen commented 3 years ago

Prompted by https://github.com/banillie/bcompiler-engine/commit/26b63b6e630e2925175ffff6b48b42d70f7ba544, this issue is to discuss the implementation of cell value type checking in bcompiler-engine.

bcompiler-engine currently accommdates a "type" column in the datamap which allows a user to specify the expected data type in the cell being targeted, to allow for type-checking to be carried out during the import phase. The underlying JSON structure that is extracted from the spreadsheet contains a "type" attribute (I will add details of the location of the code, etc later) which reflects basic types currently allowed in the datamap, such as TEXT, DATE and NUMBER (again, I will clarify later).

The objective of this feature is to implement type-checking for the user, therefore the following questions needs to be answered:

  1. What types should defined? How complex should they be? For instance, for a DATE type, you might want to restrict dates before or after a certain date; strings less than a certain length, numbers within a range. Such considerations go beyond simple type-checking into rule-creation. If kept simple at first, this is a useful potential addition that should be provided for early on.
  2. Use of a flag in datamaps to switch the feature on or off (e.g. -t or --type-checking) for example
  3. A consideration of "strictness" - is there needed a flag which stops the program upon finding an "error", or should the program try to silently or verbosely correct the error?
  4. Which datamaps phase this applies to import, templates, etc.
  5. The format of the report the user will consume (separate text-based report, entries in logging, highlighting inside master, other).
  6. How to treat/create the data structure that best serves the report, taking into account extensibility to other interfaces that may arise in future, such as a web view.
  7. Maintain good separation of concerns within the current program. Is there an object currently in use which best facilitates this reporting or do we need a new adapter/interface?
  8. Test-first.
  9. Documentation.
  10. Version (1.1?)

There will be other concerns which I will update here!

banillie commented 3 years ago

Nice list of questions Matt. lets discuss them next week. I think it will be good to throw some of these out to the team also, to see what they want.

banillie commented 3 years ago

have sent to the team and asked them to specify requirements for 1-5, but we probably need to discuss 6-10 first. do you want to have a specific time for this, or do it as part of our weekly coding session?

yulqen commented 3 years ago

Useful to get the team's view on this. I think their input on 1, 4 and 5 are most important - the other things are more for us from a development point of view. Happy to discuss on here and during our weekly session. 6 and 7 are key and I need to lead on that, and happy to do so. The best approach would be for me to try to have a look at the code before our session and re-familiarise myself with the structure and come up with some preferred options, and then explain in this issue, pointing at code if possible. When we meet, we can discuss the options. If you're looking at the code in the meantime, feel free to pitch anything into this thread as you go on, including questions, ideas, suggestions, etc.

yulqen commented 3 years ago

Current type validation code inside bcompiler-engine - main objects and functions

Implementation Code

Types are coded as DatamapLineValueType objects, which is an Enum, from here:

https://github.com/yulqen/bcompiler-engine/blob/254f306a6e251739db2925b4b0cc7602bf58ccc0/engine/domain/datamap.py#L11-L20

Data extracted from a spreadsheet is wrapped in a TemplateCell object, one per cell. TemplateCell is defined here:

https://github.com/yulqen/bcompiler-engine/blob/cb9e6fb87d4c8f15542f9d784c07feff84d2bea6/engine/domain/template.py#L7-L32

Types are assigned to values when being extracted from the spreadsheet, here:

https://github.com/yulqen/bcompiler-engine/blob/254f306a6e251739db2925b4b0cc7602bf58ccc0/engine/utils/extraction.py#L437-L490

which happens inside a function, template_reader, called as a multiprocess from here:

https://github.com/yulqen/bcompiler-engine/blob/6b88864516ef1cbe4a7bc3ed6753840492cfef9f/engine/use_cases/parsing.py#L227-L233

Relevant tests

template_reader is tested here:

https://github.com/yulqen/bcompiler-engine/blob/6b88864516ef1cbe4a7bc3ed6753840492cfef9f/tests/test_datamap_parser.py#L60-L79

which is a good spot to put the debugger to examine the data variable, which is a dictionary containing data from a single sheet.

and here:

https://github.com/yulqen/bcompiler-engine/blob/6b88864516ef1cbe4a7bc3ed6753840492cfef9f/tests/use_cases/test_master_from_org_templates.py#L32-L42

yulqen commented 3 years ago

@banillie - This is now almost ready for v1.1.0 which is coming out very soon. As ever, would appreciate a bit of testing and feedback!

yulqen commented 3 years ago

@banillie - v1.1.0 of datamaps has been put out. See the CHANGELOG for details.

banillie commented 3 years ago

@yulqen going to test it now. Will let you know.

Update: datamaps import templates runs well. row limit set to 500 automatically. datamaps import templates -m --rowlimit 300 worked datamaps import templates -v told me I needed to have a type column. Which my dm.cvs doesn't, so working

have some issue with the --inputdir argument as below

datamaps import templates -m -d ../Q3_dm_import.csv --inputdir ~/Documents/datamaps/ 19-Jan-21 10:12:42: INFO - Row limit is set to 500. 19-Jan-21 10:12:42: CRITICAL - Cannot find /home/will/Documents/datamaps/../Q3_dm_import.csv

the -d and --inputdir arguments seem to be conflicting with each other. probably because I'm not entering properly.

yulqen commented 3 years ago

This is now fixed in datamaps v1.1.1.