Closed yulqen closed 3 years ago
Prompted by https://github.com/banillie/bcompiler-engine/commit/26b63b6e630e2925175ffff6b48b42d70f7ba544, this issue is to discuss the implementation of cell value type checking in bcompiler-engine.
bcompiler-engine
currently accommdates a "type" column in the datamap which allows a user to specify the expected data type in the cell being targeted, to allow for type-checking to be carried out during the import
phase. The underlying JSON structure that is extracted from the spreadsheet contains a "type" attribute (I will add details of the location of the code, etc later) which reflects basic types currently allowed in the datamap, such as TEXT
, DATE
and NUMBER
(again, I will clarify later).
The objective of this feature is to implement type-checking for the user, therefore the following questions needs to be answered:
DATE
type, you might want to restrict dates before or after a certain date; strings less than a certain length, numbers within a range. Such considerations go beyond simple type-checking into rule-creation. If kept simple at first, this is a useful potential addition that should be provided for early on.datamaps
to switch the feature on or off (e.g. -t
or --type-checking
) for exampleimport
, templates
, etc.There will be other concerns which I will update here!
Nice list of questions Matt. lets discuss them next week. I think it will be good to throw some of these out to the team also, to see what they want.
have sent to the team and asked them to specify requirements for 1-5, but we probably need to discuss 6-10 first. do you want to have a specific time for this, or do it as part of our weekly coding session?
Useful to get the team's view on this. I think their input on 1
, 4
and 5
are most important - the other things are more for us from a development point of view. Happy to discuss on here and during our weekly session. 6
and 7
are key and I need to lead on that, and happy to do so. The best approach would be for me to try to have a look at the code before our session and re-familiarise myself with the structure and come up with some preferred options, and then explain in this issue, pointing at code if possible. When we meet, we can discuss the options. If you're looking at the code in the meantime, feel free to pitch anything into this thread as you go on, including questions, ideas, suggestions, etc.
Types are coded as DatamapLineValueType
objects, which is an Enum, from here:
Data extracted from a spreadsheet is wrapped in a TemplateCell
object, one per cell. TemplateCell
is defined here:
Types are assigned to values when being extracted from the spreadsheet, here:
which happens inside a function, template_reader
, called as a multiprocess from here:
template_reader
is tested here:
which is a good spot to put the debugger to examine the data
variable, which is a dictionary containing data from a single sheet.
and here:
@banillie - This is now almost ready for v1.1.0 which is coming out very soon. As ever, would appreciate a bit of testing and feedback!
@banillie - v1.1.0 of datamaps has been put out. See the CHANGELOG for details.
--rowlimit
and --inputdir
)-v
)@yulqen going to test it now. Will let you know.
Update:
datamaps import templates runs well. row limit set to 500 automatically.
datamaps import templates -m --rowlimit 300
worked
datamaps import templates -v
told me I needed to have a type column. Which my dm.cvs doesn't, so working
have some issue with the --inputdir argument as below
datamaps import templates -m -d ../Q3_dm_import.csv --inputdir ~/Documents/datamaps/
19-Jan-21 10:12:42: INFO - Row limit is set to 500.
19-Jan-21 10:12:42: CRITICAL - Cannot find /home/will/Documents/datamaps/../Q3_dm_import.csv
the -d
and --inputdir
arguments seem to be conflicting with each other. probably because I'm not entering properly.
This is now fixed in datamaps v1.1.1.
https://github.com/banillie/bcompiler-engine/commit/26b63b6e630e2925175ffff6b48b42d70f7ba544