georgeslabreche opened this issue 7 years ago
Model should be easily serializable into JSON.
Why is this a requirement?
Model needs to be splittable into subtasks? Maybe via an annotation mechanism?
I wonder how automated this might or has to be. I see two options:

1) No automation. Just full form generation in the presenter, and it is then up to the front-end dev to split that up into multiple forms/tasks.
2) An annotation mechanism and form generation that would produce several forms out of one complex model (a rough sketch follows below).

I'm thinking that it would be best to move verified data into structured relational DB tables. Could such annotations then help to aggregate the data?
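To make option 2 slightly more concrete, here is a minimal sketch of what an annotation-driven split could look like; the `subtask` helper, the field names, and `split_into_forms` are hypothetical, not an existing PyBossa or project API.

```python
# Hypothetical sketch: annotate each model field with the subtask (form)
# it belongs to, then group fields by that label to get one form per subtask.
from dataclasses import dataclass, field, fields


def subtask(name, **kwargs):
    """Attach a subtask label to a dataclass field via its metadata."""
    return field(metadata={"subtask": name}, **kwargs)


@dataclass
class CompanyRecord:
    page_number: int = subtask("choose_page", default=0)
    company_name: str = subtask("company_names", default="")
    address: str = subtask("company_details", default="")
    registration_id: str = subtask("company_details", default="")


def split_into_forms(model_cls):
    """Group the annotated fields so one complex model yields several forms."""
    forms = {}
    for f in fields(model_cls):
        forms.setdefault(f.metadata["subtask"], []).append(f.name)
    return forms


# {'choose_page': ['page_number'], 'company_names': ['company_name'],
#  'company_details': ['address', 'registration_id']}
print(split_into_forms(CompanyRecord))
```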
Model needs to be tied to validation/verification mechanism.
Agree. Through annotations?
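If annotations are the route taken, the same field metadata could also carry validators so that the verification rules live next to the model. A minimal sketch, assuming the same dataclass approach as above; `annotated`, `not_empty`, and `validate` are hypothetical names:

```python
# Hypothetical sketch: attach validators to model fields via metadata and
# run them during verification.
from dataclasses import dataclass, field, fields


def annotated(subtask_name, validators=(), **kwargs):
    return field(metadata={"subtask": subtask_name, "validators": validators}, **kwargs)


def not_empty(value):
    return bool(str(value).strip())


@dataclass
class CompanyName:
    company_name: str = annotated("company_names", validators=(not_empty,), default="")


def validate(instance):
    """Return a list of validation errors for the instance's annotated fields."""
    errors = []
    for f in fields(instance):
        value = getattr(instance, f.name)
        for check in f.metadata.get("validators", ()):
            if not check(value):
                errors.append(f"{f.name} failed {check.__name__}")
    return errors


print(validate(CompanyName()))            # ['company_name failed not_empty']
print(validate(CompanyName("ACME Ltd")))  # []
```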
You are right, JSON serialization is not necessary.
For splitting tasks, we need:
I think annotations are the way to go.
Here's a different approach we might want to put some thought into: what if we completely dismiss the backend for modeling and instead document how the front-end task presenter can take on most of this complexity, using tools like Grunt that would eventually bundle everything into a single HTML/CSS/JavaScript task presenter template file?
Splitting and merging tasks has to be done on the backend, as different tasks go to different people at different times. This complexity cannot be packed into standalone HTML + JS.
In my opinion it would be clearer if we had a filesystem structure like:
```
forms/
  choose_page.html       # on which page a table with companies exists
  company_names.html     # leave transcribing details to others
  company_details.html   # transcribe company x's details on page z
```
Then a document importer could be written that would create tasks of a specific type, for example `choose_page`.

Then a verification mechanism could be written that would implement the following algorithm (a sketch follows below):

- when a `choose_page` task is verified, create a `company_names` task with the metadata of the verified page
- when a `company_names` task is verified, create multiple `company_details` tasks, each associated with a different company

That relates to #83 already.
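A minimal sketch of that chain, assuming a hypothetical `on_task_verified` hook and `create_task` helper (neither is an existing PyBossa API):

```python
# Hypothetical sketch of the verification chain described above.
NEXT_STEP = {
    "choose_page": "company_names",
    "company_names": "company_details",
}


def create_task(task_type, payload):
    # Placeholder for whatever backend call actually enqueues a new task.
    print(f"creating {task_type} task with {payload}")


def on_task_verified(task_type, verified_data):
    """Create the follow-up task(s) once a task's answer has been verified."""
    next_type = NEXT_STEP.get(task_type)
    if next_type is None:
        return  # company_details is the last step in the chain
    if task_type == "choose_page":
        # one follow-up task carrying the metadata of the verified page
        create_task(next_type, {"page": verified_data["page"]})
    elif task_type == "company_names":
        # one follow-up task per company found on the verified page
        for name in verified_data["companies"]:
            create_task(next_type, {"page": verified_data["page"], "company": name})


on_task_verified("choose_page", {"page": 12})
on_task_verified("company_names", {"page": 12, "companies": ["ACME", "Globex"]})
```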
Re merging:
I agree with the filesystem structure, with each form having its own model defined. Maybe we can define a minimal best-practice implementation where each model needs to inherit from a base model.
Regarding the verification mechanisms: if I understand correctly, we want the subtasks to be ordered and served sequentially, serving the next task only once the previous one has been verified? If this is the case, we have to "complexify" the status setting/tracking for the different subtasks so that a subtask serving order is enforced.
Merging: where would this structured storage reside? Are we trying to fit it into the current schema, or would it be a new table? Or a separate persistence mechanism like Mongo?
> Maybe we can define a minimal best-practice implementation where each model needs to inherit from a base model.
+1
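To make that base model concrete, a minimal sketch (the class and method names are assumptions about what the best practice could look like, not agreed code):

```python
# Hypothetical sketch of a base model that every subtask model inherits from,
# so splitting/verification code can rely on a common shape.
from dataclasses import asdict, dataclass, fields


@dataclass
class BaseTaskModel:
    task_type: str = ""

    def to_dict(self):
        """Common serialization used when storing or serving the task."""
        return asdict(self)

    def field_names(self):
        return [f.name for f in fields(self)]


@dataclass
class CompanyDetails(BaseTaskModel):
    company: str = ""
    address: str = ""


details = CompanyDetails(task_type="company_details", company="ACME")
print(details.to_dict())      # {'task_type': 'company_details', 'company': 'ACME', 'address': ''}
print(details.field_names())  # ['task_type', 'company', 'address']
```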
> If this is the case, we have to "complexify" the status setting/tracking for the different subtasks so that a subtask serving order is enforced.
Let's discuss it on the call. I see two issues:
Sequential tasks #51 were defined as:
> By sequential (sub)tasks I understand tasks that can be filled in independently, but that are related to each other in a logical way, so it makes sense for a user to follow the sequence. Example: one imported document, which is a form, is split into multiple tasks by section. If a user fills out one section, he may wish to continue with that document or try another one. So it may be beneficial to have the buttons "Transcribe next section"/"Continue with this document" and "Pick a random task".
This is a very nice-to-have.
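On the question of enforcing the serving order: a minimal sketch, assuming a per-subtask status and a fixed chain order (the `Status` values and the `next_servable` helper are assumptions, not existing code):

```python
# Hypothetical sketch: only serve a subtask once every earlier subtask in the
# chain has been verified.
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    COMPLETED = "completed"   # answered but not yet verified
    VERIFIED = "verified"


SUBTASK_ORDER = ["choose_page", "company_names", "company_details"]


def next_servable(statuses):
    """Return the next subtask to serve, enforcing the chain order:
    a subtask is only served while it is PENDING and every earlier
    subtask has already been VERIFIED."""
    for name in SUBTASK_ORDER:
        status = statuses.get(name, Status.PENDING)
        if status == Status.VERIFIED:
            continue  # this step is done, look at the next one
        if status == Status.PENDING:
            return name
        return None  # answered but awaiting verification: serve nothing yet
    return None  # the whole chain is verified


print(next_servable({}))                                  # choose_page
print(next_servable({"choose_page": Status.VERIFIED}))    # company_names
print(next_servable({"choose_page": Status.COMPLETED}))   # None (awaiting verification)
```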
> Merging: where would this structured storage reside? Are we trying to fit it into the current schema, or would it be a new table? Or a separate persistence mechanism like Mongo?
General assumption: it should be independent of PyBossa core, probably defined in the "project" plugin, so new tables. Should it be a different schema/DB? I don't have a preference. Mongo or SQL? I was thinking about structured SQL storage, because then you can autogenerate an API + API spec from it (am I going too far?) and query for specific entities. For example, if you have political donation sheets, you want to have the concepts of the report where they are entered, the political parties, and the donors. That's why I'm pro relational DB for this data.
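A rough sketch of what those new tables could look like for the political donations example, using plain SQL via SQLite; the table and column names are assumptions for illustration, not an agreed schema:

```python
# Hypothetical sketch of structured SQL storage for verified data
# (new tables, outside PyBossa core).
import sqlite3

SCHEMA = """
CREATE TABLE report (
    id          INTEGER PRIMARY KEY,
    source_doc  TEXT NOT NULL            -- the sheet the donations were entered on
);
CREATE TABLE party (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE donation (
    id          INTEGER PRIMARY KEY,
    report_id   INTEGER NOT NULL REFERENCES report(id),
    party_id    INTEGER NOT NULL REFERENCES party(id),
    donor_name  TEXT NOT NULL,
    amount      NUMERIC NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Each verified answer would be inserted into these entity tables instead of
# remaining as unstructured task_run JSON.
conn.execute("INSERT INTO party (name) VALUES (?)", ("Example Party",))
conn.commit()
```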
Let's have that call.
This issue is now mostly connected with the subtasking and verification mechanisms.
We need to document that properly.
ACCEPTANCE CRITERIA