themoonsheep / moonsheep

Moonsheep digitizes huge, messy paper and PDF archives through crowdsourcing and cutting edge technology.
http://moonsheep.org
GNU Affero General Public License v3.0

As a developer, I want technical instructions on where and how I should define my task presenter's model. #82

Status: open. georgeslabreche opened this issue 7 years ago.

georgeslabreche commented 7 years ago

ACCEPTANCE CRITERIA

KrzysztofMadejski commented 7 years ago

> Model should be easily serializable into JSON.

Why is this a requirement?

> Model needs to be splittable into subtasks? Maybe via an annotation mechanism?

I wonder how automated this could or has to be. I see two options:

  1. No automation: just generate the presenter's full form, and it is then up to the front-end developer to split it into multiple forms/tasks.
  2. An annotation mechanism plus form generation that would generate several forms out of one complex model.

I think it would be best to move verified data into structured relational DB tables. Could such annotations then also help to aggregate the data?
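Option 2 could look roughly like the following minimal sketch. It assumes the model is a plain Python dataclass and that each field carries a hypothetical `subtask` annotation (stored in the dataclass field metadata); `annotated`, `CompanyReport` and `split_into_forms` are illustrative names, not existing Moonsheep or PyBossa APIs. One form per distinct annotation value is derived from the single model.

```python
from collections import defaultdict
from dataclasses import dataclass, field, fields


def annotated(subtask: str):
    """Hypothetical helper: attach a subtask annotation to a dataclass field."""
    return field(default=None, metadata={"subtask": subtask})


@dataclass
class CompanyReport:
    # Each field is annotated with the (hypothetical) subtask/form it belongs to.
    page_number: int = annotated("choose_page")
    company_name: str = annotated("company_names")
    address: str = annotated("company_details")
    revenue: str = annotated("company_details")


def split_into_forms(model_cls) -> dict[str, list[str]]:
    """Group the model's field names by their subtask annotation: one form per subtask."""
    forms: dict[str, list[str]] = defaultdict(list)
    for f in fields(model_cls):
        forms[f.metadata.get("subtask", "default")].append(f.name)
    return dict(forms)


print(split_into_forms(CompanyReport))
```

The same annotations could later tell an aggregation step which structured table each verified field belongs to, which is the second question raised above.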

> Model needs to be tied to validation/verification mechanism.

Agree. Through annotations?
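Tying verification to the model through annotations might look like this minimal sketch. It assumes each field declares a hypothetical `normalize` function in its metadata and that an entry counts as verified when at least `min_agreement` independent contributions agree on every field after normalization; the names and the agreement rule are illustrative assumptions, not a decided design.

```python
from collections import Counter
from dataclasses import dataclass, field, fields


def normalize_name(value: str) -> str:
    # Collapse whitespace and ignore case before comparing names.
    return " ".join(value.split()).lower()


@dataclass
class CompanyEntry:
    # Hypothetical annotation: how to normalize a value before comparing contributions.
    name: str = field(default="", metadata={"normalize": normalize_name})
    revenue: str = field(default="", metadata={"normalize": lambda v: v.replace(",", "")})


def verify(entries: list, min_agreement: int = 2):
    """Return the agreed-upon entry, or None if any field lacks enough agreeing contributions."""
    agreed = {}
    for f in fields(entries[0]):
        normalize = f.metadata.get("normalize", lambda v: v)
        counts = Counter(normalize(getattr(e, f.name)) for e in entries)
        value, votes = counts.most_common(1)[0]
        if votes < min_agreement:
            return None
        agreed[f.name] = value
    return type(entries[0])(**agreed)
```

For example, three contributions `"ACME  Ltd"`, `"acme ltd"` and `"Acme Inc"` agree on the normalized name `"acme ltd"` with two votes, which passes the default threshold.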

georgeslabreche commented 7 years ago

You are right, JSON serialization is not necessary.

For splitting tasks, we need:

  1. Some sort of mechanism to figure out which part of the task presenter should be displayed, depending on which data is requested.
  2. Another mechanism to synchronize the split pieces when dispatching tasks.
  3. A merging mechanism.
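The merging step (point 3) can be sketched as follows, assuming each verified subtask result records which document and which split piece ("section") it came from. `SubtaskResult` and `merge_results` are hypothetical names; the rule that a document is merged only once all expected sections are verified is an assumption for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SubtaskResult:
    document_id: str
    section: str   # which piece of the split task presenter produced this data
    data: dict     # verified field values for that section


def merge_results(results, expected_sections):
    """Merge verified subtask results into one record per document.

    Documents still missing an expected section are left out (not yet complete)."""
    by_doc = defaultdict(dict)
    for r in results:
        by_doc[r.document_id][r.section] = r.data
    merged = {}
    for doc_id, sections in by_doc.items():
        if set(expected_sections).issubset(sections):
            merged[doc_id] = {name: sections[name] for name in expected_sections}
    return merged
```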

I think annotations are the way to go.

Here's a different approach we might want to put some thought into: what if we drop backend modeling entirely and instead document how the front-end task presenter can absorb most of this complexity, using tools like Grunt to bundle everything into a single HTML/CSS/JavaScript task presenter template file?

KrzysztofMadejski commented 7 years ago

Splitting and merging tasks has to be done on the backend, as different tasks go to different people at different times. This complexity cannot be packed into standalone HTML+JS.

In my opinion it would be clearer to have a filesystem structure like:

forms/
  choose_page.html     # choose the page on which the table of companies exists
  company_names.html   # transcribe company names only; leave the details to others
  company_details.html # transcribe company X's details on page Z

Then a document importer could be written that creates tasks of a specific type, for example choose_page.
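Such an importer could be as small as the sketch below. It assumes the task type name doubles as the template file name under `forms/` and that each imported document yields one `choose_page` task; `Task`, `import_documents` and the `info` layout are hypothetical, not the PyBossa task model.

```python
from dataclasses import dataclass, field
from itertools import count
from pathlib import Path

FORMS_DIR = Path("forms")  # assumption: templates live in forms/<task_type>.html

_ids = count(1)


@dataclass
class Task:
    task_type: str   # e.g. "choose_page"; selects forms/<task_type>.html
    info: dict       # data the presenter needs (document URL, page, ...)
    state: str = "open"
    id: int = field(default_factory=lambda: next(_ids))

    @property
    def template(self) -> Path:
        return FORMS_DIR / f"{self.task_type}.html"


def import_documents(urls, task_type="choose_page"):
    """Hypothetical importer: create one task of the given type per imported document."""
    return [Task(task_type=task_type, info={"url": url}) for url in urls]
```

Keeping the mapping purely name-based means adding a new task type only requires a new template file and an importer (or a parent task) that creates tasks of that type.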

Then a verification mechanism could be written that would implement the following algorithm:

That relates to #83 already.

Re merging:

georgeslabreche commented 7 years ago

I agree with the filesystem structure, with each form having its own model defined. Maybe we can define a minimal best-practice implementation where each model needs to inherit from a base model.
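A minimal best-practice base model might look like this sketch. `BaseTaskModel`, `required_fields` and `child_tasks` are hypothetical names chosen for illustration; the assumption is one subclass per form template, with `task_type` matching the template's file name.

```python
from abc import ABC, abstractmethod


class BaseTaskModel(ABC):
    """Hypothetical base class that every form model (one per forms/*.html) inherits from."""

    task_type: str = ""  # must match the template name in forms/

    def __init__(self, **data):
        self.data = data

    @abstractmethod
    def required_fields(self) -> tuple[str, ...]:
        """Fields a volunteer must fill in for this task type."""

    def is_complete(self) -> bool:
        return all(self.data.get(name) for name in self.required_fields())

    def child_tasks(self) -> list[dict]:
        """Tasks to create once this one is verified; none by default."""
        return []


class ChoosePage(BaseTaskModel):
    task_type = "choose_page"

    def required_fields(self):
        return ("page",)

    def child_tasks(self):
        # Once the page with the company table is known, ask for the company names on it.
        return [{"task_type": "company_names", "page": self.data["page"]}]
```

Because `required_fields` is abstract, forgetting to define it in a new form model raises a `TypeError` at instantiation rather than failing silently later.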

Regarding the verification mechanism: if I understand correctly, we want the subtasks to be ordered and served sequentially, serving the next task only once the previous one has been verified? If this is the case, we have to "complexify" the status setting/tracking for the different subtasks so that a subtask serving order is enforced.
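The extra status tracking this would require can be sketched as below, assuming a three-state status per subtask and an ordered list of subtasks per document. The `Status` values and `next_servable` are hypothetical; the point is only that a subtask is served when every earlier one is verified.

```python
from enum import Enum


class Status(Enum):
    OPEN = "open"            # waiting for contributions
    COMPLETED = "completed"  # enough contributions collected, not yet verified
    VERIFIED = "verified"    # contributions agree; the next subtask may be served


def next_servable(subtasks):
    """Return the name of the first subtask that is not yet verified, if it is still open.

    `subtasks` is an ordered list of (name, Status) pairs for one document (hypothetical
    shape). Returns None when the first unverified subtask awaits verification or when
    every subtask is already verified."""
    for name, status in subtasks:
        if status is Status.VERIFIED:
            continue
        return name if status is Status.OPEN else None
    return None
```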

Merging: where would this structured storage reside? Are we trying to fit it into the current schema, or would it be a new table? Or a separate persistence mechanism like MongoDB?

KrzysztofMadejski commented 7 years ago

> Maybe we can define a minimal best-practice implementation where each model needs to inherit from a base model.

+1

> If this is the case, we have to "complexify" the status setting/tracking for the different subtasks so that a subtask serving order is enforced.

Let's discuss it on the call. I see two issues:

  1. Hierarchical tasks (a child can only be served after its parent is verified): in this case I would create child tasks when the parent gets verified. Then we don't need any statuses apart from open and complete.
  2. Sequential tasks #51 were defined as:

    By sequential (sub)tasks I understand tasks that can be filled in independently, but that are related to each other in a logical way, so it makes sense for a user to follow the sequence. Example: one imported document, which is a form, is split into multiple tasks by sections. If a user fills out one section, they may wish to continue with that document or try another one. So it may be beneficial to have a "Transcribe next section"/"Continue with this document" button and a "Pick a random task" button.

    This is very nice-to-have.
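Both cases can be sketched with an in-memory store, assuming tasks carry a document id and a section number. In the hierarchical case children are only created in a hypothetical `on_verified` hook, so only "open" and "complete" states are needed; the sequential case is just a query for the next open section of the same document. `Task` and `TaskStore` are illustrative stand-ins, not PyBossa's task table.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Task:
    doc_id: str
    task_type: str
    section: int = 0
    state: str = "open"  # only "open" and "complete" are needed
    info: dict = field(default_factory=dict)


class TaskStore:
    """Hypothetical in-memory stand-in for the task table."""

    def __init__(self):
        self.tasks: list[Task] = []

    def add(self, task: Task) -> Task:
        self.tasks.append(task)
        return task

    def on_verified(self, parent: Task, children: list[Task]) -> None:
        # Hierarchical tasks: children only come into existence once the parent is
        # verified, so a child can never be served too early.
        parent.state = "complete"
        for child in children:
            self.add(child)

    def open_tasks(self) -> list[Task]:
        return [t for t in self.tasks if t.state == "open"]

    def next_in_document(self, current: Task):
        # Sequential tasks: "Continue with this document" picks the next open section.
        candidates = [t for t in self.open_tasks()
                      if t.doc_id == current.doc_id and t.section > current.section]
        return min(candidates, key=lambda t: t.section, default=None)

    def random_task(self, rng=random):
        # "Pick a random task"
        open_tasks = self.open_tasks()
        return rng.choice(open_tasks) if open_tasks else None
```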

> Merging: where would this structured storage reside? Are we trying to fit it into the current schema, or would it be a new table? Or a separate persistence mechanism like MongoDB?

General assumption: it should be independent of the PyBossa core, probably defined in the "project" plugin, so new tables. Should it be a different schema/DB? I don't have a preference. Mongo or SQL? I was thinking about structured SQL storage, because then you can autogenerate an API and an API spec from it (am I going too far?) and query for specific entities. For example, if you have political donation sheets, you want to model the report in which the donations are entered, the political parties, and the donors as separate concepts. That's why I'm in favor of a relational DB for this data.
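To make the political donations example concrete, here is a minimal relational sketch using SQLite from the Python standard library. The table and column names are assumptions for illustration, as are the sample rows; the idea is that each verified transcription row is written into separate report, party, donor and donation entities, which can then be queried directly.

```python
import sqlite3

SCHEMA = """
CREATE TABLE report   (id INTEGER PRIMARY KEY, source_document TEXT NOT NULL);
CREATE TABLE party    (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE donor    (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE donation (
    id        INTEGER PRIMARY KEY,
    report_id INTEGER NOT NULL REFERENCES report(id),
    party_id  INTEGER NOT NULL REFERENCES party(id),
    donor_id  INTEGER NOT NULL REFERENCES donor(id),
    amount    NUMERIC NOT NULL
);
"""


def get_or_create(conn, table, name):
    # Table names come from this module only, never from user input.
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    return conn.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()[0]


def store_verified_row(conn, report_id, party, donor, amount):
    """Write one verified transcription row into the entity tables."""
    conn.execute(
        "INSERT INTO donation (report_id, party_id, donor_id, amount) VALUES (?, ?, ?, ?)",
        (report_id, get_or_create(conn, "party", party), get_or_create(conn, "donor", donor), amount),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical sample data, for illustration only.
report_id = conn.execute(
    "INSERT INTO report (source_document) VALUES (?)", ("example-report.pdf",)
).lastrowid
store_verified_row(conn, report_id, "Party A", "Donor X", 1000)
store_verified_row(conn, report_id, "Party A", "Donor Y", 500)
totals = conn.execute(
    "SELECT p.name, SUM(d.amount) FROM donation d JOIN party p ON p.id = d.party_id GROUP BY p.name"
).fetchall()
print(totals)
```

Asking for "total donations per party" then becomes a plain join, which is harder to express over schemaless per-task JSON blobs.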

georgeslabreche commented 7 years ago

Let's have that call.

KrzysztofMadejski commented 7 years ago

This issue is now mostly connected with subtasking and the verification mechanism.

KrzysztofMadejski commented 6 years ago

We need to document that properly.