Closed georgeslabreche closed 6 years ago
List out all the validation we want to support.
https://github.com/transparencee/moonsheep/issues/73
Figure out how to reconcile with PyBossa's redundancy mechanism. Maybe the redundancy is just the maximum number of entries that tells the system when to give up on validating with the rules.
That's a neat idea. Such fuzzy tasks could be marked for inspection by a moderator. Nevertheless it would be good to have a default redundancy for verified fields, so that number doesn't have to be repeated in all of the validation rules. I would keep PyBossa's redundancy limit as this default value for the number of verified entries and introduce a new limit for the maximum number of entries.
Thinking about redundancy for specific model fields - it may be too hard on performance. Maybe defining custom redundancy limit per task would be enough?
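To make the two limits concrete, here is a minimal sketch, assuming a field counts as verified once its most common value has been entered `redundancy` times, and is handed to a moderator once `max_entries` submissions exist without agreement. The names `DEFAULT_REDUNDANCY`, `MAX_ENTRIES` and `field_status` and the numbers are hypothetical, not part of PyBossa or Moonsheep.

```python
from collections import Counter

# Hypothetical defaults; a task could override them with its own limits.
DEFAULT_REDUNDANCY = 3   # matching entries needed to call a field verified
MAX_ENTRIES = 10         # total entries after which we stop asking volunteers


def field_status(values, redundancy=DEFAULT_REDUNDANCY, max_entries=MAX_ENTRIES):
    """Classify one field given all values submitted for it so far."""
    # Count of the most frequently submitted value (0 if nothing submitted yet).
    top_count = Counter(values).most_common(1)[0][1] if values else 0
    if top_count >= redundancy:
        return "verified"
    if len(values) >= max_entries:
        # Too many conflicting entries: give up and flag for a moderator.
        return "needs_moderation"
    return "needs_more_entries"
```

A per-task limit would then just be a different `redundancy` argument for that task's fields, e.g. `field_status(values, redundancy=5)`.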
Validation is made every time data is persisted.
I'd go for that option, but queue such requests in a cron/ongoing job not to kill the database.
Do we define a validation interface that needs to be implemented in the back-end for every model & task importer plugin?
I'd do it like this. See https://github.com/transparencee/moonsheep/issues/73#issuecomment-308158913: Have a pluggable interface for verification rules
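As a rough illustration of what a pluggable interface could look like, assuming each model field is mapped to one rule object: `VerificationRule`, `EqualAtLeast` and `verify_task` are hypothetical names for this sketch, not existing PyBossa or Moonsheep APIs.

```python
from abc import ABC, abstractmethod
from collections import Counter


class VerificationRule(ABC):
    """Hypothetical plug-in interface: one rule decides if a field is verified."""

    @abstractmethod
    def verify(self, values):
        """Return (is_verified, agreed_value) for the values submitted for a field."""


class EqualAtLeast(VerificationRule):
    """Verified when some value (after normalization) was entered at least n times."""

    def __init__(self, n, normalize=lambda v: v):
        self.n = n
        self.normalize = normalize

    def verify(self, values):
        counts = Counter(self.normalize(v) for v in values)
        if not counts:
            return False, None
        value, count = counts.most_common(1)[0]
        return (True, value) if count >= self.n else (False, None)


def verify_task(rules, entries):
    """Apply each field's rule to that field's values across all entries."""
    return {
        field: rule.verify([e[field] for e in entries if field in e])
        for field, rule in rules.items()
    }
```

A model or task importer plugin would then only declare its rules, for example `{"name": EqualAtLeast(2, str.lower), "amount": EqualAtLeast(3)}`, and pass the submitted entries to `verify_task`.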
The algorithm for completing a task run would be:
to_be_verified
queueVerification loop handling to_be_verified
queue:
The con with this approach [cron] is that task processing completion won't only depend on how many people are submitting tasks but also on how often the cronjob runs.
You can make cron run quite often [5 min? 1 min?], in one thread, and not quit until it has processed the whole queue, so it works as a sort of continuous job. The question is: do we want to parallelize validation? I guess not in an MVP.
You and I need to experiment with PyBossa's cron architecture. I had a really hard time with it when I tried to extend it. Plus it didn't seem to like short-period crons. I eventually abandoned that quest, so having to revisit it concerns me. The off-the-shelf crons have always been problematic for me, requiring restarts from time to time.
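A minimal sketch of that single-threaded, run-until-empty job, assuming the queue holds task ids and `verify` is whatever validation callable the back-end provides. A `collections.deque` stands in for the real queue storage here, and `drain_queue` is a hypothetical name.

```python
from collections import deque


def drain_queue(queue, verify):
    """Process every queued task id in one thread; return the ids that verified."""
    verified = []
    while queue:                  # do not quit until the whole queue is processed
        task_id = queue.popleft()
        if verify(task_id):
            verified.append(task_id)
    return verified
```

Each cron tick would call `drain_queue` once; anything submitted while it runs is picked up either in the same loop or on the next tick.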
CKAN (Python-based) uses Celery tasks with Redis storage, with continuous work ensured by supervisor. Works like a charm in a Polish instance. See this wiki for an installation plus configuration templates.
I can revisit PyBossa crons, just create a task and assign it to me.
Will do. In the meantime you can snoop around those sched.py and jobs.py files I linked in the issue description. PyBossa also uses Redis for caching.
To note: I've heard people recommending RabbitMQ as a message broker and PythonRQ as an asynchronous task leader instead of Celery.
PyBossa's built-in validation is purely based on redundancy; we need to implement support for custom validations. For instance, the ability to indicate that some fields need to have equal values entered at least n times.
TODO:
TECHNICAL NOTE: I only see two ways of implementing this: