thequbit / mayan-document-listener

A bridge between the mayan-edms and BarkingOwl
GNU General Public License v3.0

setup mongodb for document cache #8

Open thequbit opened 10 years ago

thequbit commented 10 years ago

Example payload:

payload = {
    'command': 'found_doc',
    'source_id': self.uid,
    'destination_id': 'broadcast',
    'message':  {
        'doc_url': 'http://timduffy.me/document.pdf',
        'link_text': 'some document',
        'url_data': {
            'target_url': "http://timduffy.me/",
            'title': "TimDuffy.Me",
            'description': "Tim Duffy's Personal Website",
            'max_link_level': 3,
            'creation_datetime': '2014-07-17 21:34:18',
            'doc_type': 'application/pdf',
            'frequency': 2,
            'allowed_domains': [],
        },
        'scrape_datetime': '2014-07-17 21:34:17',
    }
}
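The payload is a plain nested dict, so a small builder function can produce it. The sketch below makes the layout explicit. The function name and its arguments are hypothetical and do not come from the project. `source_id` stands in for `self.uid`, and timestamps use the same 'YYYY-MM-DD HH:MM:SS' format as the example.

```python
import datetime


def make_found_doc_payload(source_id, doc_url, link_text, url_data,
                           scrape_datetime=None):
    """Build a 'found_doc' payload with the same layout as the example.

    Hypothetical helper: argument names are illustrative; source_id
    plays the role of self.uid in the example payload.
    """
    if scrape_datetime is None:
        scrape_datetime = datetime.datetime.now()
    return {
        'command': 'found_doc',
        'source_id': source_id,
        'destination_id': 'broadcast',
        'message': {
            'doc_url': doc_url,
            'link_text': link_text,
            'url_data': url_data,
            # Same 'YYYY-MM-DD HH:MM:SS' format as the example timestamps.
            'scrape_datetime': scrape_datetime.strftime('%Y-%m-%d %H:%M:%S'),
        },
    }
```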

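MongoDB is a good fit because the data is schemaless: the keys inside url_data differ from one message to the next (the real broadcast below carries status, runs and scraper_id, which the example above does not).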
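The stored record in the later comment appears to be the message fields flattened to the top level. It also gains source_id, an uploaded flag and an insert_datetime. The sketch below reproduces that shape under this assumption. `to_cache_record` is a hypothetical name, not the code in db_api.py.

```python
import copy
import datetime


def to_cache_record(payload):
    """Flatten a 'found_doc' payload into a document-cache record.

    Hypothetical helper: the layout mirrors the stored record shown
    later in this thread, not the actual implementation in db_api.py.
    """
    if payload.get('command') != 'found_doc':
        raise ValueError('expected a found_doc payload')
    # Deep-copy the message so edits to the record don't leak into the payload.
    record = copy.deepcopy(payload['message'])
    record['source_id'] = payload['source_id']
    # New documents start out as not yet uploaded to Mayan EDMS.
    record['uploaded'] = False
    # str() of a datetime gives 'YYYY-MM-DD HH:MM:SS.ffffff' (microseconds
    # are omitted only when they happen to be zero), like insert_datetime.
    record['insert_datetime'] = str(datetime.datetime.now())
    return record
```

Because the collection has no fixed schema, records whose url_data has extra keys can be stored side by side with no migration. With pymongo this would be `collection.insert_one(to_cache_record(payload))`.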

thequbit commented 10 years ago

This has been implemented as a very light wrapper in db_api.py.
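The contents of db_api.py are not shown here, so the following is only a guess at what such a light wrapper could look like. It targets the pymongo 3+ Collection methods insert_one, find_one and update_one. Only get_one_not_uploaded_document is named in this thread. DocumentCache, add_document and mark_uploaded are hypothetical. Skipping a doc_url that is already cached is also an assumption; it is motivated by the scraper re-running at its configured frequency. InMemoryCollection is a throwaway stand-in so the sketch runs without a MongoDB server.

```python
class InMemoryCollection:
    """Stand-in for a pymongo Collection, for trying this without a server.

    Supports only exact-match filters and '$set' updates.
    """

    def __init__(self):
        self._docs = []
        self._next_id = 0

    def _matches(self, doc, query):
        return all(doc.get(key) == value for key, value in query.items())

    def insert_one(self, doc):
        # pymongo returns an InsertOneResult; this stub returns nothing.
        doc = dict(doc)
        doc.setdefault('_id', self._next_id)
        self._next_id += 1
        self._docs.append(doc)

    def find_one(self, query):
        for doc in self._docs:
            if self._matches(doc, query):
                return dict(doc)
        return None

    def update_one(self, query, update):
        for doc in self._docs:
            if self._matches(doc, query):
                doc.update(update['$set'])
                return


class DocumentCache:
    """Hypothetical light wrapper around a document-cache collection."""

    def __init__(self, collection):
        self.collection = collection

    def add_document(self, record):
        # Assumption: a doc_url that is already cached is not stored twice.
        if self.collection.find_one({'doc_url': record['doc_url']}) is None:
            self.collection.insert_one(dict(record, uploaded=False))

    def get_one_not_uploaded_document(self):
        # Returns None when every cached document has been uploaded.
        return self.collection.find_one({'uploaded': False})

    def mark_uploaded(self, record_id):
        self.collection.update_one({'_id': record_id},
                                   {'$set': {'uploaded': True}})
```

Against a real server, you would pass in a pymongo collection in place of the stand-in, e.g. `DocumentCache(MongoClient()['mayan_listener']['documents'])`. The database and collection names there are made up.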

Example of an actual document broadcast:

{
    u'source_id': u'b9f5e0a8-2390-43ae-8ef1-77651c6b3d7c',
    u'message': {
        u'doc_url': u'http://timduffy.me/Resume-TimDuffy-20130813.pdf',
        u'link_text': u'Resume',
        u'url_data': {
            u'status': u'running',
            u'doc_type': u'application/pdf',
            u'start_datetime': u'2014-07-18 15:34:25',
            u'target_url': u'http://timduffy.me/',
            u'max_link_level': 3,
            u'description': u"Tim Duffy's Personal Website",
            u'title': u'TimDuffy.Me',
            u'runs': [],
            u'scraper_id': u'b9f5e0a8-2390-43ae-8ef1-77651c6b3d7c',
            u'frequency': 2,
            u'finish_datetime': u'',
            u'creation_datetime': u'2014-07-18 15:34:25',
            u'allowed_domains': []
        },
        u'scrape_datetime': u'2014-07-18 15:34:25'
    },
    u'command': u'found_doc',
    u'destination_id': u'broadcast'
}
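The u'' prefixes suggest this broadcast was decoded from JSON under Python 2, where json.loads returns unicode strings. Below is a minimal listener-side sketch, assuming the raw message arrives as a JSON string; the actual transport is not described in this thread. It routes only found_doc messages that are addressed to everyone or to this listener. `handle_broadcast`, `on_found_doc` and `my_id` are hypothetical names.

```python
import json


def handle_broadcast(raw, on_found_doc, my_id=None):
    """Decode one JSON-encoded message and route it if it is a found_doc.

    Hypothetical sketch: on_found_doc is any callable, e.g. one that
    stores the document in the MongoDB cache. Returns True if handled.
    """
    payload = json.loads(raw)
    if payload.get('command') != 'found_doc':
        return False
    # Accept messages sent to everyone, or addressed to this listener.
    if payload.get('destination_id') not in ('broadcast', my_id):
        return False
    on_found_doc(payload)
    return True
```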
thequbit commented 10 years ago

Response from db_api.get_one_not_uploaded_document():

{
    u'uploaded': False,
    u'doc_url': u'http://timduffy.me/Resume-TimDuffy-20130813.pdf',
    u'url_data': {
        u'status': u'running',
        u'doc_type': u'application/pdf',
        u'start_datetime': u'2014-07-18 15:32:34',
        u'target_url': u'http://timduffy.me/',
        u'max_link_level': 3,
        u'description': u"Tim Duffy's Personal Website",
        u'title': u'TimDuffy.Me',
        u'runs': [],
        u'scraper_id': u'692127e0-9d3a-4f99-ae39-f206e2a32f75',
        u'frequency': 2,
        u'finish_datetime': u'',
        u'creation_datetime': u'2014-07-18 15:32:34',
        u'allowed_domains': []
    },
    u'link_text': u'Resume',
    u'scrape_datetime': u'2014-07-18 15:32:35',
    u'insert_datetime': u'2014-07-18 15:32:35.075349',
    u'source_id': u'692127e0-9d3a-4f99-ae39-f206e2a32f75',
    u'_id': ObjectId('53c97653a70f9e356ba0df44')
}
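A record like this is what the bridge would hand to Mayan EDMS. Below is a sketch of a drain loop built from callables. get_one_not_uploaded_document mirrors the db_api call above. `upload` and `mark_uploaded` are hypothetical placeholders, and the thread does not say how documents actually reach Mayan.

```python
def drain_uploads(get_one_not_uploaded_document, upload, mark_uploaded,
                  limit=100):
    """Upload cached documents until none are left or `limit` is reached.

    Hypothetical sketch: upload(doc_url, link_text) and
    mark_uploaded(record_id) are placeholders. Returns the number of
    documents processed.
    """
    count = 0
    while count < limit:
        doc = get_one_not_uploaded_document()
        if doc is None:
            break
        upload(doc['doc_url'], doc.get('link_text', ''))
        # Flag by _id so the same record is not returned on the next pass.
        mark_uploaded(doc['_id'])
        count += 1
    return count
```

The limit keeps the loop from spinning forever if marking a record fails silently. If upload raises, the record stays flagged as not uploaded and will be picked up again on the next run.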