opensemanticsearch / open-semantic-etl

Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition) and data enrichment (annotation) pipelines, plus an ingestor to a Solr or Elasticsearch index and a linked data graph database
https://opensemanticsearch.org/etl
GNU General Public License v3.0
255 stars, 69 forks

failed tasks while.. on my python script. #106

Closed: hpiedcoq closed this issue 4 years ago

hpiedcoq commented 4 years ago

Hello,

Digging into writing my own scripts, I want to extract some IOCs from my documents using the iocextract library. The library is installed for both Python 2.7 and Python 3.6 on my machine; the default Python is 3.6.

My plugin is enhance_extract_sha256.py, located in /usr/lib/python3/dist-packages/opensemanticetl/:

import etl
import iocextract

class enhance_extract_sha256(object):
    def process(self, parameters=None, data=None):
        if parameters is None:
            parameters = {}
        if data is None:
            data = {}

        # todo: use all data fields for analysis
        text = ''
    if 'content_txt' in data:
            text = data['content_txt']

        for sha256_ex in iocextract.extract_256_hashes(text):
            etl.append(data, 'sha256_ss', sha256_ex)

    return parameters, data

I enabled it in the etl config file:

# Enable IOCs plugin
config['plugins'].append('enhance_extract_sha256')
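Registering a plugin by name only works if the ETL can turn that name into a class it can call. The sketch below is a simplified, hypothetical model of such a loop, assuming (as the naming in this issue suggests) that each entry in `config['plugins']` names both a module file and a class of the same name inside it, and that `process()` returns `(parameters, data)`. It is not open-semantic-etl's actual code; `run_plugins` and the in-memory `enhance_demo` module are illustrative names.

```python
import importlib
import sys
import types


def run_plugins(plugin_names, parameters=None, data=None):
    """Hypothetical, simplified plugin loop: import a module named after each
    plugin, instantiate the class of the same name, and call process()."""
    parameters = {} if parameters is None else parameters
    data = {} if data is None else data
    for name in plugin_names:
        module = importlib.import_module(name)  # e.g. enhance_extract_sha256.py
        plugin_class = getattr(module, name)    # class must share the module's name
        parameters, data = plugin_class().process(parameters=parameters, data=data)
    return parameters, data


# Demo: register a stand-in module in memory instead of a file on disk.
demo_module = types.ModuleType("enhance_demo")


class enhance_demo(object):
    def process(self, parameters=None, data=None):
        # Record that the plugin ran, in a list-valued field.
        data.setdefault("demo_ss", []).append("ran")
        return parameters, data


demo_module.enhance_demo = enhance_demo
sys.modules["enhance_demo"] = demo_module  # import_module() finds it here
```

For example, `run_plugins(["enhance_demo"], data={"content_txt": "x"})` returns data with `demo_ss` set to `["ran"]`. Under this model, a plugin file that fails to import (for instance because of an `IndentationError`) or a class whose name differs from its file name would keep the plugin from running.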

But it doesn't work (see sample page).

Any idea what I'm missing?

Thank you. H.
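A plugin file that Python cannot compile is never loaded, so it can help to rule that out before looking at the configuration. The sketch below compiles source text without executing it and reports a `TabError` (tabs and spaces mixed inconsistently) or any other `IndentationError`. It uses only the built-in `compile()`; `check_indentation` is a hypothetical helper name.

```python
from typing import Optional


def check_indentation(source: str, filename: str = "<plugin>") -> Optional[str]:
    """Compile `source` without running it; return a description of the
    indentation problem, or None if the code compiles cleanly."""
    try:
        compile(source, filename, "exec")
    except TabError as exc:
        # TabError subclasses IndentationError, so it must be caught first.
        return f"TabError: {exc.msg} (line {exc.lineno})"
    except IndentationError as exc:
        # e.g. "unindent does not match any outer indentation level"
        return f"IndentationError: {exc.msg} (line {exc.lineno})"
    return None
```

To check the plugin itself, read the file and pass its contents, for example `check_indentation(open("enhance_extract_sha256.py", encoding="utf-8").read(), "enhance_extract_sha256.py")`. Only indentation errors are reported; other syntax errors still raise `SyntaxError`.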

hpiedcoq commented 4 years ago

It was an indentation problem in my code (spaces vs. tabs).
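For reference, here is a sketch of the plugin from the question with consistent 4-space indentation throughout. Two assumptions are worth checking against your installed versions: as far as I know, iocextract names the function `extract_sha256_hashes` (not `extract_256_hashes`), and the `try`/`except ImportError` fallbacks (a stand-in `append` and a plain 64-hex-digit regex) are hypothetical additions so the sketch also runs outside open-semantic-etl.

```python
import re

try:
    import iocextract  # third-party: pip install iocextract
except ImportError:
    iocextract = None  # the regex fallback below keeps the sketch runnable

try:
    from etl import append  # open-semantic-etl helper, available inside the ETL
except ImportError:
    def append(data, field, value):
        """Hypothetical stand-in for etl.append(): add value to a list field."""
        data.setdefault(field, []).append(value)

# Fallback pattern (an assumption, not iocextract's own regex): 64 hex digits.
SHA256_PATTERN = re.compile(r"\b[A-Fa-f0-9]{64}\b")


class enhance_extract_sha256(object):
    # Every line of the method body is indented with spaces only, no tabs.
    def process(self, parameters=None, data=None):
        if parameters is None:
            parameters = {}
        if data is None:
            data = {}

        # todo: use all data fields for analysis
        text = ''
        if 'content_txt' in data:
            text = data['content_txt']

        if iocextract is not None:
            hashes = iocextract.extract_sha256_hashes(text)
        else:
            hashes = SHA256_PATTERN.findall(text)

        for sha256_ex in hashes:
            append(data, 'sha256_ss', sha256_ex)

        return parameters, data
```

Running `python3 -m tabnanny enhance_extract_sha256.py` (the standard-library `tabnanny` module) is another quick way to flag ambiguous tab/space indentation before enabling the plugin.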