opensemanticsearch / open-semantic-etl

Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
https://opensemanticsearch.org/etl
GNU General Public License v3.0

process() got multiple values for argument 'parameters' #81

Open ronniebrito opened 5 years ago

ronniebrito commented 5 years ago

Hi,

I'm developing a data enhancer plugin, as described at

but the following error is thrown while indexing files:

Exception while data enrichment of arquivos_indexados/a.xml with plugin teste: process() got multiple values for argument 'parameters'

What is going wrong here? Is there another way? The code is below.

Regards

import re

class teste(object):
 def process(parameters={}, data={} ):
  # regular expression matching URLs
  regex = r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'

  # facet / column where to store/index it
  facet = "URL_ss"

  # find all URLs with a regular expression
  matches = re.findall(regex, parameters['text'])

  if matches:
   # add the list matches to the facet
   opensemanticsearch_connector.append(data, facet, matches)

  return parameters, data

ronniebrito commented 5 years ago

I noticed this is a Python-related issue rather than a problem with the ETL itself: adding self to the process method signature solved the problem.

import re

class teste(object):
 def process(self, parameters={}, data={} ):
  # regular expression matching URLs
  regex = r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'

  # facet / column where to store/index it
  facet = "URL_ss"

  # find all URLs with a regular expression
  matches = re.findall(regex, parameters['text'])

  if matches:
   # add the list matches to the facet
   opensemanticsearch_connector.append(data, facet, matches)

  return parameters, data
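
For anyone hitting the same message, here is a minimal sketch of why the missing self triggers it. It assumes the ETL calls process() on a plugin instance and passes parameters and data as keyword arguments, which is consistent with the error text but not confirmed here. The names WithoutSelf, WithSelf and run_plugin are hypothetical stand-ins, not part of open-semantic-etl.

```python
class WithoutSelf(object):
    # Buggy signature: Python passes the instance as the first positional
    # argument, so it is bound to `parameters`.
    def process(parameters={}, data={}):
        return parameters, data


class WithSelf(object):
    # Fixed signature: the instance is bound to `self` instead.
    def process(self, parameters={}, data={}):
        return parameters, data


def run_plugin(plugin, parameters, data):
    # Hypothetical stand-in for the caller. ASSUMPTION: the real ETL
    # passes both arguments by keyword, like this.
    try:
        return plugin.process(parameters=parameters, data=data)
    except TypeError as error:
        # Return the message so the failure can be inspected.
        return str(error)


# The instance already fills `parameters` positionally, so the keyword
# argument `parameters=` is a second value for the same name.
broken_result = run_plugin(WithoutSelf(), {'text': 'x'}, {})

# With `self` in the signature the keyword arguments bind as intended.
fixed_result = run_plugin(WithSelf(), {'text': 'see https://example.org'}, {})

print(broken_result)
print(fixed_result)
```

The first call fails with a TypeError whose message ends in "got multiple values for argument 'parameters'", while the second returns the parameters and data unchanged.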