singer-io / tap-mongodb

KeyError: '_id' CRITICAL ('Exception writing records', KeyError('_id')) #48

Open waleedarshad-vf opened 4 years ago

waleedarshad-vf commented 4 years ago

Mongo to Redshift. I got this error; what could be the possible reasons?

ERROR Exception writing records
Traceback (most recent call last):
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 300, in write_batch
    {'version': target_table_version})
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/sql_base.py", line 808, in write_batch_helper
    for table_batch in denest.to_table_batches(schema, key_properties, records):
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/denest.py", line 21, in to_table_batches
    key_properties)
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/denest.py", line 47, in _get_streamed_table_schemas
    key_prop_schemas[key] = schema['properties'][key]
KeyError: '_id'
CRITICAL ('Exception writing records', KeyError('_id'))
Traceback (most recent call last):
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 300, in write_batch
    {'version': target_table_version})
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/sql_base.py", line 808, in write_batch_helper
    for table_batch in denest.to_table_batches(schema, key_properties, records):
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/denest.py", line 21, in to_table_batches
    key_properties)
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/denest.py", line 47, in _get_streamed_table_schemas
    key_prop_schemas[key] = schema['properties'][key]
KeyError: '_id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/macbookpro/.virtualenvs/target-redshift/bin/target-redshift", line 10, in <module>
    sys.exit(cli())
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_redshift/__init__.py", line 55, in cli
    main(args.config)
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_redshift/__init__.py", line 49, in main
    target_tools.main(redshift_target)
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 28, in main
    stream_to_target(input_stream, target, config=config)
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 77, in stream_to_target
    raise e
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 64, in stream_to_target
    line
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 152, in _line_handler
    state_tracker.flush_stream(line_data['stream'])
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/stream_tracker.py", line 41, in flush_stream
    self._write_batch_and_update_watermarks(stream)
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/stream_tracker.py", line 67, in _write_batch_and_update_watermarks
    self.target.write_batch(stream_buffer)
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_redshift/redshift.py", line 72, in write_batch
    return PostgresTarget.write_batch(self, nullable_stream_buffer)
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 309, in write_batch
    raise PostgresError(message, ex)
target_postgres.exceptions.PostgresError: ('Exception writing records', KeyError('_id'))
waleedarshad-vf commented 4 years ago

@dmosorast again your help is needed

waleedarshad-vf commented 4 years ago

@all I have also added a detailed reference here: https://stackoverflow.com/questions/62793871/keyerror-id-critical-exception-writing-records-keyerror-id-singer-i

bonysimon007 commented 3 years ago

@waleedarshad-vf Did you find the solution?

waleedarshad-vf commented 3 years ago

@bonysimon007 No, we moved to Stitch Data, the enterprise version, and that worked.

JackMcKew commented 3 years ago

From the looks of this article, my best guess is that it's stemming from the schema not being specified in catalog.json:

https://titanwolf.org/Network/Articles/Article?AID=b4eb664e-70f4-430c-b568-bdd0d32bcf35#gsc.tab=0
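
If that's the cause, the fix is to make sure each stream's schema actually declares its key properties (here _id), because target-postgres resolves them with schema['properties'][key], which is exactly the line that raises in the traceback above. Below is a minimal sketch (my own helper, not part of the tap) that flags streams in a discovered catalog.json whose declared key properties are missing from the schema; it assumes the standard Singer table-key-properties metadata entry:

import json

def find_streams_missing_keys(catalog_path='catalog.json'):
    """Return (stream_id, key) pairs whose key property is absent from the schema."""
    with open(catalog_path) as f:
        catalog = json.load(f)

    problems = []
    for stream in catalog.get('streams', []):
        schema_props = stream.get('schema', {}).get('properties', {})
        for entry in stream.get('metadata', []):
            # Stream-level metadata carries 'table-key-properties', e.g. ['_id'] for tap-mongodb
            for key in entry.get('metadata', {}).get('table-key-properties', []):
                if key not in schema_props:
                    problems.append((stream.get('tap_stream_id'), key))
    return problems

if __name__ == '__main__':
    for stream_id, key in find_streams_missing_keys():
        print(f"{stream_id}: key property '{key}' is not in schema['properties']")

Any stream this prints would hit the same KeyError: '_id' when the target tries to build its table schema.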

JackMcKew commented 3 years ago

I managed to get around this by using https://github.com/transferwise/pipelinewise-tap-mongodb and https://github.com/transferwise/pipelinewise-target-postgres

I wrote some Python code to edit the catalog produced by tap-mongodb --config config/singer/tap-mongodb-config.json --discover > catalog.json:

import json
import pathlib

input_file = 'catalog.json'
output_file = 'mongo/catalog.json'

with open(input_file) as f:
    data = json.load(f)

for stream in data['streams']:
    # Select every stream and force full-table replication
    stream_metadata = stream['metadata'][0]
    stream_metadata['metadata']['selected'] = True
    stream_metadata['metadata']['replication-method'] = 'FULL_TABLE'

    # https://github.com/singer-io/getting-started/blob/master/docs/DISCOVERY_MODE.md
    # https://github.com/singer-io/tap-mongodb/issues/48
    # Declare the _id key property plus a catch-all 'document' field so the
    # target can resolve its key properties in schema['properties']
    stream['schema']['properties'] = {
        "_id": {
            "type": [
                "string",
                "null"
            ]
        },
        "document": {
            "type": [
                "object",
                "array",
                "string",
                "null"
            ]
        }
    }

pathlib.Path(output_file).parent.mkdir(parents=True, exist_ok=True)
with open(output_file, 'w') as json_file:
    json.dump(data, json_file)

This will sync the entire document into a single JSONB column in PostgreSQL.

If you are then looking to expand the JSONB column into a full table in PostgreSQL, this can be done with SQL statements. Note that if the MongoDB documents use camelCase keys, this causes odd behaviour against PostgreSQL's snake_case conventions; I was able to define the key-conversion function given here: https://rudism.com/converting-json-keys-to-snake-case-in-postgresql/ (a Python-side alternative is sketched at the end of this comment).

CREATE TEMPORARY TABLE snake_case (document jsonb);

INSERT INTO snake_case SELECT json_keys_to_snake_case FROM <SCHEMA_NAME>.<TABLE_NAME> m, json_keys_to_snake_case(m.document);

INSERT INTO <SCHEMA_NAME>.<TABLE_NAME> (
    _id,
    document_key1,
    document_key2
)
SELECT 
    t."_id",
    t."document_key1",
    t."document_key2"
FROM
    snake_case m
    CROSS JOIN LATERAL
    jsonb_to_record(m.document) as t(
        _id TEXT,
        document_key1 <DATA_TYPE>,
        document_key2 <DATA_TYPE>
        );

DROP TABLE IF EXISTS snake_case;

Where <SCHEMA_NAME> is the schema that target-postgres sent the data to, <TABLE_NAME> is the table the data is in, and document_key1 and document_key2 are example keys from the MongoDB collection.
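
As an aside, the camelCase to snake_case key conversion can also be done on the Python side before the data ever reaches PostgreSQL, instead of via the SQL function from that article. A rough sketch (the helper names and regexes are mine, not from the linked post):

import re

def camel_to_snake(name):
    # 'documentKey1' -> 'document_key1', 'userID' -> 'user_id'
    partial = re.sub(r'(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', partial).lower()

def snake_case_keys(value):
    # Recursively rename keys in nested dicts/lists of a MongoDB document
    if isinstance(value, dict):
        return {camel_to_snake(k): snake_case_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [snake_case_keys(v) for v in value]
    return value

print(snake_case_keys({'documentKey1': 1, 'nestedDoc': {'innerKey': [2, 3]}}))
# {'document_key1': 1, 'nested_doc': {'inner_key': [2, 3]}}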