tomasvotava / tap-airtable

Singer SDK tap for Airtable
MIT License
2 stars 3 forks

Type error on field #21

Open lukevers opened 3 months ago

lukevers commented 3 months ago

Hey,

I was testing this out and ran into an issue. I have a field in Airtable that is a formula and returns a number. It looks like validation fails because the value is not a string (and was not cast to one):

[Screenshot 2024-07-29 at 2:29:43 PM]

Here are my logs:

root@docker-desktop:/project# meltano run tap-airtable target-jsonl
2024-07-29T18:25:46.303044Z [info     ] Environment 'dev' is active   
2024-07-29T18:25:47.103235Z [warning  ] No state was found, complete import.
2024-07-29T18:25:49.473064Z [info     ] 2024-07-29 18:25:49,472 | INFO     | tap-airtable.active_journeys | Beginning full_table sync of 'active_journeys'... cmd_type=elb consumer=False name=tap-airtable producer=True stdio=stderr string_id=tap-airtable
2024-07-29T18:25:49.473868Z [info     ] 2024-07-29 18:25:49,472 | INFO     | tap-airtable.active_journeys | Tap has custom mapper. Using 1 provided map(s). cmd_type=elb consumer=False name=tap-airtable producer=True stdio=stderr string_id=tap-airtable
2024-07-29T18:25:49.915153Z [info     ] Traceback (most recent call last): cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2024-07-29T18:25:49.915836Z [info     ]   File "/project/.meltano/loaders/target-jsonl/venv/bin/target-jsonl", line 8, in <module> cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2024-07-29T18:25:49.916610Z [info     ]     sys.exit(main())           cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2024-07-29T18:25:49.917122Z [info     ]   File "/project/.meltano/loaders/target-jsonl/venv/lib/python3.10/site-packages/target_jsonl.py", line 92, in main cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2024-07-29T18:25:49.917589Z [info     ]     state = persist_messages(  cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2024-07-29T18:25:49.917983Z [info     ]   File "/project/.meltano/loaders/target-jsonl/venv/lib/python3.10/site-packages/target_jsonl.py", line 54, in persist_messages cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2024-07-29T18:25:49.919601Z [info     ]     validators[o['stream']].validate((o['record'])) cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2024-07-29T18:25:49.920150Z [info     ]   File "/project/.meltano/loaders/target-jsonl/venv/lib/python3.10/site-packages/jsonschema/validators.py", line 130, in validate cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2024-07-29T18:25:49.920646Z [info     ]     raise error                cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2024-07-29T18:25:49.921338Z [info     ] jsonschema.exceptions.ValidationError: 6 is not of type 'string', 'null' cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2024-07-29T18:25:49.921995Z [info     ]                                cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2024-07-29T18:25:49.922677Z [info     ] Failed validating 'type' in schema['properties']['days_idle']: cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2024-07-29T18:25:49.923281Z [info     ]     {'type': ['string', 'null']} cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2024-07-29T18:25:49.923736Z [info     ]                                cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2024-07-29T18:25:49.924801Z [info     ] On instance['days_idle']:      cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2024-07-29T18:25:49.925646Z [info     ]     6                          cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2024-07-29T18:25:49.949321Z [error    ] Loader failed                 
2024-07-29T18:25:49.949836Z [error    ] Block run completed.           block_type=ExtractLoadBlocks err=RunnerError('Loader failed') exit_codes={<PluginType.LOADERS: 'loaders'>: 1} set_number=0 success=False
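
For anyone trying to reproduce this outside Meltano, here is a minimal sketch of the check that fails in the traceback above. It is a hand-written stand-in for the JSON Schema "type" keyword, not the jsonschema library itself; `JSON_TYPE_CHECKS` and `matches_type` are names made up for illustration.

```python
# Stand-in for the JSON Schema "type" keyword only; this is not the jsonschema library.
JSON_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    # bool is a subclass of int in Python, so it is excluded from "number" explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def matches_type(value, allowed_types):
    """Return True if value satisfies at least one of the listed JSON types."""
    return any(JSON_TYPE_CHECKS[name](value) for name in allowed_types)


declared_types = ["string", "null"]  # schema for days_idle, as shown in the log
record = {"days_idle": 6}            # value the formula field produced

print(matches_type(record["days_idle"], declared_types))      # False: 6 is not a string or null
print(matches_type(record["days_idle"], ["number", "null"]))  # True once numbers are allowed
```

The stream schema declares `days_idle` as a nullable string, while the record carries the integer 6, so the target rejects the record before writing it.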

lukevers commented 3 months ago

So I was able to get through that specific error by making some changes to the types:

AirtableOneOfType = th.OneOf(
    th.StringType,
    th.NumberType,
    th.BooleanType,
    th.DateTimeType,
    th.DateType,
)

AirtableAnyType = th.OneOf(
    AirtableOneOfType,
    th.ArrayType(AirtableOneOfType),
)

Then in AIRTABLE_TO_SINGER_MAPPING I updated:

    "formula": AirtableAnyType,

I then had the same error on "lookup" so did that too:

    "lookup": AirtableAnyType,

Still having some problems, but getting somewhere.

If you take a look at the TypeScript types for fields, it's pretty chaotic. In Airtable, a lot of these fields can hold things other than strings (in my case, formulas and lookups were actually numbers):

export interface FieldSet {
    [key: string]: undefined | string | number | boolean | Collaborator | ReadonlyArray<Collaborator> | ReadonlyArray<string> | ReadonlyArray<Attachment>;
}

I think the solution here is either:

  1. Convert field types to the same or a similar structure as the TypeScript SDK types (there might be a Python one; I haven't looked)
  2. Cast these values to strings and keep them as strings

Or something else haha.
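
As a rough sketch of option 2, assuming the cast happens on each record before it is emitted: `stringify_value` and `stringify_fields` are hypothetical helpers, not part of tap-airtable or the Singer SDK, and JSON-encoding non-string values is just one possible convention.

```python
import json


def stringify_value(value):
    """Hypothetical helper: keep None and str as-is, JSON-encode everything else."""
    if value is None or isinstance(value, str):
        return value
    return json.dumps(value)


def stringify_fields(record, field_names):
    """Return a copy of record with the named fields cast to strings."""
    return {
        key: stringify_value(value) if key in field_names else value
        for key, value in record.items()
    }


# Made-up record: days_idle is a formula returning a number, tags is a list.
record = {"days_idle": 6, "owner": "someone", "tags": ["a", "b"]}
print(stringify_fields(record, {"days_idle", "tags"}))
```

This keeps the schema as `["string", "null"]`, at the cost of losing the numeric type downstream.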

lukevers commented 3 months ago

I ended up making a few more changes to get things working on my end.

  1. In the config schema I added a new field for specific base->table->column fields to exclude:

    th.Property(
        "exclude",
        th.ObjectType(
            additional_properties=th.ObjectType(
                additional_properties=th.ArrayType(th.StringType)
            )
        ),
        description="Exclude fields from specific tables in bases",
        required=False,
    )

In the config it looks like this (field names are the slugified versions of the column names):

config:
  exclude:
    base_id:
      table_id:
        - field_name1
        - field_name2
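
The code that consumes this setting is not shown in the thread, so the following is only a sketch of how such a mapping could be applied, assuming the filtering happens per record and that the config has already been loaded into a dict. `drop_excluded_fields` is a hypothetical name.

```python
def drop_excluded_fields(record, exclude, base_id, table_id):
    """Hypothetical helper: remove fields listed under exclude[base_id][table_id]."""
    excluded = set(exclude.get(base_id, {}).get(table_id, []))
    return {key: value for key, value in record.items() if key not in excluded}


# Same shape as the YAML above, after it has been loaded into a dict.
exclude = {"base_id": {"table_id": ["field_name1", "field_name2"]}}
record = {"field_name1": 1, "field_name2": "x", "field_name3": True}

print(drop_excluded_fields(record, exclude, "base_id", "table_id"))
# {'field_name3': True}
print(drop_excluded_fields(record, exclude, "base_id", "other_table") == record)
# True
```

A real implementation would probably also need to drop the same fields from the stream schema so that schema and records stay in sync.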

I then had to keep making changes in types.py. These are the updates I ended up making:

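from typing import Any

from singer_sdk import typing as th

# AirtableAttachment (used below) is assumed to be defined elsewhere in types.py.

AirtableCollaborator = th.ObjectType(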
    th.Property("id", th.StringType),
    th.Property("email", th.StringType),
    th.Property("name", th.StringType),
    th.Property("permissionLevel", th.StringType),
    th.Property("profilePicUrl", th.StringType),
)

AirtableButtonType = th.ObjectType(
    th.Property("label", th.StringType),
    th.Property("url", th.StringType),
)

AirtableOneOfType = th.OneOf(
    th.StringType,
    th.NumberType,
    th.BooleanType,
    th.DateTimeType,
    th.DateType,
    th.ArrayType(th.StringType),
    th.ArrayType(th.NumberType),
    th.ArrayType(th.BooleanType),
    th.ArrayType(th.DateTimeType),
    th.ArrayType(th.DateType),
)

AirtableAnyType = th.OneOf(
    AirtableOneOfType,
    th.ArrayType(AirtableOneOfType),
)

AIRTABLE_TO_SINGER_MAPPING: dict[str, Any] = {
    "singleLineText": th.StringType,
    "email": th.StringType,
    "url": th.StringType,
    "multilineText": th.StringType,
    "number": th.NumberType,
    "percent": th.OneOf(th.StringType, th.NumberType),
    "currency": th.OneOf(th.StringType, th.NumberType),
    "singleSelect": th.StringType,
    "multipleSelects": th.ArrayType(th.StringType),
    "singleCollaborator": AirtableCollaborator,
    "multipleCollaborators": th.ArrayType(AirtableCollaborator),
    "multipleRecordLinks": th.ArrayType(AirtableAnyType),
    "date": th.DateType,
    "dateTime": th.DateTimeType,
    "phoneNumber": th.StringType,
    "multipleAttachments": th.ArrayType(AirtableAttachment),
    "checkbox": th.BooleanType,
    "formula": AirtableAnyType,
    "createdTime": th.DateTimeType,
    "rollup": AirtableAnyType,
    "count": AirtableAnyType,
    "lookup": AirtableAnyType,
    "multipleLookupValues": th.ArrayType(AirtableOneOfType),
    "autoNumber": th.OneOf(th.StringType, th.NumberType),
    "barcode": th.StringType,
    "rating": th.StringType,
    "richText": th.StringType,
    "duration": th.StringType,
    "lastModifiedTime": th.DateTimeType,
    "button": AirtableButtonType,
    "createdBy": AirtableCollaborator,
    "lastModifiedBy": th.StringType,
    "externalSyncSource": th.StringType,
    "aiText": th.StringType,
}

I know this is quite a few changes, so I won't directly open a PR right now. Happy to open a PR if anyone else runs into these problems.

tomasvotava commented 2 months ago

Hey @lukevers, thanks a lot for posting this!

TBH I haven't tried anything with formulas yet, so it's only natural that it doesn't work out of the box, and I'm glad you've found this issue. Since, if I understand correctly, a formula can output almost anything, would it be sufficient to type formulas as the Any type in JSON schema?
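
To illustrate what "Any" would mean at the schema level: in JSON Schema, a property schema with no "type" keyword (for example `{}`) puts no type constraint on the value. The sketch below is a hand-written stand-in for that one keyword, not the jsonschema library or the Singer SDK; `TYPE_CHECKS` and `type_keyword_accepts` are made-up names.

```python
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    # bool is a subclass of int in Python, so it is excluded from "number" explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}


def type_keyword_accepts(property_schema, value):
    """Stand-in for the "type" keyword only; all other keywords are ignored."""
    allowed = property_schema.get("type")
    if allowed is None:  # no "type" keyword means no type constraint
        return True
    if isinstance(allowed, str):
        allowed = [allowed]
    return any(TYPE_CHECKS[name](value) for name in allowed)


string_only = {"type": ["string", "null"]}  # current schema for formula fields
any_type = {}                               # "Any": no type constraint
samples = [6, "6", True, None, ["a", "b"]]

print([type_keyword_accepts(string_only, s) for s in samples])  # [False, True, False, True, False]
print([type_keyword_accepts(any_type, s) for s in samples])     # [True, True, True, True, True]
```

One possible trade-off: targets that derive column types from the schema may not have an obvious type to pick for an unconstrained field.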