gtalarico / pyairtable

Python Api Client for Airtable
https://pyairtable.readthedocs.io
MIT License
765 stars 138 forks source link

batch_update type unclear #198

Closed NicoHood closed 1 year ago

NicoHood commented 1 year ago

Hi, I am talking about this code:

    def _batch_update(
        self,
        base_id: str,
        table_name: str,
        records: List[dict],
        replace=False,
        typecast=False,
        **options
    ):
        updated_records = []
        table_url = self.get_table_url(base_id, table_name)
        method = "put" if replace else "patch"
        params = self._options_to_params(**options)
        for records in self._chunk(records, self.MAX_RECORDS_PER_REQUEST):
            chunk_records = [{"id": x["id"], "fields": x["fields"]} for x in records]
            response = self._request(
                method,
                table_url,
                json_data={"records": chunk_records, "typecast": typecast},
                params=params
            )
            updated_records += response["records"]
            time.sleep(self.API_LIMIT)

        return updated_records

The type of records should be List[dict] but it is not clear what the dict should contain. Also I find this confusing, why is this syntax used? Wouldnt it be better to use a type of dict[str, dict] where str is the id and dict the fields? That would be the syntax of the normal update operation. That List feels very unnatural.

gtalarico commented 1 year ago

is not clear what the dict should contain

Should be a list of dict "records", representing records previously retrieved and modified. https://pyairtable.readthedocs.io/en/latest/api.html#pyairtable.api.Api.batch_update

image

If you think it's helpful we can add an example to docs. Could be something like this. This will batch update all records to set field processed to True

records = table.all()
for record in records:
    record["processed"] = True
table.batch_update(records)

Wouldnt it be better to use a type of dict[str, dict]

That would be incorrect since dict values vary by tables schema (could be string, int, etc) If we wanted a slightly more specific types we could do Dict[str, Any] but I am not sure would help much

NicoHood commented 1 year ago

I think you have quotes your own comment.

dict[str, dict] is correct because this rolls out as:

{ # dict
  "recordid1": {"field": value}, # str: dict (where dict is dict[str: Any])
  "recordid2": {"field": value}
}

Maybe the inner dict could be more specific as you said: dict[str, dict[str, Any]]

That would be better than having fixed dict key names

gtalarico commented 1 year ago

@NicoHood I think my example was slightly off because I forgot fields are nested but dict[str, dict] is not correct. It's a list of records, where record is a dict that contains key "id" and "fields", and "createdTime". See airtable docs.

image

I think records: List[dict] is ok. the actual type is List[Dict[str, Union[str, Dict]]. We could create a type alias and call this Record type but not sure how much that would help.

records = [
 { "id": "<str>", "fields": <dict[str, any]>", "createdAt": <str> },
  { "id": "<str>", "fields": <dict[str, any]>", "createdAt": <str> },
]
NicoHood commented 1 year ago

Yeah but I dont like this. This is very specific, why not use the id as dict key and the fields as value? I am talking about the function parameter, not the return value. You cannot input the created date anyways, so this feels more natural.

The single update function also does not take a dict, it takes and id and a dict of values. My suggestion follow the same syntax, just those two placed in a dict with key/value.

gtalarico commented 1 year ago

There are 3 reasons to keep it a list:

  1. Matches structure of "records" response when you fetch. This Allows you to fetch, mutate records, and then pass them back to the batch update without needing to restructure the format (as in this example https://github.com/gtalarico/pyairtable/issues/198#issuecomment-1287984672)

  2. It aligns well with airtable api interface. This means it's easier for people to jump between api docs an the library (as seen in screenshot above)

  3. It does break the current api to existing users of this library

NicoHood commented 1 year ago
  1. That is not really an argument to me. You get a wonky dict that you normally need to convert internally anyways. There is no better way yet, except the ORM
  2. We want to abstract the interface. If we want to copy it 1:1 then we can use the requests library
  3. That is true, we need to bump the major version. But it makes sense to me, I also had library where I did not make everything perfect in the first, but in the long run. We could even detect the input and convert it to be backward compatible, even though that might be a bit ugly. but it all works, as python is a scripting language.
gtalarico commented 1 year ago

I understand your points and appreciate your @NicoHood , but I am happy with the current api.

Record/records as a format with id/fields is used consistently as input and output across many endpoints and I would like to keep that consistent.

This seems like a minor preference between two formats and would lean towards api consistency and not introduce breaking api changes.

If this really bothers you you can create a sub class that implements the method signature of your choice or a helper function to convert between formats.

NicoHood commented 1 year ago

Maybe we could ask the other contributers for more feedback @bjlange @daneasterman @mesozoic @BenjaminLucier @robbieaverill @larsakerson

robbieaverill commented 1 year ago

I'm not sure why I was tagged, but I agree with the maintainer. If you want a different data structure, you can extend to implement it, or fork the package.

NicoHood commented 1 year ago

@mesozoic What is your opinion on this one?

mesozoic commented 1 year ago

I think I understand the point you're making, which is that a dict keyed on record IDs makes it slightly easier to keep track of the changes you're making. This happens in our team's own code from time to time, where we'll have something like:

changes: Dict[RecordId, Dict[FieldName, FieldValue]] = {}
# ...do things to populate the dict above...
# ...then:
payload = [
    {"id": record_id, "fields": fields}
    for (record_id, fields) in changes.items()
]
table.batch_update(records=payload)

However, I don't see any value to making a breaking change here. The function signature is consistent with the data structures that the Airtable API expects, and while your proposal might match your own internal data structures slightly better (and mine!), it won't add any new functionality (while breaking all existing code).

I'm going to close this suggestion, but we'd still welcome other ideas you have for how to improve the library.