zytedata / python-zyte-api

Python client for Zyte API
BSD 3-Clause "New" or "Revised" License
21 stars 5 forks source link

Add a client-side alternative to echoData #15

Open Gallaecio opened 2 years ago

Gallaecio commented 2 years ago

While working on documentation for https://docs.zyte.com/ about setting request metadata, I am starting to think that maybe we should not send echoData to the server, and instead keep track of it on the client side.

I get the usefulness of echoData for generic HTTP clients that have no request metadata tracking feature, but a client with that feature, or one made specifically for Zyte Data API, should probably keep track of request metadata on the client side. In fact, maybe https://github.com/scrapy-plugins/scrapy-zyte-api should discourage its use altogether, in favor of Request.meta.

This is of course based solely on the current implementation of Zyte Data API. It is possible that future features will make it worthwhile to include echoData (e.g. some web UI that allows to keep track of your requests, where getting access to echoData would be useful).

Thoughts?

kmike commented 2 years ago

This makes sense to me, in general. It seems echoData is not used in scrapy-zyte-api; are you thinking about documentation changes? Here echoData is used in the CLI tool, so that the output contains source URL by default. What alternative do you see?

lopuhin commented 2 years ago

maybe we should not send echoData to the server, and instead keep track of it on the client side.

If it's still called echoData in the client, then it can be surprising if it's not sent to the server IMO for someone aware of the HTTP API, e.g. if you're doing some debugging.

Gallaecio commented 2 years ago

It seems echoData is not used in scrapy-zyte-api; are you thinking about documentation changes?

Yes. In the docs.zyte.com content I am preparing about echoData, in the code example tab set, we could cover how to use echoData in Scrapy but explain how Request.meta may be a better choice. I would also remove echoData from the example in https://github.com/scrapy-plugins/scrapy-zyte-api/blob/main/README.rst#how-to-use.

If it's still called echoData in the client, then it can be surprising if it's not sent to the server IMO for someone aware of the HTTP API, e.g. if you're doing some debugging.

Yes, that is a very good point.

For python-zyte-api, maybe we should leave things as they are for now, but in the future, if there is demand for it, add an AsyncClient parameter and command-line interface option to track echoData on the client side. I don’t expect demand for it any time soon, but it may be a good option for people hitting the API request length limit because of their echoData content.

kmike commented 2 years ago

I wonder if it'd be better to have this feature, but not have it tied to echoData. Or, alternatively, make echoData handling optional.

Gallaecio commented 2 years ago

I wonder if it'd be better to have this feature, but not have it tied to echoData.

Makes a lot of sense. In the scenario where echoData causes the API request to be too long, users could move some data from echoData to a client-specific field (e.g. clientData, requestMetadata), and still keep some data in echoData (in case there is a benefit in the future to have data there beyond getting it back in the client).