ripple / rippled-historical-database

SQL database as a canonical source of historical data

GET balance_changes: using "currency" and "counterparty" fields causes the request to fail #129

Closed: professorhantzen closed this issue 5 years ago

professorhantzen commented 7 years ago

This works, and returns JSON that contains another marker:

https://data.ripple.com/v2/accounts/rJnZ4YHCUsHvQu7R6mZohevKJDHFzVD6Zr/balance_changes?descending=false&format=json&counterparty=rMwjYedjc7qqtKYVLiAccJSmCwih4LnE2q&currency=BTC&limit=1000&marker=rJnZ4YHCUsHvQu7R6mZohevKJDHFzVD6Zr|20150624151110|000014241813|00007|00006

The marker returned from the above request is:

rJnZ4YHCUsHvQu7R6mZohevKJDHFzVD6Zr|20150824124530|000015455303|00029|00005

The subsequent request, containing the new marker:

https://data.ripple.com/v2/accounts/rJnZ4YHCUsHvQu7R6mZohevKJDHFzVD6Zr/balance_changes?descending=false&format=json&counterparty=rMwjYedjc7qqtKYVLiAccJSmCwih4LnE2q&currency=BTC&limit=1000&marker=rJnZ4YHCUsHvQu7R6mZohevKJDHFzVD6Zr|20150824124530|000015455303|00029|00005

Always fails with the error:

Error 503 first byte timeout
first byte timeout
Guru Mediation:
Details: cache-fra1244-FRA 1482794656 662294927
Varnish cache server

I've experimented, and the error seems to occur only when the counterparty and currency fields are supplied in the request, not when they are left out. For example, I traversed the entire set of markers (thousands of pages) for the above query without those fields and never hit the error.
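For reference, the traversal I ran looks roughly like this (a minimal Python sketch using the requests library; the account, counterparty, and currency values are the ones from the URLs above, and the response field names match this endpoint):

```python
import requests

BASE = "https://data.ripple.com/v2/accounts/{}/balance_changes"
account = "rJnZ4YHCUsHvQu7R6mZohevKJDHFzVD6Zr"

params = {
    "descending": "false",
    "format": "json",
    "counterparty": "rMwjYedjc7qqtKYVLiAccJSmCwih4LnE2q",
    "currency": "BTC",
    "limit": 1000,
}

while True:
    resp = requests.get(BASE.format(account), params=params, timeout=60)
    resp.raise_for_status()  # the 503 above surfaces here as an HTTPError
    body = resp.json()
    for change in body.get("balance_changes", []):
        ...  # process each balance change
    marker = body.get("marker")
    if not marker:
        break  # no marker means this was the last page
    params["marker"] = marker  # resume from where the previous page ended
```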

shekenahglory commented 7 years ago

I bumped up the timeout a bit, but this query (with counterparty and currency filters) is going to be slow on accounts with large numbers of transactions; it's probably better to reduce the limit.
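On the client side that could look something like this (a hypothetical helper, not part of the API; it just halves the limit whenever the gateway times out):

```python
import requests

def fetch_page(url, params, min_limit=50):
    """Retry a balance_changes page with progressively smaller limits.

    Hypothetical client-side sketch: on a 503 gateway timeout, halve
    'limit' and retry until the request succeeds or the limit drops
    below min_limit.
    """
    limit = params.get("limit", 1000)
    while limit >= min_limit:
        params["limit"] = limit
        resp = requests.get(url, params=params, timeout=60)
        if resp.status_code != 503:
            resp.raise_for_status()
            return resp.json()
        limit //= 2  # smaller pages give the filtered query less work per request
    raise RuntimeError("balance_changes query timed out even at small limits")
```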

professorhantzen commented 7 years ago

Thanks, that's working for me now. I've also noticed that my example above, the one that failed previously, was actually the last page in the paginated set, returning 125 balance_changes (given a limit of 1000). It's a small number of results, but they span from the marker's start time to almost the present, so I assume the timeout is due to the DB having to apply the filter over a very large data set.

Could there be a way to eliminate this error? If a query is valid according to the defined limits of the API, it seems to me it should either return a valid result or at least fail with an API-level error. What do you think?

shekenahglory commented 7 years ago

It should be an API-level error now, as the db query has its own timeout, which is shorter than the cache-layer timeout. I don't really see a way to eliminate the error altogether without creating more tables specific to this query. The indexing is based on the date, so if you knew an absolute date range you could restrict to, maybe that would help?
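For instance, something like the following (a sketch assuming the endpoint's start/end date parameters; the dates below are placeholders) keeps each request inside a window the date-based index can serve quickly:

```python
import requests

params = {
    "descending": "false",
    "format": "json",
    "counterparty": "rMwjYedjc7qqtKYVLiAccJSmCwih4LnE2q",
    "currency": "BTC",
    "limit": 1000,
    "start": "2015-06-01T00:00:00Z",  # absolute range instead of an open-ended scan
    "end": "2015-09-01T00:00:00Z",
}
resp = requests.get(
    "https://data.ripple.com/v2/accounts/"
    "rJnZ4YHCUsHvQu7R6mZohevKJDHFzVD6Zr/balance_changes",
    params=params,
    timeout=60,
)
print(resp.json().get("count"))
```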

professorhantzen commented 7 years ago

How about a two-fold approach to catch and deal with every error of this type?

1) When the db query hits a timeout, return an error informing the user that the timeout was reached, along with a reminder of the existence of:

2) A boolean that can be set on each query indicating a willingness to wait indefinitely for a result. When this flag is set, remove the db timeout for that query.

Could you foresee any problems with such an approach? (Would the potential for spamming the server with time-consuming, no-timeout queries be an issue, for instance?)
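Roughly what I have in mind, as a sketch (the no_timeout flag, the response shape, and all the names here are hypothetical; nothing like this exists in the API today):

```python
DB_TIMEOUT_SECONDS = 30  # placeholder for the current db-level timeout

def execute(query, timeout):
    """Stand-in for the real database call; pretend it ran too long."""
    raise TimeoutError

def run_query(query, no_timeout=False):
    # 2) opt-in flag: the caller accepts an unbounded wait
    timeout = None if no_timeout else DB_TIMEOUT_SECONDS
    try:
        return {"result": "success", "rows": execute(query, timeout=timeout)}
    except TimeoutError:
        # 1) surface the timeout as an API-level error and remind the
        #    caller that the opt-in flag exists
        return {
            "result": "error",
            "message": (
                "db query timed out after %ds; set no_timeout=true to "
                "wait indefinitely" % DB_TIMEOUT_SECONDS
            ),
        }

print(run_query("SELECT ..."))
```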