molybdenum-99 / reality

Comprehensive data proxy to knowledge about real world
MIT License
817 stars 43 forks source link

Add timeout for List.load! #66

Closed djsmentya closed 7 years ago

djsmentya commented 7 years ago

Hi I think that Reality::List#load! method need a timeout mechanism. I tried to execute code: Reality::Entity('USA').adm_divisions.load! and it took loots of time and didn't get result after 10 minutes. What do you think about this?

zverok commented 7 years ago

Hey. Thanks for your contribution! I have some notices, which are pretty important, though. First part is about style/quality:

  1. You haven't added any specs for your PR, how we can be sure it doesn't break anything/isn't broken itself? (Travis says, it is)
  2. Look objectively at this code: cities.load!(3) -- what would be your first guess it does?.. Mine would be "loads first 3 cities" or something like that, definitely not about timeout. For this feature, I'd rather expect something like load!(timeout: 3). But...

More important, I have conceptual problems with your approach, and there are numerous.

First of all, the simplest examples for unaware users are broken now. Just try something like Entity('Ukraine').cities.load!. Previously it "just worked" (in 74 seconds, which is not THAT bad for 462 entries, don't you think?). Now it "just doesn't". An implicit default of 3 sec timeout is too low for most Reality's tasks. And, in fact, any particular timeout could be unacceptable, depending on situation (like Reality.countries.flat_map(&:cities).load! -- I don't know whether it will work at all, but for all I know, it probably could).

Now, let's think about problem you are trying to solve. As far as I can understand, there are two:

  1. There is no way to know "where we are" (e.g. what amount of data is already fetched);
  2. There is no way to know, whether something is "stuck" (some data source is not responding, or wikipedia parser is in infinite loop, or something like that).

So, generic "timeout above everything" is typically not what we want. What we really do, is:

  1. Atomic timeouts for different parts of process. Network timeouts for every network request (configurable either for each component, or globally, somehow); probably, parsing timeout for infoboxer (wikipedia parser);
  2. Probably, some "progress" API, that could be used to inject progress-bar implementation into Reality's processes (and in interactive console, to show some progress-bars)

If you are interested, in your particular case it looks like a bug in infoboxer (infinite loop while parsing Wyoming page), I am on it.

djsmentya commented 7 years ago

@zverok Thanks for your reply. I just wanted to start discussion and show my idea somehow, I Didn't want to invest lots of time if it's wrong proposal.

zverok commented 7 years ago

BTW, it was bug in Wyoming page, I've fixed it and now Wikipedia shows page correctly (previously it had missing header) AND reality works again.

But of course infoboxer should've NOT go into infinite loop even on hardest markup bugs :( And an idea of more controlled (timeouts & progress-bars) loading is really worth implementing.

zverok commented 7 years ago

Thanking to you, Infoboxer is fixed for this rare infinite loop too (in the latest commit to master). So, your report was extremely useful, all in all :)