Introduce asynchronous fetching of HiPS tiles

adl1995 commented 7 years ago

As the title states, HiPS tiles are now fetched asynchronously. I noted 4 seconds for tile fetching which previously took around 25 seconds at my end. For now, this is introduced in the HipsPainter class, but let me know if this needs to be refactored somewhere else.

cdeil commented 7 years ago

@adl1995 - Thanks!

Please put this on the list of things to discuss in the telcon this afternoon.

For sure, this code shouldn't be located in hips/draw/paint.py.

As discussed since the start of the project (and started by you in #33), the tile fetching / caching should be in it's own file. We've discussed it many times, but it hasn't happened so far, we should really make it a priority for the last week.

One reason that it's difficult is that (I think) there's a fundamental decision to be made whether we are happy with two phases that happen after each other (first fetch all tiles, then do all drawing), or whether we want to support async tile fetching and immediately and async tile drawing as the tiles come it. That second option is faster, and also much more memory-efficient (can delete tiles that are drawn from memory immediately), but also leads to tighter coupling between the fetching / caching and drawing code. My suggestion for now (and I think has always been) would be that we implement the simple solution and move the tile fetching out to a separate Python file, and have it as much as possible decoupled from the drawing, i.e. the drawer just gives a list of tiles needed to the fetcher and is handed back that list of tiles, i.e. there's two lines of code in the painter calling the fetcher.

The second question / suggestion I have is whether introducing aiohttp as a dependency is really needed, or whether an async or at least parallel via threads fetch wouldn't work for our simple application and we would avoid the extra dependency. There's e.g. this: https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example

@adl1995 - I know you have looked at this before GSoC started. Please remind us of the results this afternoon and then we discuss.

Also: if we go the simple route of separating out the fetching, then we could simply have two different fetchers, one using aiohttp as an optional dependency and the other just the Python standard lib to fetch, and the user could select via a simple option which one to use (probably defaulting to the Python standard lib, and only offering aiohttp as opt-in for power-users that really need the extra little bit of performance.

adl1995 commented 7 years ago

@cdeil I have added asynchronous tile fetching using both urllib and aiohttp, however, the test fails at: https://travis-ci.org/hipspy/hips/jobs/265656774

I think this is due to the reason that tiles are not fetched sequentially and when they are matched with HipsTileMeta, they are misaligned. https://github.com/hipspy/hips/pull/106/files#diff-27bdb9fa08176e9f36aa86205b78af6eR71

What do you think is the best solution to this? Should I somehow label each tile, so I can sort them once they are fetched?

cdeil commented 7 years ago

This PR has changes to astropy_helpers which shouldn't be here. Please fix.

cdeil commented 7 years ago

Concerning your question about the current bug that tilemeta / tile isn't matched correctly: I don't know.

Please add a unit test that fails and address the other points mentinoed above, and then I'll have a look.

If you have an idea how to fix the issue, just go for it and then I'll review.

adl1995 commented 7 years ago

@cdeil I've made the updates, however, I still don't know how to enforce order when fetching tiles, so the tests fail.

Also, can you please explain how I could remove changes from astropy_helpers?

cdeil commented 7 years ago

Also, can you please explain how I could remove changes from astropy_helpers?

Here's one way to do it:

cd astropy_helpers
git checkout 14ca346  # This is the version hips master points to
cd ..
git status
git add -u
git commit -m 'Undo change to astropy_helpers'

Then to avoid introducing a messy git history where the astropy_helpers version jumps back and forward again, I'd suggest to use

git fetch origin   # might be called upstream for you, I don't know
git rebase -i origin/master

and to declare this new commit a fix-up commit on top of 05a130 which is the commit where you introduced the accidental change:

pick 6c12958 Introduce asynchronous fetching of HiPS tiles
pick 0eba105 Add option to fetch tiles asynchronously using urllib
pick 05a130f Add tqdm progress bar to urllib and aiohttp fetch methods
fixup 295434b Undo change to astropy_helpers

Finally, git push -f the branch to Github and check the list of commits and diff there. Should be all good now.

You're probably aware, but just in case: get into the habit of always using git status and git diff to review changes before committing, to spare yourself (and me) the troubles of having to fix up your git history later.

cdeil commented 7 years ago

I would suggest that the HipsTileFetcher is given a Python list of HipsTileMeta of tiles to fetch. That is simpler and more flexible compared to what you have now. It then also means that you can create the list of HipsTile that you return more easily, because you already have the meta. For the async code, can't you have a system that makes it easy to attach the right data to the right meta? Maybe one option could be that you pass the meta for each tile to the async function and that function hands back the complete tile, no?

adl1995 commented 7 years ago

@cdeil Your suggestion on coupling HipsTileMeta with the fetch method worked, thanks!

I think this is ready to be merged now. Handling the fetch errors and missing tiles could be a separate PR.

coveralls commented 7 years ago

Coverage increased (+0.3%) to 96.742% when pulling 472e3b9a87cd0c43338642fd7c23e90604203657 on adl1995:async_tile_fetch into 6462a4829cf9aad987e00e4bd5e917a714e78c1b on hipspy:master.

coveralls commented 7 years ago

Coverage increased (+0.3%) to 96.742% when pulling 2349fcf13f1696d04134ce880db7a9f641db5b0a on adl1995:async_tile_fetch into 6462a4829cf9aad987e00e4bd5e917a714e78c1b on hipspy:master.

cdeil commented 7 years ago

One more thing: I think it would be good to have at least two or three sentences of description in the fetcher saying what it does and at least one example how it works. Note that this is code-wise by far the most complex piece in our package (using threads and async and requests) and soon we will have to extend it with error handling in some way and I think it would be really valuable to have a short high-level description what it does and how it works for future developers.

Users will probably never use it directly. For them the options in make_sky_image matter. There you currently only expose the choice whether to use urllib or aiohttp, but not n_parallel or timeout. I think it would be especially important to expose timeout, because from experience I know that it can be super frustrating to have something time out if one a slow but working network, with no option to increase the timeout parameter. The options to expose it would be either individually, or as a dict "fetch_opts" where the user can pass e.g. fetch_opts=dict(package='urllib', timeout=30, n_parallel=2)

adl1995 commented 7 years ago

@cdeil I just made the updates. I have removed the HipsTileFetcher class and now the fetch.py file exports a fetch_tiles function. I'm not sure if this is neat, as now I had to pass around arguments that were previously accessed using self.

coveralls commented 7 years ago

Coverage decreased (-0.06%) to 96.372% when pulling 90d4a7d56309d06f6ad9b86ce988645b27631a6e on adl1995:async_tile_fetch into 6462a4829cf9aad987e00e4bd5e917a714e78c1b on hipspy:master.

coveralls commented 7 years ago

Coverage decreased (-0.05%) to 96.377% when pulling 90d4a7d56309d06f6ad9b86ce988645b27631a6e on adl1995:async_tile_fetch into 6462a4829cf9aad987e00e4bd5e917a714e78c1b on hipspy:master.

coveralls commented 7 years ago

Coverage decreased (-0.05%) to 96.377% when pulling c57bad602b08201722e51e11f2c0d3d37dd6dae0 on adl1995:async_tile_fetch into 6462a4829cf9aad987e00e4bd5e917a714e78c1b on hipspy:master.

adl1995 commented 7 years ago

@cdeil I have refactored fetching tiles using urllib, now tiles_urllib function computes url on the fly and passes it to executor.submit. This works, but I have a feeling that tiles are not being fetched asynchronously, I say this because it took me 89 seconds for fetching tiles when building the docs.

coveralls commented 7 years ago

Coverage increased (+0.09%) to 96.517% when pulling 54313eef75dceef5596d1acbbc63bcf61f94a483 on adl1995:async_tile_fetch into 6462a4829cf9aad987e00e4bd5e917a714e78c1b on hipspy:master.

adl1995 commented 7 years ago

@cdeil The example should be fixed now, there was a leading , after the url, so it caused errors. The tile fetching using urllib is now being done asynchronously.

coveralls commented 7 years ago

Coverage increased (+0.1%) to 96.527% when pulling cafc0d14a337f575d87bb4be6a88ced8e020dd91 on adl1995:async_tile_fetch into 6462a4829cf9aad987e00e4bd5e917a714e78c1b on hipspy:master.

coveralls commented 7 years ago

Coverage increased (+0.1%) to 96.527% when pulling cafc0d14a337f575d87bb4be6a88ced8e020dd91 on adl1995:async_tile_fetch into 6462a4829cf9aad987e00e4bd5e917a714e78c1b on hipspy:master.

cdeil commented 7 years ago

@adl1995 - Thanks!

We could merge now if you like, but I have one remaining question / concern regarding how the async and thread parallel fetch is implemented

in the thread solution, you manage to have the if progress_bar up front, but in the async done differently. Is there a way to make the thread and async implementation more similar or is that not possible because with async / await it's not possible to do it equivalently?
another issue is that with the async solution, you have two different ways of awaiting the results, which means twice as much testing is needed. This is mostly the same comment, on whether it's possible to make the progress bar handling work the same for threads and async.
and finally, a question for the thread solution: does it keep running 5 threads in the background, or does the number of parallel fetches go down because of the blocking .result() call, i.e. if a given tile is slow to fetch , but later ones are fast.

@adl1995 - If you're interested in fully understanding threads and async and what your code is doing (or maybe even writing another blog post), you could spend some time and use e.g. https://github.com/astrofrog/psrecord to "record' and plot or print a log of what is going on. I don't know if that can monitor number of threads and network I/O, but probably https://pythonhosted.org/psutil/ or some other tools can, and you could set of a "benchmark", i.e. a small Python file that executes a few fetches and / or draws and then makes a printout or plot summarising what is going on. This is something I wanted to do before implementing the faster tile drawing, but I won't have time for it. So if you're interested, take a day or two max for this "side project" and do that.

adl1995 commented 7 years ago

@cdeil There is an option to move the progress bar in the beginning in case of async, but it makes the code synchronous. The modified function is this:

async def fetch_all_tiles_aiohttp(tile_metas: List[HipsTileMeta],
                                  hips_survey : HipsSurveyProperties, progress_bar: bool) -> List[HipsTile]:
    """Generator function to fetch HiPS tiles from a remote URL using aiohttp."""
    import aiohttp

    async with aiohttp.ClientSession() as session:
        if progress_bar:
            from tqdm import tqdm
            tile_metas = tqdm(tile_metas, total=len(tile_metas), desc='Fetching tiles')

        tiles = []
        for meta in tile_metas:
            url = hips_survey.tile_url(meta)
            task = asyncio.ensure_future(fetch_tile_aiohttp(url, meta, session))
            tiles.append(await task)
    return tiles

adl1995 commented 7 years ago

@cdeil I made the updates as discussed in the call. However, there is a small problem. When using asyncio.as_completed the tiles are not fetched in sequence, this is no issue during drawing, but in test_fetch.py I assert on the first tile, which can be at a different index each time.

So, for now, I just fetched a single tile and asserted on that. I found this question on SO, but did not try to implement this, because this would create an extra overhead which isn't exactly needed, except for the test_fetch.py file.

coveralls commented 7 years ago

Coverage decreased (-0.3%) to 96.098% when pulling d27168bce994e772302636facb30fd7482c82023 on adl1995:async_tile_fetch into 6462a4829cf9aad987e00e4bd5e917a714e78c1b on hipspy:master.

cdeil commented 7 years ago

@adl1995 - I've done one commit 1fd6e7f to (hopefully) improve the code a little bit. It's minor changes, mostly to avoid very long lines. One change is that I managed to get rid of the no-op "progress_bar", by simply wrapping an iterable of futures with the tqdm call, again returning an iterable. I think / hope it's still working as before.

One thing I noticed is that for the async, you haven't actually implemented the n_parallel and timeout options. Can you please add it? Both are really important for the caller to have control over.

adl1995 commented 7 years ago

@cdeil Can you please tell how I could pull in your commit? I messed up my Git history.

cdeil commented 7 years ago

The branch on Github is OK and contains my commit: https://github.com/hipspy/hips/pull/106/commits

You could just delete your local branch, and check out a new local branch from this latest remove version, and then continue coding there, no?

adl1995 commented 7 years ago

@cdeil I have added timeout and n_parallel option to tiles_aiohttp function. I'm not sure why you removed asyncio.as_completed, doesn't this mean that if one tile takes too long to fetch, others have to wait?

adl1995 commented 7 years ago

@cdeil The tests fail with:

ImportError: /home/travis/miniconda/envs/test/lib/python3.6/lib-dynload/../../libz.so.1: version `ZLIB_1.2.9' not found (required by /home/travis/miniconda/envs/test/lib/python3.6/site-packages/matplotlib/../../.././libpng16.so.16)

I'm not sure what causes this.

cdeil commented 7 years ago

I have added timeout and n_parallel option to tiles_aiohttp function.

Thanks!

I'm not sure why you removed asyncio.as_completed, doesn't this mean that if one tile takes too long to fetch, others have to wait?

I don't think I did remove it, just moved it up a few lines, no?

The tests fail ...

This looks like a Linux install issue of our dependencies, not like something related to this PR. I've restarted travis-ci to see if it was a fluke.

coveralls commented 7 years ago

Coverage increased (+0.3%) to 96.733% when pulling d7979afd03c5376c4cd1216e6b20c5e2ee3b7627 on adl1995:async_tile_fetch into 6462a4829cf9aad987e00e4bd5e917a714e78c1b on hipspy:master.

cdeil commented 7 years ago

travis-ci passed, it was a fluke.

I've done a final review and some small improvements in b4805ad :

Fixed some type annotations
Change default of n_parallel to 5, because I don't think that a higher value improves total fetch time, and if so, then more parallel requests in flight just means higher chance for one of them to time out.
Remove defaults for helper functions, to avoid duplicating the default values in different parts of the code (hard to keep consistent)
Use aiohttp.TcpConnector because I think it's a bit simpler then the asyncio.Semaphore (one less level of code indentation, and more powerful (has other options to control requests that we might want to use in the future).

Merging this now, and then I'll make a issue with my thoughts how to improve fetch.py further for you @adl1995 in a follow-up PR.

coveralls commented 7 years ago

Coverage increased (+0.3%) to 96.733% when pulling b4805ad2b31082496c8fcb41677ce469db0f0417 on adl1995:async_tile_fetch into 6462a4829cf9aad987e00e4bd5e917a714e78c1b on hipspy:master.

hipspy / hips

Introduce asynchronous fetching of HiPS tiles #106