hipspy / hips

Python library to handle HiPS
https://hips.readthedocs.io
12 stars 16 forks source link

Make improvements to parallel tile fetching #108

Open adl1995 opened 6 years ago

adl1995 commented 6 years ago

This is a follow-up for issue #107. I have restructured the fetch_tiles function for now, but error handling if yet to be done.

adl1995 commented 6 years ago

@cdeil I have successfully used ThreadPoolExecutor.map function, this simplifies the code quite a bit. But, I haven't been able to find an equivalent function for aiohttp.

cdeil commented 6 years ago

@adl1995 - OK to merge?

Or did you already continue doing edits in this branch?

adl1995 commented 6 years ago

@cdeil No, I haven't made any further edits. But, tile fetching only works correctly with urllib, not with aiohttp. The second test case in test_fetch.py is failing because the tiles aren't sorted correctly: https://travis-ci.org/hipspy/hips/jobs/270430193#L1557

Should I just manually sort the list of tiles in case of aiohttp?

adl1995 commented 5 years ago

@cdeil - Sorry for the late response.

I have added a some error handling for missing and timed out tiles.

For the moment, if a tile is missing, we simply raise a urllib.error.HTTPError exception. However, should we provide the functionality to draw tiles even if some of them are missing?

I think one option is to introduce an attribute (boolean?) in the HipsTile class which identifies if it's a dummy tile. Based on this we can return an np.zeros array when the drawing module queries the tile data.

cdeil commented 5 years ago

For the moment, if a tile is missing, we simply raise a urllib.error.HTTPError exception. However, should we provide the functionality to draw tiles even if some of them are missing?

I think it's very common that some times don't arrive. There's of course the case of network issues, but also there are HiPS that aren't all-sky, so some tiles just aren't present on the server. We should be able to draw those, yes. This is also the case for all HiPS we have as example datasets in the hips-extra repo, so maybe some tests will fail already if you raise HTTPError.

I don't remember what we have. But basically I think we need to return a result object, working just with a stream of tiles can't work?

adl1995 commented 5 years ago

This is also the case for all HiPS we have as example datasets in the hips-extra repo, so maybe some tests will fail already if you raise HTTPError.

I don't believe we currently handle the case of missing tiles. The tests seems to pass even if I raise the HTTPError for missing tiles. I also tried to remove one of the tiles from hips-extra that was used in a test case and the HipsTile.read() method raised an exception.

I'm starting to add this functionality in (https://github.com/hipspy/hips/pull/108/commits/0a17fe1262786915201f7a6ca537e305fa88d983). Currently, I have introduced a variable is_missing in HipsTile based on which the data property returns an array filled with zeros.

adl1995 commented 5 years ago

We now have the functionality to create an all sky image when one (or more) tiles are missing. An example plot for AKARI-FIS where some tiles (URLs) don't exist:

akari-fis-missing-tiles

@cdeil - Can you please give this a review? We still need to add a few test cases for this. I'm not sure how to construct the test case, but we can probably pass a bogus URL and try to plot the result at the end.

adl1995 commented 5 years ago

@cdeil - As discussed, I have now removed the aiohttp package and added a test case for missing HiPS tiles.

The test case is failing because of an issue with Astropy:

hips/conftest.py:5: in <module>
    from astropy.tests.pytest_plugins import *
E   ModuleNotFoundError: No module named 'astropy.tests.pytest_plugins'
cdeil commented 4 years ago

@adl1995 - Looks like I dropped the ball here and didn't respond or merge this in, sorry!

I've now made you admin for hipspy, so you can also go ahead and merge if no-one responds from now on.

Note that in this case it would be best to first fix the CI test fails in GH master, possibly via #143, and to only then continue with functional changes and PRs like this one, always only merging in ones that have all test passing.

How would you feel about getting rid of both our homegrown multi-threading and aiohttp code, and to use parfive instead? https://docs.sunpy.org/en/stable/whatsnew/1.0.html#improved-file-downloading-capability It's a higher-level interface, so should mean less code on our side and same functionality. I don't know how this would play with progress bars, don't remember if we still have them or what options we expose. But presumably we could do whatever we now also with parfive, just by passing along a few config options. The whole premise of parfive is that we get a synchronous and simple API and just a few lines, and all the async is hidden in the download call. In principle we should structure our API so that it's usable also for parallel download and drawing or saving to file, but we never had that, and I think most user for hipspy will be able to fit all tiles in memory, i.e. it's OK to just not support the "tons of tile / out of core processing" in hipspy.