Closed HironaoMiyatake closed 9 years ago
I have no argument with the idea of speeding things up, but first I want to check: are any of the computations we're doing superfluous for the purpose of current testing? if so, we should remove some of the tests to focus on the ones we need if that speeds things up (and then once it's optimized we can put back all the things we removed)
It may also be a good idea to run a profiler before assuming too much about what's taking the most time.
@rmandelb, that's an interesting point. We could probably turn off most things and just use one of @HironaoMiyatake's scatter-plot tests, which are pretty quick but still probe most of the behavior we want to learn about (converting coordinates, pulling out shapes, catalog masking). Sound reasonable, Hironao?
I thinks so. I could turn on only one of scatter plots by using a command line option.
@TallJimbo , I agree (having once spent ages trying to optimize the wrong piece of some code...!). Melanie and Hironao, are these targets for optimization based on profiling or are they your best common-sense estimates? I'm all for common sense but just in case of surprises, profiling is better.
The first one is a common-sense estimate. The second one we know is a problem, if not how big--when I was writing the Visit
-level tests, there was a huge difference in processing times between code where I had to lookup the schema for each CCD, vs code where it got reused by a list comprehension. But how much it matters relative to other overheads for a single CCD or visit is unclear. Third thing, I don't think it's the problem, but we had discussed it at some point so I thought I'd throw it on there.
Here is the first profiling. It seems that lots of time are spent for table handling... _tableLib.Schema__find_**
is related to the second point?
>>> p.sort_stats('time').print_stats(20)
Thu Sep 18 20:58:01 2014 profile.dat
8300714 function calls (8261136 primitive calls) in 35.687 seconds
Ordered by: internal time
List reduced from 5911 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
1303 4.164 0.003 4.164 0.003 {method 'read' of 'file' objects}
17455 2.485 0.000 2.485 0.000 {posix.stat}
24 2.291 0.095 2.291 0.095 {imp.load_module}
33384 1.739 0.000 1.739 0.000 {_tableLib.Schema__find_Angle}
33389 1.716 0.000 1.716 0.000 {_tableLib.Schema__find_Coord}
16733 1.255 0.000 11.288 0.001 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/table/tableLib.py:727(find)
715 0.912 0.001 5.422 0.008 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/shutil.py:66(copyfile)
16738 0.899 0.000 0.899 0.000 {_tableLib.Schema__find_MomentsD}
1035147 0.895 0.000 1.389 0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226(_compile)
16702 0.876 0.000 0.876 0.000 {_tableLib.Schema__find_D}
16702 0.874 0.000 0.874 0.000 {_tableLib.Schema__find_I}
16696 0.863 0.000 0.863 0.000 {_tableLib.Schema__find_String}
985271 0.862 0.000 3.351 0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/utils.py:127(<lambda>)
16692 0.861 0.000 0.861 0.000 {_tableLib.Schema__find_CovF}
16702 0.861 0.000 0.861 0.000 {_tableLib.Schema__find_L}
16696 0.854 0.000 0.854 0.000 {_tableLib.Schema__find_PointD}
1 0.714 0.714 0.714 0.714 {_tableLib.SourceCatalog_readFits}
1023508 0.693 0.000 2.608 0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139(search)
1025653 0.663 0.000 0.663 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
45 0.651 0.014 0.651 0.014 {method 'readlines' of 'file' objects}
_tableLib.Schema_find\ is related to the second point
Yes, that's exactly it - Schema.find is called whenever you try to go from a string to a Key (which happens implicitly when you access a field using a string). By getting the Keys in advance, it will only be called once, rather than for every record.
Thanks! I'll put a line to get Keys outside of a for loop.
I don't think that will be quite enough to do it--we get shapes in a different function than the rest of the quantities, for example. So I think the Keys should be found in the main function block, then saved as attributes so we can access them from any of the child functions.
Potentially, it would be nice to do these as class attributes rather than instance attributes (see Stack Overflow). That way if we call the CCD version on multiple CCDs, for example, we could reuse the same Key lookup every time. I don't think it's possible to call this wrapper code with multiple reruns, for example, so I don't think this will cause problems. But I don't know how easy that would be to do (and don't actually know if the Key lookup would persist to different catalogs from the same rerun). @TallJimbo, any thoughts?
I'd originally thought the only way to do this well would be to refactor so the Keys could be instance attributes, but I think the class attribute idea would also work, and it seems like it'd be a lot easier. You would have to be careful to make sure you stick to one rerun during an entire Python session, however, because the Keys are subject to change between reruns (though they rarely will in practice, as the configuration settings that affect them are rarely modified).
In terms of when/how to store those Keys, one option would be to simply save them as class attributes the first time you use them (i.e. start by setting them to None). You can also use the "_schema" butler datasets (e..g "src_schema", "deepCoadd_src_schema") which you can get from the Butler without a data ID. These are empty SourceCatalog objects with the same Schemas as the per-data ID catalogs, so you can use them to get Keys before you start processing and data IDs.
I don't think that will be quite enough to do it--we get shapes in a different function than the rest of the quantities, for example. So I think the Keys should be found in the main function block, then saved as attributes so we can access them from any of the child functions.
@TallJimbo told me offline that
moments = [src.get("shape.sdss") for src in data]
becomes much faster if we do
key = data.schema.find("shape.sdss").key
moments = [src.get(key) for src in data]
, because Schema.find is now out of the for loop. Is this correct?
Yes, that's correct. Basically, Hironao's approach will speed things up a lot (go from one Schema.find call per source to one call per CCD/patch), and Melanie's will speed it up a bit more (one Schema.find call per rerun).
For a first trial, I replaced shape.sdss
, shape.sdss.err
, shape.sdss.psf
with Keys (changes pushed). However I found that the number of calls of relevant Schema.find is not really reduced, and I found that the numbers are much larger than the number of stars in this example (41). This result suggests that Schema.find is called more than what is need for processing stars. Any thoughts, @TallJimbo or @msimet ?
Before the change
ncalls tottime percall cumtime percall filename:lineno(function)
16854 0.861 0.000 0.861 0.000 {_tableLib.Schema__find_CovF}
After the change
ncalls tottime percall cumtime percall filename:lineno(function)
16734 0.848 0.000 0.848 0.000 {_tableLib.Schema__find_CovF}
(Comments are updated on the GitHub website)
When you call Schema.find()
, it actually invokes all of the Schema__find_<type>
methods (for all the different possible types), at least until it finds with the right type. That's why there are so many more calls than there are stars. It also seems likely that some other field access other than those for "shape.sdss" are playing a big role in the performance, and because the same underlying methods are called for each, it is hard to tell from the profile which ones are dominating. Maybe that means that Melanie's suggestion would produce a big additional speedup relative to this change, not just a small one? Are there any other areas in the code where we look up fields by name for each record, rather than get a full column?
When you call Schema.find(), it actually invokes all of the Schema_find
methods (for all the different possible types), at least until it finds with the right type.
If this is the case, the number of calls is reduced more than what we have? The difference of the number of calls is 16854 - 16734 = 120, which is ~3 times the number of stars(41). I thought it the difference would be 41 x (the number of iteration of Schema.find() to find the right type) and the latter would be larger than 3.
It looks like "CovF" is actually the third type it tries (the order is set by lsst.afw.table.tableLib._suffixes
), so I think that mostly makes sense.
Are there any other areas in the code where we look up fields by name for each record, rather than get a full column?
Yes. The big ones are probably the flags (6 shape flags and 20 other flags...)
Actually, we could probably guard against separate reruns by having this be a class variable that's a nested dict, accessed like
self.keys_dict[rerun][str]
(which I think would also update the class variable without us having to do something explicit, since it's not a reassignation of the variable name?) I assume we can get the rerun name from the butler, there. And then we don't have to get all the keys at once, we can just do if not str in self.keys_dict[rerun]
to get the key when we need it, and it will be stored for further use.
It might actually be tricky to get the rerun name from the Butler, actually. I'm not sure if that's possible. And you don't want to make getting the Keys too complicated, because after all we want grabbing them and using them to be quick (though I don't think an extra Python dict lookup is likely to be a big deal).
I'd just not worry about the rerun, actually. As long as you're using the HSC pipeline's command-line argument parsing to actually invoke these, it will restrict you to one rerun anyway.
Ah, okay--I thought that was the case, but your comment made me unsure!
We can just add a note to the documentation saying not to call with multiple reruns or results may be nonsensical. I think that's fine.
Yes. The big ones are probably the flags (6 shape flags and 20 other flags...)
It's true for shape flags, but for other flags I guess all of them are used only in removeFlaggedObjects()
where calls these flags only once (it does not use for loops), so it does not make the speed slower.
Oh, that's a good point, sorry--misreading my own code :)
Is there any background key-finding happening in methods like
.getCentroid()
? We do those in loops as well.
Hmm, @TallJimbo, our guru, could answer this!
No, those methods (we call them slot getters) cache their own keys, so they should already be fast.
Even if they are in a for loop?
Correct. Those should actually be faster than using a Key from Python, because the Key is stored and accessed in C++.
Okay, thanks!
Perhaps it's worth trying to get the keys for the shape flags outside the for loops and seeing if that makes a difference. (Here, let me make more work for you, Hironao! :) )
Also: how many times do the methods ._computeShapeMask
, '.computeShapes, and
.computeExtraColumn` get called? Just once, or multiple times? (It
should be once, I would think, but perhaps worth checking.)
Here it is! For example, _tableLib.Schema__find_Angle
is significantly removed because it's at the first one in afwTable.tableLib._suffixes
.
Before
ncalls tottime percall cumtime percall filename:lineno(function)
33468 1.750 0.000 1.750 0.000 {_tableLib.Schema__find_Angle}
16734 1.228 0.000 11.139 0.001 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/table/tableLib.py:727(find)
16744 0.862 0.000 0.862 0.000 {_tableLib.Schema__find_I}
16738 0.861 0.000 0.861 0.000 {_tableLib.Schema__find_F}
16741 0.861 0.000 0.861 0.000 {_tableLib.Schema__find_D}
16739 0.850 0.000 0.850 0.000 {_tableLib.Schema__find_String}
16744 0.850 0.000 0.850 0.000 {_tableLib.Schema__find_L}
16734 0.848 0.000 0.848 0.000 {_tableLib.Schema__find_CovF}
16742 0.845 0.000 0.845 0.000 {_tableLib.Schema__find_Coord}
16738 0.843 0.000 0.843 0.000 {_tableLib.Schema__find_PointD}
16737 0.840 0.000 0.840 0.000 {_tableLib.Schema__find_PointI}
1045751 0.798 0.000 1.248 0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226(_compile)
995860 0.775 0.000 3.030 0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/utils.py:127(<lambda>)
1034110 0.655 0.000 2.365 0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139(search)
724 0.655 0.001 0.872 0.001 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/shutil.py:66(copyfile)
1036255 0.597 0.000 0.597 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
1 0.419 0.419 0.419 0.419 {_tableLib.SourceCatalog_readFits}
25 0.333 0.013 0.333 0.013 {imp.load_module}
1077602 0.294 0.000 0.294 0.000 {method 'get' of 'dict' objects}
6992 0.262 0.000 3.310 0.000 {filter}
After
ncalls tottime percall cumtime percall filename:lineno(function)
16744 0.900 0.000 0.900 0.000 {_tableLib.Schema__find_Angle}
16747 0.873 0.000 0.873 0.000 {_tableLib.Schema__find_D}
16747 0.870 0.000 0.870 0.000 {_tableLib.Schema__find_I}
16750 0.858 0.000 0.858 0.000 {_tableLib.Schema__find_L}
16740 0.856 0.000 0.856 0.000 {_tableLib.Schema__find_CovF}
16745 0.856 0.000 0.856 0.000 {_tableLib.Schema__find_String}
1045751 0.793 0.000 1.242 0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226(_compile)
995860 0.771 0.000 2.982 0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/utils.py:127(<lambda>)
16740 0.684 0.000 6.210 0.000 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/table/tableLib.py:727(find)
735 0.658 0.001 0.850 0.001 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/shutil.py:66(copyfile)
1034110 0.622 0.000 2.316 0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139(search)
1036255 0.585 0.000 0.585 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
1 0.428 0.428 0.428 0.428 {_tableLib.SourceCatalog_readFits}
25 0.331 0.013 0.331 0.013 {imp.load_module}
1077602 0.297 0.000 0.297 0.000 {method 'get' of 'dict' objects}
6992 0.257 0.000 3.259 0.000 {filter}
120 0.214 0.002 0.214 0.002 {_policyLib.Policy_mergeDefaults}
1 0.212 0.212 0.255 0.255 {_persistenceLib.Persistence_unsafeRetrieve}
33539/16831 0.181 0.000 6.522 0.000 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/table/tableLib.py:2370(get)
18178 0.125 0.000 0.125 0.000 {posix.stat}
Hmm, neat! How much was the overall time reduced?
from 19.790 seconds to 14.882 seconds.
OK, so...that would probably save us on the order of 15 minutes off an hour's processing time? So, good, but we still need to do some optimization. Yes?
I'm happy to implement the key caching, if you like (though you're welcome to do it too of course). Probably wouldn't make a difference for a single CCD, could be important for a full FOV.
Let me explore replacing strings a bit more. Also I am not sure your estimation is valid for visit-level. There might be a header taking time that is called only once both for ccd-level and visit-level.
So, another thing you could do that might help, if (as it seems) the various Schema._find__<type>
methods are still the bottleneck: you could call those directly, instead of calling Schema.find
. Normally, I wouldn't recommend that, because it makes you more reliant on the exact types we used in the tables, but it would cut down on the number of unnecessary calls. I'd start with the Key caching of the sort Melanie suggested, though, and implement this on top if necessary. Considering that the number of flags you're looking up is only a factor of two away from the number of stars you're looking at per CCD, only doing those lookups once per rerun is probably more important than I originally thought.
Hironao, note that if you replace string lookup with a find call and key lookup when the string lookup wasn't in a for loop, it won't make any difference at all (unless you use one of the private type-specific find methods, as I brought up in the last paragraph).
Yes, I'd change those in a for loop!
Okay, so now all the Schema.find
has gone from the top 20! Next, let me look into visit-level...
>>> p7.sort_stats("time").print_stats(20)
Fri Sep 19 14:59:44 2014 profile7.dat
7395096 function calls (7364340 primitive calls) in 8.202 seconds
Ordered by: internal time
List reduced from 6004 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
1045751 0.813 0.000 1.257 0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226(_compile)
995860 0.772 0.000 2.998 0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/utils.py:127(<lambda>)
757 0.752 0.001 0.917 0.001 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/shutil.py:66(copyfile)
1034110 0.625 0.000 2.332 0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139(search)
1036255 0.583 0.000 0.583 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
1 0.380 0.380 0.380 0.380 {_tableLib.SourceCatalog_readFits}
25 0.368 0.015 0.368 0.015 {imp.load_module}
1077602 0.292 0.000 0.292 0.000 {method 'get' of 'dict' objects}
6992 0.245 0.000 3.263 0.000 {filter}
120 0.216 0.002 0.216 0.002 {_policyLib.Policy_mergeDefaults}
18376 0.120 0.000 0.120 0.000 {posix.stat}
1 0.095 0.095 0.140 0.140 {_persistenceLib.Persistence_unsafeRetrieve}
13 0.086 0.007 0.233 0.018 /tigress/HSC/products-20130212/Linux64/matplotlib/1.3.1/lib/python/site-packages/pyparsing-2.0.1-py2.7.egg/pyparsing.py:3145(oneOf)
2 0.080 0.040 0.080 0.040 {built-in method write_png}
714 0.077 0.000 0.077 0.000 {method 'write' of 'file' objects}
7 0.077 0.011 0.077 0.011 {_policyLib.Policy_createPolicy}
8259/5434 0.066 0.000 0.258 0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/VersionCompare.py:16(stdCompare)
112 0.060 0.001 0.315 0.003 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/cameraGeom/utils.py:197(makeCcd)
1573 0.053 0.000 0.053 0.000 {open}
1 0.052 0.052 0.162 0.162 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/__init__.py:1(<module>)
Oooh, nice! (And hello there, matplotlib functions...how nice to see you for the single scatter plot we're making, heh)
On Fri, Sep 19, 2014 at 3:01 PM, Hironao Miyatake notifications@github.com wrote:
Okay, so now all the Schema.find has gone from the top 20! Next, let me look into visit-level...
p7.sort_stats("time").print_stats(20) Fri Sep 19 14:59:44 2014 profile7.dat
7395096 function calls (7364340 primitive calls) in 8.202 seconds
Ordered by: internal time List reduced from 6004 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function) 1045751 0.813 0.000 1.257 0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226(_compile) 995860 0.772 0.000 2.998 0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/utils.py:127(
) 757 0.752 0.001 0.917 0.001 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/shutil.py:66(copyfile) 1034110 0.625 0.000 2.332 0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139(search) 1036255 0.583 0.000 0.583 0.000 {method 'search' of '_sre.SRE_Pattern' objects} 1 0.380 0.380 0.380 0.380 {_tableLib.SourceCatalog_readFits} 25 0.368 0.015 0.368 0.015 {imp.load_module} 1077602 0.292 0.000 0.292 0.000 {method 'get' of 'dict' objects} 6992 0.245 0.000 3.263 0.000 {filter} 120 0.216 0.002 0.216 0.002 {_policyLib.Policy_mergeDefaults} 18376 0.120 0.000 0.120 0.000 {posix.stat} 1 0.095 0.095 0.140 0.140 {_persistenceLib.Persistence_unsafeRetrieve} 13 0.086 0.007 0.233 0.018 /tigress/HSC/products-20130212/Linux64/matplotlib/1.3.1/lib/python/site-packages/pyparsing-2.0.1-py2.7.egg/pyparsing.py:3145(oneOf) 2 0.080 0.040 0.080 0.040 {built-in method write_png} 714 0.077 0.000 0.077 0.000 {method 'write' of 'file' objects} 7 0.077 0.011 0.077 0.011 {_policyLib.Policy_createPolicy} 8259/5434 0.066 0.000 0.258 0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/VersionCompare.py:16(stdCompare) 112 0.060 0.001 0.315 0.003 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/cameraGeom/utils.py:197(makeCcd) 1573 0.053 0.000 0.053 0.000 {open} 1 0.052 0.052 0.162 0.162 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/init.py:1( ) — Reply to this email directly or view it on GitHub https://github.com/msimet/Stile/issues/39#issuecomment-56220271.
Yeah, now I'm wondering where all those regular expressions are. Maybe in the Butler? What's the total time now?
For example, it seems that eups calls a lot for the first one (re.py).
>>> p7.print_callers('tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226')
Ordered by: internal time
List reduced from 6004 to 1 due to restriction <'tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226'>
Function was called by...
ncalls tottime cumtime
/tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226(_compile) <- 124 0.000 0.001 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:134(match)
1034110 0.802 1.128 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139(search)
26 0.000 0.001 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:144(sub)
10868 0.008 0.011 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:164(split)
118 0.000 0.001 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:169(findall)
505 0.002 0.116 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:188(compile)
<pstats.Stats instance at 0x1b30fc8>
>>> p7.print_callers('/tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139')
Ordered by: internal time
List reduced from 6004 to 1 due to restriction <'/tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139'>
Function was called by...
ncalls tottime cumtime
/tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139(search) <- 2 0.000 0.000 /tigress/HSC/products-20130212/Linux64/daf_butlerUtils/HSC-3.0.0/python/lsst/daf/butlerUtils/cameraMapper.py:714(_standardizeExposure)
81 0.000 0.000 /tigress/HSC/products-20130212/Linux64/daf_persistence/HSC-3.0.0/python/lsst/daf/persistence/eupsVersions.py:17(__init__)
2 0.000 0.000 /tigress/HSC/products-20130212/Linux64/pipe_base/HSC-3.0.0/python/lsst/pipe/base/argumentParser.py:637(__call__)
1 0.000 0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/Eups.py:105(__init__)
10 0.000 0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/Eups.py:666(_kindlySetPreferredTags)
16 0.000 0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/Eups.py:1421(findPreferredProduct)
580 0.000 0.001 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/Eups.py:1501(<lambda>)
11482 0.007 0.028 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/VersionCompare.py:16(stdCompare)
26034 0.018 0.074 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/VersionCompare.py:115(_splitVersion)
41 0.000 0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/tags.py:129(registerTag)
995860 0.599 2.226 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/utils.py:127(<lambda>)
1 0.000 0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/utils.py:214(determineFlavor)
Total time is 8.2 sec.
Nice overall speedup! And it looks like those Eups regex calls are indeed in the Butler, and in particular in the checks for whether the versions of the pipeline are the same as what's being run. So they should go down quite a bit as a fraction of the total when we run more than one CCD at a time.
Thanks, I am happy to hear that it's promising!
Melanie, I found the following error. I guess we did not add _chip
and _sky
things on the visit level. Could you fix this?
2014-09-19T19:45:38: CameraMapper WARNING: No registry loaded; proceeding without one
Traceback (most recent call last):
File "/tigress/HSC/users/miyatake/Stile/bin/StileVisitNoTract.py", line 5, in <module>
VisitNoTractSingleEpochStileTask.parseAndRun()
File "/tigress/HSC/products-20130212/Linux64/pipe_base/HSC-3.0.0/python/lsst/pipe/base/cmdLineTask.py", line 353, in parseAndRun
resultList = taskRunner.run(parsedCmd)
File "/tigress/HSC/products-20130212/Linux64/pipe_base/HSC-3.0.0/python/lsst/pipe/base/cmdLineTask.py", line 157, in run
resultList = mapFunc(self, self.getTargetList(parsedCmd))
File "/tigress/HSC/users/miyatake/Stile/stile/hsc/base_tasks.py", line 649, in __call__
result = task.run(*args)
File "/tigress/HSC/users/miyatake/Stile/stile/hsc/base_tasks.py", line 786, in run
results = sys_test(self.config, *new_catalogs)
File "/tigress/HSC/users/miyatake/Stile/stile/hsc/sys_test_adapters.py", line 310, in __call__
return self.sys_test(*new_data, per_ccd_stat = per_ccd_stat)
File "/tigress/HSC/users/miyatake/Stile/stile/sys_tests.py", line 1015, in __call__
psf_g1, g1, g1_err = self.getStatisticsPerCCD(array['CCD'], array['psf_g1'],
ValueError: field named psf_g1 not found
Hmm, that was clever of me. I'll go fix this...sorry for the delay!
Just pushed a thing to branch #39...was that enough to make it work?
It's getting closer, but here is another bug... Could you make this change on master as well so that I can compare the computing time between these two branches?
2014-09-19T20:02:01: CameraMapper WARNING: No registry loaded; proceeding without one
Traceback (most recent call last):
File "/tigress/HSC/users/miyatake/Stile/bin/StileVisitNoTract.py", line 5, in <module>
VisitNoTractSingleEpochStileTask.parseAndRun()
File "/tigress/HSC/products-20130212/Linux64/pipe_base/HSC-3.0.0/python/lsst/pipe/base/cmdLineTask.py", line 353, in parseAndRun
resultList = taskRunner.run(parsedCmd)
File "/tigress/HSC/products-20130212/Linux64/pipe_base/HSC-3.0.0/python/lsst/pipe/base/cmdLineTask.py", line 157, in run
resultList = mapFunc(self, self.getTargetList(parsedCmd))
File "/tigress/HSC/users/miyatake/Stile/stile/hsc/base_tasks.py", line 660, in __call__
result = task.run(*args)
File "/tigress/HSC/users/miyatake/Stile/stile/hsc/base_tasks.py", line 781, in run
cols.append('_'.join(c.split('_')[:-1]))
AttributeError: 'tuple' object has no attribute 'append'
OK, this appears to be done (here and on master)...
BTW, I found the branch where I'd tried a couple of things, and the guess for bullet point 1 was to add .copy(deep=True)
to the masked catalog passed to computeExtraColumn
. This was the thing that failed at .getCentroid()
as I think I mentioned to you in email. But this current line of inquiry looks like it's saving a lot of time, so that may not be necessary!
Thank you for the fix! Okay, for the entire field, it's 221 sec without the optimization and 70.5 sec with the optimization! I attach the result of profiling. Half of the time is occupied by data loading. The Schema.find()
things come up again, but even after the fix suggested by Melanie, that will save only a few seconds. Any room for speed up?
>>> pv39.sort_stats("time").print_stats(50)
Fri Sep 19 17:36:35 2014 profile_visit_39.dat
9613902 function calls (9573794 primitive calls) in 70.513 seconds
Ordered by: internal time
List reduced from 6154 to 50 due to restriction <50>
ncalls tottime percall cumtime percall filename:lineno(function)
100 38.233 0.382 38.233 0.382 {_tableLib.SourceCatalog_readFits}
100 17.958 0.180 18.198 0.182 {_persistenceLib.Persistence_unsafeRetrieve}
1046028 0.819 0.000 1.279 0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226(_compile)
995860 0.772 0.000 3.022 0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/utils.py:127(<lambda>)
1 0.769 0.769 62.910 62.910 /tigress/HSC/users/miyatake/Stile/stile/hsc/base_tasks.py:704(run)
784 0.755 0.001 0.961 0.001 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/shutil.py:66(copyfile)
1034307 0.634 0.000 2.359 0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139(search)
1 0.609 0.609 63.520 63.520 /tigress/HSC/users/miyatake/Stile/stile/hsc/base_tasks.py:658(__call__)
1036468 0.589 0.000 0.589 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
25 0.493 0.020 0.493 0.020 {imp.load_module}
100 0.438 0.004 19.474 0.195 /tigress/HSC/users/miyatake/Stile/stile/hsc/base_tasks.py:364(computeShapes)
100 0.348 0.003 1.303 0.013 /tigress/HSC/users/miyatake/Stile/stile/hsc/sys_test_adapters.py:53(MaskPSFStar)
5700 0.305 0.000 0.305 0.000 {_tableLib.Schema__find_Coord}
1084167 0.300 0.000 0.300 0.000 {method 'get' of 'dict' objects}
5500 0.276 0.000 0.276 0.000 {_tableLib.Schema__find_I}
5300 0.272 0.000 0.272 0.000 {_tableLib.Schema__find_Angle}
6992 0.252 0.000 3.291 0.000 {filter}
2800 0.251 0.000 2.216 0.001 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/table/tableLib.py:727(find)
120 0.220 0.002 0.220 0.002 {_policyLib.Policy_mergeDefaults}
215217 0.199 0.000 1.865 0.000 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/table/tableLib.py:8139(__getitem__)
534266 0.192 0.000 0.193 0.000 {isinstance}
208326 0.186 0.000 0.186 0.000 {_tableLib.BaseRecord_get}
3300 0.166 0.000 0.166 0.000 {_tableLib.Schema__find_L}
215017 0.164 0.000 0.164 0.000 {_tableLib._SourceCatalogBase___getitem__}
3200 0.164 0.000 0.164 0.000 {_tableLib.Schema__find_D}
208326 0.158 0.000 0.426 0.000 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/table/tableLib.py:2370(get)
20921 0.158 0.000 0.158 0.000 {posix.stat}
3200 0.155 0.000 0.155 0.000 {_tableLib.Schema__find_MomentsD}
2900 0.146 0.000 0.146 0.000 {_tableLib.Schema__find_String}
213517 0.141 0.000 0.561 0.000 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/table/tableLib.py:8221(__iter__)
3252 0.137 0.000 0.137 0.000 {numpy.core.multiarray.zeros}
2500 0.127 0.000 0.127 0.000 {_tableLib.Schema__find_CovF}
741 0.097 0.000 0.097 0.000 {method 'write' of 'file' objects}
13 0.086 0.007 0.237 0.018 /tigress/HSC/products-20130212/Linux64/matplotlib/1.3.1/lib/python/site-packages/pyparsing-2.0.1-py2.7.egg/pyparsing.py:3145(oneOf)
7 0.078 0.011 0.078 0.011 {_policyLib.Policy_createPolicy}
1864 0.073 0.000 0.073 0.000 {method 'reduce' of 'numpy.ufunc' objects}
1642 0.068 0.000 0.068 0.000 {open}
8259/5434 0.066 0.000 0.261 0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/VersionCompare.py:16(stdCompare)
112 0.061 0.001 0.324 0.003 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/cameraGeom/utils.py:197(makeCcd)
301 0.053 0.000 0.053 0.000 {method 'execute' of 'sqlite3.Connection' objects}
1 0.052 0.052 0.164 0.164 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/__init__.py:1(<module>)
200 0.051 0.000 0.051 0.000 {_tableLib.SourceCatalog_subset}
184143 0.051 0.000 0.081 0.000 /tigress/HSC/products-20130212/Linux64/matplotlib/1.3.1/lib/python/site-packages/pyparsing-2.0.1-py2.7.egg/pyparsing.py:3163(<lambda>)
73456 0.049 0.000 0.089 0.000 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/geom/geomLib.py:3117(__getitem__)
35016 0.049 0.000 0.126 0.000 /tigress/HSC/products-20130212/Linux64/pex_policy/HSC-3.0.0/python/lsst/pex/policy/policyLib.py:2330(_Policy_get)
100 0.047 0.000 20.302 0.203 /tigress/HSC/users/miyatake/Stile/stile/hsc/base_tasks.py:236(generateColumns)
232603 0.046 0.000 0.046 0.000 {method 'startswith' of 'str' objects}
800 0.042 0.000 0.042 0.000 {_tableLib.Schema__find_PointD}
800 0.042 0.000 0.042 0.000 {_tableLib.Schema__find_PointI}
4219 0.042 0.000 0.042 0.000 {numpy.core.multiarray.array}
We have to speed up the HSC interface, since it takes ~1 hour to process the entire FOV! @msimet summarized what we could do as follows, roughly in order of priority.
I'll start looking into 1.