msimet / Stile

Stile: the Systematics Tests In Lensing pipeline
BSD 3-Clause "New" or "Revised" License
9 stars 6 forks source link

Speeding up HSC interface #39

Closed HironaoMiyatake closed 9 years ago

HironaoMiyatake commented 9 years ago

We have to speed up the HSC interface, since it takes ~1 hour to process the entire FOV! @msimet summarized what we could do as follows, roughly in order of priority.

  1. Make sure we can use C++ loops on the catalogs, not Python.(snip) My guess is that this will solve half of our problems at least.
  2. Figuring out the catalog schema ahead of time and getting Key objects, rather than using a string every time we need to access a new column. Jim had said he could show us how to do this, but it might involve a big code refactor. (snip)
  3. I don't think the masking to NaNs is a big time sink, but if it is, we could try just computing every quantity for every entry in the source catalog. I doubt that would be faster, but it's possible.

I'll start looking into 1.

rmandelb commented 9 years ago

I have no argument with the idea of speeding things up, but first I want to check: are any of the computations we're doing superfluous for the purpose of current testing? if so, we should remove some of the tests to focus on the ones we need if that speeds things up (and then once it's optimized we can put back all the things we removed)

TallJimbo commented 9 years ago

It may also be a good idea to run a profiler before assuming too much about what's taking the most time.

msimet commented 9 years ago

@rmandelb, that's an interesting point. We could probably turn off most things and just use one of @HironaoMiyatake's scatter-plot tests, which are pretty quick but still probe most of the behavior we want to learn about (converting coordinates, pulling out shapes, catalog masking). Sound reasonable, Hironao?

HironaoMiyatake commented 9 years ago

I thinks so. I could turn on only one of scatter plots by using a command line option.

rmandelb commented 9 years ago

@TallJimbo , I agree (having once spent ages trying to optimize the wrong piece of some code...!). Melanie and Hironao, are these targets for optimization based on profiling or are they your best common-sense estimates? I'm all for common sense but just in case of surprises, profiling is better.

msimet commented 9 years ago

The first one is a common-sense estimate. The second one we know is a problem, if not how big--when I was writing the Visit-level tests, there was a huge difference in processing times between code where I had to lookup the schema for each CCD, vs code where it got reused by a list comprehension. But how much it matters relative to other overheads for a single CCD or visit is unclear. Third thing, I don't think it's the problem, but we had discussed it at some point so I thought I'd throw it on there.

HironaoMiyatake commented 9 years ago

Here is the first profiling. It seems that lots of time are spent for table handling... _tableLib.Schema__find_** is related to the second point?

>>> p.sort_stats('time').print_stats(20)
Thu Sep 18 20:58:01 2014    profile.dat

         8300714 function calls (8261136 primitive calls) in 35.687 seconds

   Ordered by: internal time
   List reduced from 5911 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1303    4.164    0.003    4.164    0.003 {method 'read' of 'file' objects}
    17455    2.485    0.000    2.485    0.000 {posix.stat}
       24    2.291    0.095    2.291    0.095 {imp.load_module}
    33384    1.739    0.000    1.739    0.000 {_tableLib.Schema__find_Angle}
    33389    1.716    0.000    1.716    0.000 {_tableLib.Schema__find_Coord}
    16733    1.255    0.000   11.288    0.001 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/table/tableLib.py:727(find)
      715    0.912    0.001    5.422    0.008 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/shutil.py:66(copyfile)
    16738    0.899    0.000    0.899    0.000 {_tableLib.Schema__find_MomentsD}
  1035147    0.895    0.000    1.389    0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226(_compile)
    16702    0.876    0.000    0.876    0.000 {_tableLib.Schema__find_D}
    16702    0.874    0.000    0.874    0.000 {_tableLib.Schema__find_I}
    16696    0.863    0.000    0.863    0.000 {_tableLib.Schema__find_String}
   985271    0.862    0.000    3.351    0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/utils.py:127(<lambda>)
    16692    0.861    0.000    0.861    0.000 {_tableLib.Schema__find_CovF}
    16702    0.861    0.000    0.861    0.000 {_tableLib.Schema__find_L}
    16696    0.854    0.000    0.854    0.000 {_tableLib.Schema__find_PointD}
        1    0.714    0.714    0.714    0.714 {_tableLib.SourceCatalog_readFits}
  1023508    0.693    0.000    2.608    0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139(search)
  1025653    0.663    0.000    0.663    0.000 {method 'search' of '_sre.SRE_Pattern' objects}
       45    0.651    0.014    0.651    0.014 {method 'readlines' of 'file' objects}
TallJimbo commented 9 years ago

_tableLib.Schema_find\ is related to the second point

Yes, that's exactly it - Schema.find is called whenever you try to go from a string to a Key (which happens implicitly when you access a field using a string). By getting the Keys in advance, it will only be called once, rather than for every record.

HironaoMiyatake commented 9 years ago

Thanks! I'll put a line to get Keys outside of a for loop.

msimet commented 9 years ago

I don't think that will be quite enough to do it--we get shapes in a different function than the rest of the quantities, for example. So I think the Keys should be found in the main function block, then saved as attributes so we can access them from any of the child functions.

Potentially, it would be nice to do these as class attributes rather than instance attributes (see Stack Overflow). That way if we call the CCD version on multiple CCDs, for example, we could reuse the same Key lookup every time. I don't think it's possible to call this wrapper code with multiple reruns, for example, so I don't think this will cause problems. But I don't know how easy that would be to do (and don't actually know if the Key lookup would persist to different catalogs from the same rerun). @TallJimbo, any thoughts?

TallJimbo commented 9 years ago

I'd originally thought the only way to do this well would be to refactor so the Keys could be instance attributes, but I think the class attribute idea would also work, and it seems like it'd be a lot easier. You would have to be careful to make sure you stick to one rerun during an entire Python session, however, because the Keys are subject to change between reruns (though they rarely will in practice, as the configuration settings that affect them are rarely modified).

In terms of when/how to store those Keys, one option would be to simply save them as class attributes the first time you use them (i.e. start by setting them to None). You can also use the "_schema" butler datasets (e..g "src_schema", "deepCoadd_src_schema") which you can get from the Butler without a data ID. These are empty SourceCatalog objects with the same Schemas as the per-data ID catalogs, so you can use them to get Keys before you start processing and data IDs.

HironaoMiyatake commented 9 years ago

I don't think that will be quite enough to do it--we get shapes in a different function than the rest of the quantities, for example. So I think the Keys should be found in the main function block, then saved as attributes so we can access them from any of the child functions.

@TallJimbo told me offline that

moments = [src.get("shape.sdss") for src in data]

becomes much faster if we do

key = data.schema.find("shape.sdss").key
moments = [src.get(key) for src in data]

, because Schema.find is now out of the for loop. Is this correct?

TallJimbo commented 9 years ago

Yes, that's correct. Basically, Hironao's approach will speed things up a lot (go from one Schema.find call per source to one call per CCD/patch), and Melanie's will speed it up a bit more (one Schema.find call per rerun).

HironaoMiyatake commented 9 years ago

For a first trial, I replaced shape.sdss, shape.sdss.err, shape.sdss.psf with Keys (changes pushed). However I found that the number of calls of relevant Schema.find is not really reduced, and I found that the numbers are much larger than the number of stars in this example (41). This result suggests that Schema.find is called more than what is need for processing stars. Any thoughts, @TallJimbo or @msimet ?

Before the change

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
16854    0.861    0.000    0.861    0.000 {_tableLib.Schema__find_CovF}

After the change

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
16734    0.848    0.000    0.848    0.000 {_tableLib.Schema__find_CovF}
HironaoMiyatake commented 9 years ago

(Comments are updated on the GitHub website)

TallJimbo commented 9 years ago

When you call Schema.find(), it actually invokes all of the Schema__find_<type> methods (for all the different possible types), at least until it finds with the right type. That's why there are so many more calls than there are stars. It also seems likely that some other field access other than those for "shape.sdss" are playing a big role in the performance, and because the same underlying methods are called for each, it is hard to tell from the profile which ones are dominating. Maybe that means that Melanie's suggestion would produce a big additional speedup relative to this change, not just a small one? Are there any other areas in the code where we look up fields by name for each record, rather than get a full column?

HironaoMiyatake commented 9 years ago

When you call Schema.find(), it actually invokes all of the Schema_find methods (for all the different possible types), at least until it finds with the right type.

If this is the case, the number of calls is reduced more than what we have? The difference of the number of calls is 16854 - 16734 = 120, which is ~3 times the number of stars(41). I thought it the difference would be 41 x (the number of iteration of Schema.find() to find the right type) and the latter would be larger than 3.

TallJimbo commented 9 years ago

It looks like "CovF" is actually the third type it tries (the order is set by lsst.afw.table.tableLib._suffixes), so I think that mostly makes sense.

msimet commented 9 years ago

Are there any other areas in the code where we look up fields by name for each record, rather than get a full column?

Yes. The big ones are probably the flags (6 shape flags and 20 other flags...)

msimet commented 9 years ago

Actually, we could probably guard against separate reruns by having this be a class variable that's a nested dict, accessed like self.keys_dict[rerun][str] (which I think would also update the class variable without us having to do something explicit, since it's not a reassignation of the variable name?) I assume we can get the rerun name from the butler, there. And then we don't have to get all the keys at once, we can just do if not str in self.keys_dict[rerun] to get the key when we need it, and it will be stored for further use.

TallJimbo commented 9 years ago

It might actually be tricky to get the rerun name from the Butler, actually. I'm not sure if that's possible. And you don't want to make getting the Keys too complicated, because after all we want grabbing them and using them to be quick (though I don't think an extra Python dict lookup is likely to be a big deal).

I'd just not worry about the rerun, actually. As long as you're using the HSC pipeline's command-line argument parsing to actually invoke these, it will restrict you to one rerun anyway.

msimet commented 9 years ago

Ah, okay--I thought that was the case, but your comment made me unsure!

We can just add a note to the documentation saying not to call with multiple reruns or results may be nonsensical. I think that's fine.

HironaoMiyatake commented 9 years ago

Yes. The big ones are probably the flags (6 shape flags and 20 other flags...)

It's true for shape flags, but for other flags I guess all of them are used only in removeFlaggedObjects() where calls these flags only once (it does not use for loops), so it does not make the speed slower.

msimet commented 9 years ago

Oh, that's a good point, sorry--misreading my own code :)

msimet commented 9 years ago

Is there any background key-finding happening in methods like .getCentroid()? We do those in loops as well.

HironaoMiyatake commented 9 years ago

Hmm, @TallJimbo, our guru, could answer this!

TallJimbo commented 9 years ago

No, those methods (we call them slot getters) cache their own keys, so they should already be fast.

HironaoMiyatake commented 9 years ago

Even if they are in a for loop?

TallJimbo commented 9 years ago

Correct. Those should actually be faster than using a Key from Python, because the Key is stored and accessed in C++.

HironaoMiyatake commented 9 years ago

Okay, thanks!

msimet commented 9 years ago

Perhaps it's worth trying to get the keys for the shape flags outside the for loops and seeing if that makes a difference. (Here, let me make more work for you, Hironao! :) )

Also: how many times do the methods ._computeShapeMask, '.computeShapes, and.computeExtraColumn` get called? Just once, or multiple times? (It should be once, I would think, but perhaps worth checking.)

HironaoMiyatake commented 9 years ago

Here it is! For example, _tableLib.Schema__find_Angle is significantly removed because it's at the first one in afwTable.tableLib._suffixes.

Before

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    33468    1.750    0.000    1.750    0.000 {_tableLib.Schema__find_Angle}
    16734    1.228    0.000   11.139    0.001 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/table/tableLib.py:727(find)
    16744    0.862    0.000    0.862    0.000 {_tableLib.Schema__find_I}
    16738    0.861    0.000    0.861    0.000 {_tableLib.Schema__find_F}
    16741    0.861    0.000    0.861    0.000 {_tableLib.Schema__find_D}
    16739    0.850    0.000    0.850    0.000 {_tableLib.Schema__find_String}
    16744    0.850    0.000    0.850    0.000 {_tableLib.Schema__find_L}
    16734    0.848    0.000    0.848    0.000 {_tableLib.Schema__find_CovF}
    16742    0.845    0.000    0.845    0.000 {_tableLib.Schema__find_Coord}
    16738    0.843    0.000    0.843    0.000 {_tableLib.Schema__find_PointD}
    16737    0.840    0.000    0.840    0.000 {_tableLib.Schema__find_PointI}
  1045751    0.798    0.000    1.248    0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226(_compile)
   995860    0.775    0.000    3.030    0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/utils.py:127(<lambda>)
  1034110    0.655    0.000    2.365    0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139(search)
      724    0.655    0.001    0.872    0.001 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/shutil.py:66(copyfile)
  1036255    0.597    0.000    0.597    0.000 {method 'search' of '_sre.SRE_Pattern' objects}
        1    0.419    0.419    0.419    0.419 {_tableLib.SourceCatalog_readFits}
       25    0.333    0.013    0.333    0.013 {imp.load_module}
  1077602    0.294    0.000    0.294    0.000 {method 'get' of 'dict' objects}
     6992    0.262    0.000    3.310    0.000 {filter}

After

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    16744    0.900    0.000    0.900    0.000 {_tableLib.Schema__find_Angle}
    16747    0.873    0.000    0.873    0.000 {_tableLib.Schema__find_D}
    16747    0.870    0.000    0.870    0.000 {_tableLib.Schema__find_I}
    16750    0.858    0.000    0.858    0.000 {_tableLib.Schema__find_L}
    16740    0.856    0.000    0.856    0.000 {_tableLib.Schema__find_CovF}
    16745    0.856    0.000    0.856    0.000 {_tableLib.Schema__find_String}
  1045751    0.793    0.000    1.242    0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226(_compile)
   995860    0.771    0.000    2.982    0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/utils.py:127(<lambda>)
    16740    0.684    0.000    6.210    0.000 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/table/tableLib.py:727(find)
      735    0.658    0.001    0.850    0.001 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/shutil.py:66(copyfile)
  1034110    0.622    0.000    2.316    0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139(search)
  1036255    0.585    0.000    0.585    0.000 {method 'search' of '_sre.SRE_Pattern' objects}
        1    0.428    0.428    0.428    0.428 {_tableLib.SourceCatalog_readFits}
       25    0.331    0.013    0.331    0.013 {imp.load_module}
  1077602    0.297    0.000    0.297    0.000 {method 'get' of 'dict' objects}
     6992    0.257    0.000    3.259    0.000 {filter}
      120    0.214    0.002    0.214    0.002 {_policyLib.Policy_mergeDefaults}
        1    0.212    0.212    0.255    0.255 {_persistenceLib.Persistence_unsafeRetrieve}
33539/16831    0.181    0.000    6.522    0.000 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/table/tableLib.py:2370(get)
    18178    0.125    0.000    0.125    0.000 {posix.stat}
msimet commented 9 years ago

Hmm, neat! How much was the overall time reduced?

HironaoMiyatake commented 9 years ago

from 19.790 seconds to 14.882 seconds.

msimet commented 9 years ago

OK, so...that would probably save us on the order of 15 minutes off an hour's processing time? So, good, but we still need to do some optimization. Yes?

I'm happy to implement the key caching, if you like (though you're welcome to do it too of course). Probably wouldn't make a difference for a single CCD, could be important for a full FOV.

HironaoMiyatake commented 9 years ago

Let me explore replacing strings a bit more. Also I am not sure your estimation is valid for visit-level. There might be a header taking time that is called only once both for ccd-level and visit-level.

TallJimbo commented 9 years ago

So, another thing you could do that might help, if (as it seems) the various Schema._find__<type> methods are still the bottleneck: you could call those directly, instead of calling Schema.find. Normally, I wouldn't recommend that, because it makes you more reliant on the exact types we used in the tables, but it would cut down on the number of unnecessary calls. I'd start with the Key caching of the sort Melanie suggested, though, and implement this on top if necessary. Considering that the number of flags you're looking up is only a factor of two away from the number of stars you're looking at per CCD, only doing those lookups once per rerun is probably more important than I originally thought.

Hironao, note that if you replace string lookup with a find call and key lookup when the string lookup wasn't in a for loop, it won't make any difference at all (unless you use one of the private type-specific find methods, as I brought up in the last paragraph).

HironaoMiyatake commented 9 years ago

Yes, I'd change those in a for loop!

HironaoMiyatake commented 9 years ago

Okay, so now all the Schema.find has gone from the top 20! Next, let me look into visit-level...

>>> p7.sort_stats("time").print_stats(20)
Fri Sep 19 14:59:44 2014    profile7.dat

         7395096 function calls (7364340 primitive calls) in 8.202 seconds

   Ordered by: internal time
   List reduced from 6004 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1045751    0.813    0.000    1.257    0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226(_compile)
   995860    0.772    0.000    2.998    0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/utils.py:127(<lambda>)
      757    0.752    0.001    0.917    0.001 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/shutil.py:66(copyfile)
  1034110    0.625    0.000    2.332    0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139(search)
  1036255    0.583    0.000    0.583    0.000 {method 'search' of '_sre.SRE_Pattern' objects}
        1    0.380    0.380    0.380    0.380 {_tableLib.SourceCatalog_readFits}
       25    0.368    0.015    0.368    0.015 {imp.load_module}
  1077602    0.292    0.000    0.292    0.000 {method 'get' of 'dict' objects}
     6992    0.245    0.000    3.263    0.000 {filter}
      120    0.216    0.002    0.216    0.002 {_policyLib.Policy_mergeDefaults}
    18376    0.120    0.000    0.120    0.000 {posix.stat}
        1    0.095    0.095    0.140    0.140 {_persistenceLib.Persistence_unsafeRetrieve}
       13    0.086    0.007    0.233    0.018 /tigress/HSC/products-20130212/Linux64/matplotlib/1.3.1/lib/python/site-packages/pyparsing-2.0.1-py2.7.egg/pyparsing.py:3145(oneOf)
        2    0.080    0.040    0.080    0.040 {built-in method write_png}
      714    0.077    0.000    0.077    0.000 {method 'write' of 'file' objects}
        7    0.077    0.011    0.077    0.011 {_policyLib.Policy_createPolicy}
8259/5434    0.066    0.000    0.258    0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/VersionCompare.py:16(stdCompare)
      112    0.060    0.001    0.315    0.003 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/cameraGeom/utils.py:197(makeCcd)
     1573    0.053    0.000    0.053    0.000 {open}
        1    0.052    0.052    0.162    0.162 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/__init__.py:1(<module>)
msimet commented 9 years ago

Oooh, nice! (And hello there, matplotlib functions...how nice to see you for the single scatter plot we're making, heh)

On Fri, Sep 19, 2014 at 3:01 PM, Hironao Miyatake notifications@github.com wrote:

Okay, so now all the Schema.find has gone from the top 20! Next, let me look into visit-level...

p7.sort_stats("time").print_stats(20) Fri Sep 19 14:59:44 2014 profile7.dat

     7395096 function calls (7364340 primitive calls) in 8.202 seconds

Ordered by: internal time List reduced from 6004 to 20 due to restriction <20>

ncalls tottime percall cumtime percall filename:lineno(function) 1045751 0.813 0.000 1.257 0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226(_compile) 995860 0.772 0.000 2.998 0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/utils.py:127() 757 0.752 0.001 0.917 0.001 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/shutil.py:66(copyfile) 1034110 0.625 0.000 2.332 0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139(search) 1036255 0.583 0.000 0.583 0.000 {method 'search' of '_sre.SRE_Pattern' objects} 1 0.380 0.380 0.380 0.380 {_tableLib.SourceCatalog_readFits} 25 0.368 0.015 0.368 0.015 {imp.load_module} 1077602 0.292 0.000 0.292 0.000 {method 'get' of 'dict' objects} 6992 0.245 0.000 3.263 0.000 {filter} 120 0.216 0.002 0.216 0.002 {_policyLib.Policy_mergeDefaults} 18376 0.120 0.000 0.120 0.000 {posix.stat} 1 0.095 0.095 0.140 0.140 {_persistenceLib.Persistence_unsafeRetrieve} 13 0.086 0.007 0.233 0.018 /tigress/HSC/products-20130212/Linux64/matplotlib/1.3.1/lib/python/site-packages/pyparsing-2.0.1-py2.7.egg/pyparsing.py:3145(oneOf) 2 0.080 0.040 0.080 0.040 {built-in method write_png} 714 0.077 0.000 0.077 0.000 {method 'write' of 'file' objects} 7 0.077 0.011 0.077 0.011 {_policyLib.Policy_createPolicy} 8259/5434 0.066 0.000 0.258 0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/VersionCompare.py:16(stdCompare) 112 0.060 0.001 0.315 0.003 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/cameraGeom/utils.py:197(makeCcd) 1573 0.053 0.000 0.053 0.000 {open} 1 0.052 0.052 0.162 0.162 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/init.py:1()

— Reply to this email directly or view it on GitHub https://github.com/msimet/Stile/issues/39#issuecomment-56220271.

TallJimbo commented 9 years ago

Yeah, now I'm wondering where all those regular expressions are. Maybe in the Butler? What's the total time now?

HironaoMiyatake commented 9 years ago

For example, it seems that eups calls a lot for the first one (re.py).

>>> p7.print_callers('tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226')
   Ordered by: internal time
   List reduced from 6004 to 1 due to restriction <'tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226'>

Function                                                                               was called by...
                                                                                           ncalls  tottime  cumtime
/tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226(_compile)  <-     124    0.000    0.001  /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:134(match)
                                                                                          1034110    0.802    1.128  /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139(search)
                                                                                               26    0.000    0.001  /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:144(sub)
                                                                                            10868    0.008    0.011  /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:164(split)
                                                                                              118    0.000    0.001  /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:169(findall)
                                                                                              505    0.002    0.116  /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:188(compile)

<pstats.Stats instance at 0x1b30fc8>
>>> p7.print_callers('/tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139')
   Ordered by: internal time
   List reduced from 6004 to 1 due to restriction <'/tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139'>

Function                                                                             was called by...
                                                                                         ncalls  tottime  cumtime
/tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139(search)  <-       2    0.000    0.000  /tigress/HSC/products-20130212/Linux64/daf_butlerUtils/HSC-3.0.0/python/lsst/daf/butlerUtils/cameraMapper.py:714(_standardizeExposure)
                                                                                             81    0.000    0.000  /tigress/HSC/products-20130212/Linux64/daf_persistence/HSC-3.0.0/python/lsst/daf/persistence/eupsVersions.py:17(__init__)
                                                                                              2    0.000    0.000  /tigress/HSC/products-20130212/Linux64/pipe_base/HSC-3.0.0/python/lsst/pipe/base/argumentParser.py:637(__call__)
                                                                                              1    0.000    0.000  /tigress/HSC/products-20130212/eups/1.4.0/python/eups/Eups.py:105(__init__)
                                                                                             10    0.000    0.000  /tigress/HSC/products-20130212/eups/1.4.0/python/eups/Eups.py:666(_kindlySetPreferredTags)
                                                                                             16    0.000    0.000  /tigress/HSC/products-20130212/eups/1.4.0/python/eups/Eups.py:1421(findPreferredProduct)
                                                                                            580    0.000    0.001  /tigress/HSC/products-20130212/eups/1.4.0/python/eups/Eups.py:1501(<lambda>)
                                                                                          11482    0.007    0.028  /tigress/HSC/products-20130212/eups/1.4.0/python/eups/VersionCompare.py:16(stdCompare)
                                                                                          26034    0.018    0.074  /tigress/HSC/products-20130212/eups/1.4.0/python/eups/VersionCompare.py:115(_splitVersion)
                                                                                             41    0.000    0.000  /tigress/HSC/products-20130212/eups/1.4.0/python/eups/tags.py:129(registerTag)
                                                                                         995860    0.599    2.226  /tigress/HSC/products-20130212/eups/1.4.0/python/eups/utils.py:127(<lambda>)
                                                                                              1    0.000    0.000  /tigress/HSC/products-20130212/eups/1.4.0/python/eups/utils.py:214(determineFlavor)

Total time is 8.2 sec.

TallJimbo commented 9 years ago

Nice overall speedup! And it looks like those Eups regex calls are indeed in the Butler, and in particular in the checks for whether the versions of the pipeline are the same as what's being run. So they should go down quite a bit as a fraction of the total when we run more than one CCD at a time.

HironaoMiyatake commented 9 years ago

Thanks, I am happy to hear that it's promising!

Melanie, I found the following error. I guess we did not add _chip and _sky things on the visit level. Could you fix this?

2014-09-19T19:45:38: CameraMapper WARNING: No registry loaded; proceeding without one
Traceback (most recent call last):
  File "/tigress/HSC/users/miyatake/Stile/bin/StileVisitNoTract.py", line 5, in <module>
    VisitNoTractSingleEpochStileTask.parseAndRun()
  File "/tigress/HSC/products-20130212/Linux64/pipe_base/HSC-3.0.0/python/lsst/pipe/base/cmdLineTask.py", line 353, in parseAndRun
    resultList = taskRunner.run(parsedCmd)
  File "/tigress/HSC/products-20130212/Linux64/pipe_base/HSC-3.0.0/python/lsst/pipe/base/cmdLineTask.py", line 157, in run
    resultList = mapFunc(self, self.getTargetList(parsedCmd))
  File "/tigress/HSC/users/miyatake/Stile/stile/hsc/base_tasks.py", line 649, in __call__
    result = task.run(*args)
  File "/tigress/HSC/users/miyatake/Stile/stile/hsc/base_tasks.py", line 786, in run
    results = sys_test(self.config, *new_catalogs)
  File "/tigress/HSC/users/miyatake/Stile/stile/hsc/sys_test_adapters.py", line 310, in __call__
    return self.sys_test(*new_data, per_ccd_stat = per_ccd_stat)
  File "/tigress/HSC/users/miyatake/Stile/stile/sys_tests.py", line 1015, in __call__
    psf_g1, g1, g1_err = self.getStatisticsPerCCD(array['CCD'], array['psf_g1'],
ValueError: field named psf_g1 not found
msimet commented 9 years ago

Hmm, that was clever of me. I'll go fix this...sorry for the delay!

msimet commented 9 years ago

Just pushed a thing to branch #39...was that enough to make it work?

HironaoMiyatake commented 9 years ago

It's getting closer, but here is another bug... Could you make this change on master as well so that I can compare the computing time between these two branches?

2014-09-19T20:02:01: CameraMapper WARNING: No registry loaded; proceeding without one
Traceback (most recent call last):
  File "/tigress/HSC/users/miyatake/Stile/bin/StileVisitNoTract.py", line 5, in <module>
    VisitNoTractSingleEpochStileTask.parseAndRun()
  File "/tigress/HSC/products-20130212/Linux64/pipe_base/HSC-3.0.0/python/lsst/pipe/base/cmdLineTask.py", line 353, in parseAndRun
    resultList = taskRunner.run(parsedCmd)
  File "/tigress/HSC/products-20130212/Linux64/pipe_base/HSC-3.0.0/python/lsst/pipe/base/cmdLineTask.py", line 157, in run
    resultList = mapFunc(self, self.getTargetList(parsedCmd))
  File "/tigress/HSC/users/miyatake/Stile/stile/hsc/base_tasks.py", line 660, in __call__
    result = task.run(*args)
  File "/tigress/HSC/users/miyatake/Stile/stile/hsc/base_tasks.py", line 781, in run
    cols.append('_'.join(c.split('_')[:-1]))
AttributeError: 'tuple' object has no attribute 'append'
msimet commented 9 years ago

OK, this appears to be done (here and on master)...

msimet commented 9 years ago

BTW, I found the branch where I'd tried a couple of things, and the guess for bullet point 1 was to add .copy(deep=True) to the masked catalog passed to computeExtraColumn. This was the thing that failed at .getCentroid() as I think I mentioned to you in email. But this current line of inquiry looks like it's saving a lot of time, so that may not be necessary!

HironaoMiyatake commented 9 years ago

Thank you for the fix! Okay, for the entire field, it's 221 sec without the optimization and 70.5 sec with the optimization! I attach the result of profiling. Half of the time is occupied by data loading. The Schema.find() things come up again, but even after the fix suggested by Melanie, that will save only a few seconds. Any room for speed up?

>>> pv39.sort_stats("time").print_stats(50)
Fri Sep 19 17:36:35 2014    profile_visit_39.dat

         9613902 function calls (9573794 primitive calls) in 70.513 seconds

   Ordered by: internal time
   List reduced from 6154 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100   38.233    0.382   38.233    0.382 {_tableLib.SourceCatalog_readFits}
      100   17.958    0.180   18.198    0.182 {_persistenceLib.Persistence_unsafeRetrieve}
  1046028    0.819    0.000    1.279    0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:226(_compile)
   995860    0.772    0.000    3.022    0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/utils.py:127(<lambda>)
        1    0.769    0.769   62.910   62.910 /tigress/HSC/users/miyatake/Stile/stile/hsc/base_tasks.py:704(run)
      784    0.755    0.001    0.961    0.001 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/shutil.py:66(copyfile)
  1034307    0.634    0.000    2.359    0.000 /tigress/HSC/products-20130212/Linux64/python/2.7.6/lib/python2.7/re.py:139(search)
        1    0.609    0.609   63.520   63.520 /tigress/HSC/users/miyatake/Stile/stile/hsc/base_tasks.py:658(__call__)
  1036468    0.589    0.000    0.589    0.000 {method 'search' of '_sre.SRE_Pattern' objects}
       25    0.493    0.020    0.493    0.020 {imp.load_module}
      100    0.438    0.004   19.474    0.195 /tigress/HSC/users/miyatake/Stile/stile/hsc/base_tasks.py:364(computeShapes)
      100    0.348    0.003    1.303    0.013 /tigress/HSC/users/miyatake/Stile/stile/hsc/sys_test_adapters.py:53(MaskPSFStar)
     5700    0.305    0.000    0.305    0.000 {_tableLib.Schema__find_Coord}
  1084167    0.300    0.000    0.300    0.000 {method 'get' of 'dict' objects}
     5500    0.276    0.000    0.276    0.000 {_tableLib.Schema__find_I}
     5300    0.272    0.000    0.272    0.000 {_tableLib.Schema__find_Angle}
     6992    0.252    0.000    3.291    0.000 {filter}
     2800    0.251    0.000    2.216    0.001 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/table/tableLib.py:727(find)
      120    0.220    0.002    0.220    0.002 {_policyLib.Policy_mergeDefaults}
   215217    0.199    0.000    1.865    0.000 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/table/tableLib.py:8139(__getitem__)
   534266    0.192    0.000    0.193    0.000 {isinstance}
   208326    0.186    0.000    0.186    0.000 {_tableLib.BaseRecord_get}
     3300    0.166    0.000    0.166    0.000 {_tableLib.Schema__find_L}
   215017    0.164    0.000    0.164    0.000 {_tableLib._SourceCatalogBase___getitem__}
     3200    0.164    0.000    0.164    0.000 {_tableLib.Schema__find_D}
   208326    0.158    0.000    0.426    0.000 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/table/tableLib.py:2370(get)
    20921    0.158    0.000    0.158    0.000 {posix.stat}
     3200    0.155    0.000    0.155    0.000 {_tableLib.Schema__find_MomentsD}
     2900    0.146    0.000    0.146    0.000 {_tableLib.Schema__find_String}
   213517    0.141    0.000    0.561    0.000 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/table/tableLib.py:8221(__iter__)
     3252    0.137    0.000    0.137    0.000 {numpy.core.multiarray.zeros}
     2500    0.127    0.000    0.127    0.000 {_tableLib.Schema__find_CovF}
      741    0.097    0.000    0.097    0.000 {method 'write' of 'file' objects}
       13    0.086    0.007    0.237    0.018 /tigress/HSC/products-20130212/Linux64/matplotlib/1.3.1/lib/python/site-packages/pyparsing-2.0.1-py2.7.egg/pyparsing.py:3145(oneOf)
        7    0.078    0.011    0.078    0.011 {_policyLib.Policy_createPolicy}
     1864    0.073    0.000    0.073    0.000 {method 'reduce' of 'numpy.ufunc' objects}
     1642    0.068    0.000    0.068    0.000 {open}
8259/5434    0.066    0.000    0.261    0.000 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/VersionCompare.py:16(stdCompare)
      112    0.061    0.001    0.324    0.003 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/cameraGeom/utils.py:197(makeCcd)
      301    0.053    0.000    0.053    0.000 {method 'execute' of 'sqlite3.Connection' objects}
        1    0.052    0.052    0.164    0.164 /tigress/HSC/products-20130212/eups/1.4.0/python/eups/__init__.py:1(<module>)
      200    0.051    0.000    0.051    0.000 {_tableLib.SourceCatalog_subset}
   184143    0.051    0.000    0.081    0.000 /tigress/HSC/products-20130212/Linux64/matplotlib/1.3.1/lib/python/site-packages/pyparsing-2.0.1-py2.7.egg/pyparsing.py:3163(<lambda>)
    73456    0.049    0.000    0.089    0.000 /tigress/HSC/products-20130212/Linux64/afw/HSC-3.0.0/python/lsst/afw/geom/geomLib.py:3117(__getitem__)
    35016    0.049    0.000    0.126    0.000 /tigress/HSC/products-20130212/Linux64/pex_policy/HSC-3.0.0/python/lsst/pex/policy/policyLib.py:2330(_Policy_get)
      100    0.047    0.000   20.302    0.203 /tigress/HSC/users/miyatake/Stile/stile/hsc/base_tasks.py:236(generateColumns)
   232603    0.046    0.000    0.046    0.000 {method 'startswith' of 'str' objects}
      800    0.042    0.000    0.042    0.000 {_tableLib.Schema__find_PointD}
      800    0.042    0.000    0.042    0.000 {_tableLib.Schema__find_PointI}
     4219    0.042    0.000    0.042    0.000 {numpy.core.multiarray.array}