Please consider making reflection (Type and ByteCodeType) "smarter" about caching

GoogleCodeExporter commented 9 years ago

I love org.as3commons.reflect.*.  But, I do not love the way it manages type 
information.

It seems that type information is parsed and cached "en masse" -- that is, all 
type information for the entire .swf is loaded and then sits like a lump in 
memory.  Apparently, the only methods for managing the cached information are 
ByteCodeType.fromXxx() and ByteCodeTypeCache.clear().  (At least, it seems this 
is how ByteCodeTypeCache works.)

I have a very large project which uses ByteCodeType in many different places and 
at different times in the application life-cycle.  I call 
ByteCodeType.fromLoader() at startup to fill the cache with type information.  
Afterwards, ByteCodeTypeCache.clear() is not fine-grained enough to manage the 
cache -- performance would suffer if the type information was loaded and 
cleared en masse again and again as my application ran.  On the other hand, my 
application doesn't make use of *all* the type information for the entire .swf, 
so it is a great waste of memory for all the ByteCodeType-related objects (and 
strings!) for every possible type in my application to sit in memory.  It is 
also a great waste of time...  it takes several seconds for my application to 
start (while ByteCodeType.fromLoader() runs, loading the universe).

So, this issue is to ask you to please consider the following:

1) Parsing type information more "on-demand", perhaps loading each type only 
when it is asked for (for example, when Type.forInstance() or 
ByteCodeType.forClass() is called).  Or, take it to the next level, and even 
parse individual member info on-demand (for example, when an instance of Method 
or Variable is needed).

2) Caching all type information using SoftReferences, so unused type 
information will naturally "fall out" of memory when no longer needed (i.e., 
let the application's actual set of references control what stays cached).

Ideally, an application should be able to freely use instances of Type and 
ByteCodeType and only pay the costs to parse, load, and store the type 
information that is actually used.

Perhaps calls such as ByteCodeType.fromLoader() should simply do some 
lightweight "cursory" parse (maybe just build a list of the type names and 
offsets to where the type information is), and leave the more detailed parsing 
until later (on-demand).

It would also be extremely desirable for any time-consuming functionality to 
operate asynchronously, providing an event or callback notification when the 
result is available.  (By "asynchronous", I mean dividing up the work and 
processing it in "slices", never taking longer than some caller-specified 
duration per slice.)
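
To make the "slices" idea concrete, here is a minimal sketch in Python (used only because it is compact; this is not as3commons code). It assumes the parser can be written so that it pauses after each item, and every name in it (parse_in_slices, run_sliced, max_slice_seconds) is hypothetical.

```python
import time
from typing import Callable, Iterator, List


def parse_in_slices(items: List[str],
                    parse_one: Callable[[str], str]) -> Iterator[List[str]]:
    """Hypothetical parser written as a generator: each yield is a point
    where the work can be paused. Yields the results accumulated so far."""
    results: List[str] = []
    for item in items:
        results.append(parse_one(item))
        yield results


def run_sliced(work: Iterator[List[str]], max_slice_seconds: float,
               on_complete: Callable[[List[str]], None]) -> int:
    """Drive `work` in slices of roughly max_slice_seconds each, then call
    on_complete with the final result. Returns the number of slices used.
    In a real UI each pass of the outer loop would run on its own frame or
    timer tick instead of back to back."""
    slices = 0
    last: List[str] = []
    finished = False
    while not finished:
        slices += 1
        deadline = time.monotonic() + max_slice_seconds
        while True:
            try:
                last = next(work)
            except StopIteration:
                finished = True
                break
            if time.monotonic() >= deadline:
                break  # budget used up; resume in the next slice
    on_complete(last)
    return slices
```

The caller picks the budget per slice and gets a callback at the end. Note that a single item that takes longer than the budget will still overrun it, so the granularity of the pause points matters.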

Original issue reported on code.google.com by idontneedthisacct@gmail.com on 27 May 2011 at 10:03

GoogleCodeExporter commented 9 years ago

Hi there,

thank you for taking the time to comment! And my sincere apologies for responding 
so late!
I'm afraid the fine-grained parsing of the type info is just not possible due 
to the way a SWF is serialized.
It will always be necessary to parse the entire constant pool at the start of 
the SWF, otherwise there will be no context for the indexes that are read in 
the rest of the file, so that is already a step we can't skip.
After that, it isn't so straightforward to just parse out one class. The way 
that this info is laid out internally isn't the way that you would expect it 
initially. A class isn't serialized into one continuous block of bytecode, I'm 
afraid. First ALL the methods of ALL classes are serialized, then all of the 
metadata, after that the instance infos/traits, class infos/traits, script 
infos/traits and finally all of the method bodies and exception infos.
None of these blocks has any boundary information stored. I.e. at the 
beginning of a block of class info it doesn't tell me how LONG this block is. 
Therefore I can't conveniently skip over parts. Even if I know the number of 
U30 entries in the next block of the byte array, I will still need to parse 
them all, since a U30 is of variable length in the byte array.
So, in order to parse out all of the offsets it would require me to parse the 
entire file, hence negating the benefit of storing these offsets.
The largest part of the type info, memory-wise, is of course the 
constant pool, but I'm afraid I need to parse this one completely as well for 
the same reasons.
The SWF format was designed to store information as efficiently as possible, 
which makes it a lot harder to progressively parse...
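
To illustrate the U30 point, here is a minimal sketch in Python (not as3commons code; read_u30 and skip_u30_entries are made-up names). It assumes the usual encoding: each byte carries 7 payload bits, low bits first, and a set high bit means another byte follows, up to 5 bytes.

```python
from typing import Tuple


def read_u30(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one variable-length integer starting at pos.
    Returns (value, position of the next unread byte)."""
    value = 0
    for i in range(5):
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:  # high bit clear: this was the last byte
            break
    return value & 0x3FFFFFFF, pos


def skip_u30_entries(data: bytes, pos: int, count: int) -> int:
    """Skipping `count` entries still means decoding each one, because the
    byte length of an entry is only known after reading it."""
    for _ in range(count):
        _, pos = read_u30(data, pos)
    return pos
```

Knowing that a block holds three entries says nothing about whether it occupies 3 bytes or 15, so there is no way to jump over it without reading it.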

Storing the entries as SoftReferences will also yield strange results: what if 
you request a piece of typeinfo, then request it again sometime later in the 
application lifecycle and it's gone all of a sudden?

One of the developers on the as3commons team is experimenting with adding a bit 
of Alchemy magic to the parsing process, this way the parsing step will become 
as efficient as possible.
Also, do note that the parsing steps take quite a bit longer in the debug 
player and while parsing a SWF that is compiled in debug mode. Try testing it 
in a regular player with a release build of your code and you will probably 
notice a significant increase in parsing speeds.

Your last comment does make sense actually, it will not be easy to implement 
but if I find the time somewhere in the near future I will try to look into 
implementing some kind of green threading in order to be able to prevent the 
application from freezing while parsing the bytecode.

cheers!

Roland

Original comment by ihatelivelyids on 29 Jun 2011 at 6:49

GoogleCodeExporter commented 9 years ago

Roland, thanks for the detailed reply.

I understand the kinds of issues you describe.  That is a shame, however, because 
it can make using org.as3commons.reflect.bytecode.* untenable in larger 
projects (because it takes so much time/memory to parse/store the typeinfo).  
Parsing asynchronously would at least help with keeping the app responsive, but 
use of the typeinfo would still be delayed (which could still be an issue).

One of my biggest problems, for example, is that I use 
ByteCodeType.fromLoader() and ByteCodeType.getClassesWithMetadata() at startup 
to obtain a list of special classes.  However, this makes my large project take 
FOREVER to start.  Okay, it doesn't take forever :), but running under the 
profiler is even more painful...  there, startup takes *minutes* as thousands 
and thousands and thousands of strings/objects are allocated while all the 
typeinfo is built.

Is there a more efficient way to simply obtain a list of tagged classes that I 
could use?  If not, since such a class list is so useful, perhaps you could 
consider creating a special-purpose faster/smaller parse for just this one 
common/useful task?  If there were an efficient way to obtain even just the 
complete class list, I would then perhaps look up the metadata tags using the 
non-bytecode org.as3commons.reflect.* APIs (which can do so much more 
efficiently).

Thanks,
- Matt

Re: memory usage.  Would you consider providing an option to selectively purge 
things from the cache (for example, when the client knows they will never be 
needed)?  Perhaps some kind of include/exclude filter specification (e.g., keep 
everything parsed for "com.mystuff.*", but purge other stuff).
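
As a rough sketch of what I mean by a filter (Python for brevity, and purely hypothetical: purge_cache is not an existing as3commons call, and the cache is assumed to be a plain name-to-typeinfo map):

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable


def purge_cache(cache: Dict[str, object], keep_patterns: Iterable[str]) -> int:
    """Remove every entry whose fully qualified type name matches none of
    the keep patterns. Returns the number of entries removed."""
    patterns = list(keep_patterns)
    doomed = [name for name in cache
              if not any(fnmatchcase(name, p) for p in patterns)]
    for name in doomed:
        del cache[name]
    return len(doomed)
```

Calling purge_cache(cache, ["com.mystuff.*"]) would keep everything under com.mystuff (including sub-packages, since "*" also matches dots here) and drop the rest.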

Re: weak references.  What I meant of course was that the metadata would simply 
be reparsed when it fell out of memory and was needed again later (and this 
process would be transparent to the library client).  But, as you've explained, 
"progressively" and/or "selectively" parsing the SWF format is difficult.
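
For what it's worth, the behaviour I had in mind looks roughly like this sketch (Python, with its weak references standing in for soft references; TypeInfo and ReparsingCache are hypothetical names, and the parse function is assumed to be able to rebuild a single type on request, which is exactly the hard part):

```python
import weakref
from typing import Callable


class TypeInfo:
    """Stand-in for parsed type information (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name


class ReparsingCache:
    """Holds entries weakly and re-parses on a miss, so a collected entry
    is never visible to the caller as 'gone'."""

    def __init__(self, parse: Callable[[str], TypeInfo]) -> None:
        self._parse = parse
        self._entries = weakref.WeakValueDictionary()
        self.parse_count = 0  # only here to make the re-parse observable

    def get(self, name: str) -> TypeInfo:
        info = self._entries.get(name)
        if info is None:
            info = self._parse(name)
            self.parse_count += 1
            self._entries[name] = info
        return info
```

As long as the application holds a reference to a TypeInfo, repeated get() calls return the same object; once the last reference is dropped, the entry may be collected and the next get() pays the parse cost again.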

Original comment by idontneedthisacct@gmail.com on 21 Jul 2011 at 2:03

GoogleCodeExporter commented 9 years ago

Hey,

if you just want the metadata info then I suggest you use 
ByteCodeType.metadataLookupFromLoader(); this will do exactly what you suggest. 
It only deserializes the constant pool and constructs the lookup for 
metadata->classnames. You can use the 'normal' reflection functionality on 
those classnames.

It's described here:
http://www.as3commons.org/as3-commons-bytecode/introduction.html
Under the header 'metadata scan'.

This is indeed quite a bit faster than doing the full bytecode reflection.
Someone on the as3commons team is also experimenting with some Apparat/Alchemy 
optimizations which will speed up the process a bit more. I'm not sure when 
that'll be finished though.
My *real* hope actually is that Adobe will some day implement *proper* 
reflection in the Flash player. From that day on we can ditch the bytecode 
reflection altogether. But you know, we'll have to wait and pray I guess...

I'm sure there are already JIRA tickets opened for that, I suggest you look them 
up and vote for them. (Every vote counts! :))

I'm also thinking of re-writing a bit of the bytecode reflection to use the 
constant pool for string lookup. Right now, for every type, every string is 
looked up in the constant pool and copied into the instance. I guess passing in 
an instance of the constant pool and having each string looked up at runtime 
could save a massive amount of memory...
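
Roughly the shape I have in mind, as a sketch only (written in Python to keep it short; MethodInfo here is a hypothetical stand-in, not the actual as3commons class, and the pool is assumed to be a simple indexable list of strings):

```python
from typing import Sequence


class MethodInfo:
    """Keeps an index into a shared constant pool instead of its own copy
    of the name; the string is resolved only when asked for."""

    __slots__ = ("_pool", "_name_index")

    def __init__(self, pool: Sequence[str], name_index: int) -> None:
        self._pool = pool
        self._name_index = name_index

    @property
    def name(self) -> str:
        return self._pool[self._name_index]
```

Each instance then costs one reference to the pool plus one integer per string, at the price of an extra lookup on every access.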

Hmmm, lots of things to consider... So little time...

cheers,

Roland

Original comment by ihatelivelyids on 21 Jul 2011 at 8:31

GoogleCodeExporter commented 9 years ago

Thanks Roland, ByteCodeType.metadataLookupFromLoader() is indeed faster.

But, now I am faced with the problem of the limitations of the "normal" 
typeinfo accessed via plain old Type instances...  they do not expose method 
argument names or default values.  Only ByteCodeType can supply that 
information. :(  That is why I wanted to be able to selectively parse only the 
classes/methods I was interested in...  I want efficient access to typeinfo, 
but also need stuff that only ByteCodeType can provide.

Re: memory usage.  Yes, if there is currently any "wasteful" duplication of 
strings (or other data) when parsing typeinfo, then please by all means 
consider adding some kind of "coalescing" to reduce the memory footprint (it 
would probably also speed things up).  When I call ByteCodeType.fromLoader() in 
my large project, an *amazing* amount of memory is allocated. ;)

If nothing can be done about the need to parse/store *all* typeinfo when using 
ByteCodeType, then more efficient memory usage and asynchronous parsing (as 
mentioned previously) would help make things palatable for large, real-world 
projects.  The ByteCodeType stuff is very, very cool...  but the current 
implementation can be somewhat impractical for some purposes.

Thanks,
- Matt

Original comment by idontneedthisacct@gmail.com on 23 Jul 2011 at 2:27

GoogleCodeExporter commented 9 years ago

Original comment by mastakan...@gmail.com on 11 Sep 2011 at 6:24

Added labels: reflect

GoogleCodeExporter commented 9 years ago

Original comment by martin.h...@gmail.com on 12 Sep 2011 at 6:13

Added labels: SubProject-bytecode
Removed labels: reflect

GoogleCodeExporter commented 9 years ago

Hi there,

I've been busy optimizing the ByteCodeType parsing. I think I've managed to 
significantly speed it up. I'm not finished yet, I still think there's plenty of 
opportunity to tweak it further, but perhaps you can give it a test drive and 
see if you notice the difference.
The changes are currently only available in the trunk.

Cheers,

Roland

Original comment by ihatelivelyids on 13 Sep 2011 at 11:14

GoogleCodeExporter commented 9 years ago

This is great, thanks for doing some work on this!  Sorry, I haven't had time 
to try it out yet.  Is the new code available in .swc form yet?

Original comment by idontneedthisacct@gmail.com on 26 Sep 2011 at 6:17

GoogleCodeExporter commented 9 years ago

It's part of the 1.0 release, so yes, it's available for download here:

http://www.as3commons.org/as3-commons-bytecode/index.html

cheers,

Roland

Original comment by rol...@stackandheap.com on 26 Sep 2011 at 6:46

pupsnow / as3-commons

Please consider making reflection (Type and ByteCodeType) "smarter" about caching #54