Closed: gtalarico closed this issue 8 years ago
Is there any way the `__init__.py` script could be compiled to `__init__.pyc` bytecode on the first load, and run from that on subsequent loads, to improve execution time?
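For context, standard CPython exposes this step through the `py_compile` module; the sketch below is only illustrative, since pyRevit runs on IronPython, which (as far as I know) compiles to .NET IL rather than emitting `.pyc` bytecode:

```python
import os
import py_compile
import tempfile

# Write a stand-in for __init__.py and byte-compile it ahead of time.
# CPython normally does this automatically on first import; compiling
# explicitly just moves the parse/compile cost to an earlier point.
src = os.path.join(tempfile.mkdtemp(), '__init__.py')
with open(src, 'w') as f:
    f.write('print("loaded")\n')

compiled = py_compile.compile(src, cfile=src + 'c')  # writes __init__.pyc
```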
Caching/hashing is a good idea, but I need to run the numbers to see if it's really more efficient than the current system. I've already modified the `__init__` to open each script only once, so that's already taken care of.
But I agree with you. The loading should be like other add-ins... Let's keep this discussion open.
Another option is to completely rewrite the `__init__` in C#, as I mentioned before. What are your thoughts on that?
I take that back. Caching is better and we can get it done quicker.
Okay. I'm working on JSON-serializing the tab, panel, group, and script objects I create during the init process, so subsequent runs can quickly unwrap the objects from the JSON file and create the UI.
I need to figure out a way to determine if anything has changed. I'm thinking of finding tabs and creating a tuple signature (folder_size, script_count, icon_count, subfolder_count)... and if it matches the recorded signature for the tab, it'll just load from JSON. Individual tabs will get their own JSON file and be cached independently.
What do you think?
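A minimal sketch of that signature idea (the helper name and the extension checks are my own assumptions, not pyRevit code):

```python
import os
import tempfile

def tab_signature(tab_dir):
    # Hypothetical helper: summarize a tab folder's state in one tuple.
    # If it matches the tuple recorded at the last load, reuse cached JSON.
    folder_size = script_count = icon_count = subfolder_count = 0
    for root, dirs, files in os.walk(tab_dir):
        subfolder_count += len(dirs)
        for name in files:
            folder_size += os.path.getsize(os.path.join(root, name))
            if name.endswith('.py'):
                script_count += 1
            elif name.endswith('.png'):
                icon_count += 1
    return (folder_size, script_count, icon_count, subfolder_count)

# Demo on a throwaway folder with one script and one icon:
tab = tempfile.mkdtemp()
with open(os.path.join(tab, 'Tool.py'), 'wb') as f:
    f.write(b'pass\n')          # 5 bytes
with open(os.path.join(tab, 'Tool.png'), 'wb') as f:
    f.write(b'\x89PNG')         # 4 bytes
sig = tab_signature(tab)
```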
@eirannejad Can you not pickle the entire `PyRevitUISession` object? That was my initial thought.
As for comparing changes: I think counting size, items, etc. would work, but my gut instinct is that zipping + hashing would be the fastest way.
Agreed. I'll start with serializing the whole session and using the md5 function for the directory hash.
Just tested this. It's really fast (I have to ignore `.git` because that directory has hundreds of files, which adds about 2 seconds). Problem is, the hash is coming out different every time, not sure why... :confused:
Done in: 0.032999515533447266 Hash: fd364c9d1465cc66b37a70177f89e0d3
```python
# http://stackoverflow.com/questions/1855095/how-to-create-a-zip-archive-of-a-directory
import os
import zipfile
import hashlib
import time

def zipdir(path, ziph):
    for root, dirs, files in os.walk(path):
        if '.git' not in root:
            for file in files:
                ziph.write(os.path.join(root, file))

t0 = time.time()
zipf = zipfile.ZipFile('test.zip', 'w', zipfile.ZIP_DEFLATED)
zipdir('.', zipf)
zipf.close()
file_hash = hashlib.md5(open('test.zip', 'rb').read()).hexdigest()
t1 = time.time()
total = t1 - t0
print('Done in:', total)
print('Hash:', file_hash)
```
How's the speed on the zipping?
my md5 hash comes out the same every time...:/
Never mind, working now. I was hashing the whole folder, and it kept changing because the zip file itself was being added to it.
zipping + hashing takes 0.032999515533447266 sec
```python
# http://stackoverflow.com/questions/1855095/how-to-create-a-zip-archive-of-a-directory
import os
import zipfile
import hashlib
import time

def zipdir(path, ziph):
    for root, dirs, files in os.walk(path):
        if '.git' not in root:
            for file in files:
                ziph.write(os.path.join(root, file))

t0 = time.time()
zipf = zipfile.ZipFile('pyrevitplustab.zip', 'w', zipfile.ZIP_DEFLATED)
zipdir('pyrevitplus.tab', zipf)
zipf.close()
file_hash = hashlib.md5(open('pyrevitplustab.zip', 'rb').read()).hexdigest()
t1 = time.time()
total = t1 - t0
print('Done in:', total)
print('Hash:', file_hash)
```
Okay!...I'll use this. Thanks
Awesome. On pickling: I have run into some issues pickling complex objects before, so hopefully it will work. Also, I was just reading the pickle docs; it looks like cPickle is a lot faster, and it seems to work on IronPython.
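The usual pattern is to try the C implementation first and fall back to the pure-Python one (on Python 3 the plain `pickle` module already uses the C accelerator internally); the session dict here is just a stand-in:

```python
# Prefer the faster C implementation where it exists (Python 2 /
# IronPython, per the note above); fall back to plain pickle otherwise.
try:
    import cPickle as pickle
except ImportError:
    import pickle

# Round-trip a stand-in for the serialized session data:
session = {'tabs': ['pyrevitplus.tab'], 'panel_count': 12}
restored = pickle.loads(pickle.dumps(session))
```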
I'm thinking of using JSON since it's human-readable and easier to debug. I kinda like the idea that the serialized session is in a uniform format that other processes can read as well. Let's see how fast it is..
sounds good. hope it works.
Switching gears to another optimization opportunity: `report()`.
I just noticed that even with verbose off, the "|" character that indicates activity adds a lot of overhead. Removing that single print line alone decreased load time by ~60%, from 3.3 sec to 2.1. I think that's a huge improvement for such a small change. If we can get load time to < 2 sec, who needs a progress bar :)
Is there a reason you are using custom logic (`report`, `reportv`) instead of the built-in `logging` module? It would remove all the conditional statements (i.e. `if verbose`) every time a report is called, which is 100+ times. (You would just use `logger.info`, `logger.debug`, and then `logger.setLevel(INFO/DEBUG)`.) It would be a pain to refactor, and if caching works, probably not worth it anyway!
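A minimal sketch of that suggestion (the logger name is made up; `report`/`reportv` call sites would become `logger.info`/`logger.debug`):

```python
import logging

# One logger replaces the report()/reportv() pair: level filtering happens
# inside the logging module, so call sites lose their `if verbose:` checks.
logger = logging.getLogger('pyrevit.loader')
logger.addHandler(logging.StreamHandler())

verbose = False
logger.setLevel(logging.DEBUG if verbose else logging.INFO)

logger.info('Loading tab: pyrevitplus.tab')     # always shown
logger.debug('Parsing docstring for a script')  # shown only when verbose
```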
Honestly, I have a problem. If I disable the verbose reporting altogether, my Revit closes at startup with no message. I've been looking into the code but can't figure out how NOT printing a message causes Revit to crash.
Because I'm not as good as you, dude! :) I learned Python by myself, by trial and error, and never had the time to dive into the more advanced modules as much. I'm open to all suggestions though, and appreciate any help I can get :) Thanks for telling me about the logging module, btw. I'll look into it.
Hey @eirannejad, I have been toying with the loader, caching, etc., and played with a few methods for detecting directory changes. Although zip+hash is fairly efficient, I think I found a faster method: it just sums the modification times (in seconds) of all relevant files and folders.
```python
import os
import re

# `logger` is assumed to be configured elsewhere in the loader.

def get_hash_from_dir(script_dir):
    """Creates a unique hash to represent the state of a directory."""
    logger.info('Generating hash of directory')
    pat = r'(\.panel)|(\.tab)|(\.png)|(\.py)'
    hash_sum = 0
    for root, dirs, files in os.walk(script_dir):
        if re.search(pat, root, flags=re.IGNORECASE):
            hash_sum += os.path.getmtime(root)
            for filename in files:
                modtime = os.path.getmtime(os.path.join(root, filename))
                hash_sum += modtime
    return hash_sum
```
Okay. A working cache system is implemented in the `loaderCaching` branch. It saves and reads JSON caches for each tab it can find. This cut load time down to 7 seconds with verbose reporting and 3.5 with non-verbose. Thanks for the hash algorithms; for consistency I'm making an md5 from the `hash_sum` to get a more standard hash. Switch the branch in your `__init__` folder to `loaderCaching` and test it out.
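A rough sketch of what that per-tab JSON cache, keyed by an md5 of `hash_sum`, might look like (the file layout and helper names are hypothetical, not the actual `loaderCaching` code):

```python
import hashlib
import json
import os
import tempfile

def save_tab_cache(cache_dir, tab_name, hash_sum, tab_data):
    # One JSON file per tab, with the directory hash stored alongside
    # the serialized tab data (md5 gives a uniform, fixed-length key).
    digest = hashlib.md5(str(hash_sum).encode()).hexdigest()
    with open(os.path.join(cache_dir, tab_name + '.json'), 'w') as f:
        json.dump({'hash': digest, 'tab': tab_data}, f)

def load_tab_cache(cache_dir, tab_name, hash_sum):
    # Return the cached tab only if the stored hash still matches;
    # otherwise None, signalling a full re-scan of that tab's folder.
    try:
        with open(os.path.join(cache_dir, tab_name + '.json')) as f:
            payload = json.load(f)
    except (IOError, OSError, ValueError):
        return None
    digest = hashlib.md5(str(hash_sum).encode()).hexdigest()
    return payload['tab'] if payload['hash'] == digest else None

# Demo: a cache hit with the same hash_sum, a miss after it changes.
cache_dir = tempfile.mkdtemp()
save_tab_cache(cache_dir, 'pyrevitplus.tab', 1473552000.0, {'panels': ['Tools']})
hit = load_tab_cache(cache_dir, 'pyrevitplus.tab', 1473552000.0)
miss = load_tab_cache(cache_dir, 'pyrevitplus.tab', 1473552099.0)
```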
Background
Load time, even with verbose off, is long, considering some people like me restart Revit several times a day. Those seconds add up. A few people I have shared pyRevit with have mentioned this. Ideally, loading the scripts should be as seamless as other add-ins.
Proposal:
Besides continuing to update the loading methods (reading files only once, improving code performance, etc.), I would like to propose the idea of creating a cache. In most reboots, the contents of the pyRevit folder remain unchanged, so the results of processing the folders could be cached. If they are unchanged, the cache is loaded into memory and the ribbon is rebuilt without needing to re-scan directories, find docstrings, process icon file names, etc.
Implementation
After processing the paths, files, icons, docstrings, versions, etc., all the objects are pickled/serialized and dumped into a cache folder, stored with some sort of hash to identify the specific contents. The next time Revit starts, it first checks whether the hash derived from the current files matches the pickled file. If nothing has changed, it just reloads those objects into memory without having to reprocess the file tree.
Hash
could be generated from folder sizes, modification dates, or perhaps just zipping the entire folder and computing an md5 hash of the zip file (probably the most effective and efficient approach if the folder size is small).
Thoughts?