mfarragher / obsidiantools

Obsidian tools - a Python package for analysing an Obsidian.md vault

performance - file opens & reads #16

Closed stepsal closed 1 year ago

stepsal commented 2 years ago

Hi.

Every markdown file is opened & read a total of 8 times in the normal connect & gather flow. It might make sense to model a note as a class and have it load its own data once.
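To illustrate the "load its own data once" idea, here is a minimal sketch. The `Note` class, its attribute names and the simplified wikilink regex are all hypothetical and are not part of obsidiantools; the point is only that the file is read on first access and the cached text is reused by everything derived from it.

```python
import re
from functools import cached_property
from pathlib import Path


class Note:
    """Hypothetical wrapper (not obsidiantools API): reads its file at most once."""

    def __init__(self, path):
        self.path = Path(path)

    @cached_property
    def text(self):
        # The first access reads the file; later accesses reuse the cached string.
        return self.path.read_text(encoding="utf-8")

    @cached_property
    def wikilinks(self):
        # Simplified wikilink pattern, for illustration only:
        # captures the target before any "|" alias or "#" header anchor.
        return re.findall(r"\[\[([^\]|#]+)", self.text)
```

Any further per-note attribute (front matter, tags, embedded files) could be derived from `text` in the same way, so the number of file reads per note would stay at one however many attributes are built.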

mfarragher commented 1 year ago

I've been trying out some speed optimisation in the dev_speed branch.

So far I'm seeing a reduction of roughly 1/3 in wall time for the main Vault setup, as of commit https://github.com/mfarragher/obsidiantools/commit/6629910e5fdf8e8ddbc95ec95be16632525399d3, just by reducing the number of file reads in the connect() method: 22s -> 14s for my biggest vault.

v0.8: speed_0-8

Latest commit: speed_latest

mfarragher commented 1 year ago

I've merged these changes into dev branch.

I don't think there will be much improvement from refactoring gather() - perhaps one more set of file reads could be removed there, but a few sets of file reads have been removed from connect().

mfarragher commented 1 year ago

Wall time has fallen to 13s for my largest vault via this commit (more efficient use of file reads): https://github.com/mfarragher/obsidiantools/commit/ada520beec06cb28bc0613fc289e898c263c03eb

gather-improve-speed

mfarragher commented 1 year ago

I did some basic profiling of the code snippet that sets up my largest vault.

Functions I noticed that were coming up quite high in use of time relate to:

I'm experimenting with some code to make the HTML processing faster - by using one BeautifulSoup object and seeing if there's a neat way to chain methods. The wall time's fallen further to 11.5s but I need to do some manual sense-checking.

Regex is used a lot for the core logic in this package. I think that will set a floor on the vault setup time.
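For anyone who wants to repeat the basic profiling mentioned above, a sketch along these lines should work with the standard library alone. The helper name `profile_call` and its `top` parameter are made up for this example; it is not necessarily how the profiling in this thread was done.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the slowest call chains appear first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

Usage would be something like `vault, report = profile_call(lambda: Vault(vault_dir).connect().gather())` followed by `print(report)`, where `vault_dir` is a placeholder for the path to your own vault.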

mfarragher commented 1 year ago

I've tweaked the HTML processing: https://github.com/mfarragher/obsidiantools/commit/df329b4f0aee9a4af1a8d0c87eb480e5fef51a71

gather-improve-speed_bs4-funcs

Wall time is now 11s for the setup of my biggest vault. A month ago this was 22s, so it's 2x faster for me now compared to then. :racehorse:

At least for my large vault, when it's set up only with a gather call it has a wall time of 4.4s, so that is the faster of the two methods for me.

mfarragher commented 1 year ago

Going to close this, as by my counting neither of the API methods does more than 2 file opens per note now, and I can't see a way to reduce that further if a user wants 'everything' set up in one go. Both methods individually are about twice as fast vs v0.8.1 for my largest vault.

With how the attributes are set up now, it's also possible for users to update the Vault attributes themselves, if there are ways they want to re-use an existing Vault object (e.g. update info for specific notes or add info for new notes): https://github.com/mfarragher/obsidiantools/issues/23#issuecomment-1352068996
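The per-note file-open count mentioned above can be checked mechanically rather than by reading the code. The sketch below uses Python's audit hooks (`sys.addaudithook`, Python 3.8+) to count `open` events for `.md` paths. The name `count_md_opens` is hypothetical, and this is just one possible way of counting, not necessarily how the figures in this thread were obtained.

```python
import os
import sys
from collections import Counter
from contextlib import contextmanager

_md_opens = Counter()
_recording = False


def _audit_hook(event, args):
    # The "open" audit event is raised for builtins.open, io.open and os.open.
    # args[0] is the path (or an int when an existing file descriptor is wrapped).
    if _recording and event == "open" and isinstance(args[0], (str, bytes)):
        path = os.fsdecode(args[0])
        if path.endswith(".md"):
            _md_opens[os.path.abspath(path)] += 1


# Audit hooks cannot be removed once added, hence the _recording flag.
sys.addaudithook(_audit_hook)


@contextmanager
def count_md_opens():
    """Yield a Counter mapping each .md path to the opens made inside the block."""
    global _recording
    _md_opens.clear()
    _recording = True
    try:
        yield _md_opens
    finally:
        _recording = False
```

Wrapping a single API call in `with count_md_opens() as counts:` and then looking at `max(counts.values())` would give the highest number of opens for any one note during that call.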