obsidian-html / obsidian-html

Python code to convert Obsidian notes to proper markdown and optionally to create an html site too.
https://obsidian-html.github.io
GNU General Public License v3.0

Dataview integration #271

Open hgrw opened 2 years ago

hgrw commented 2 years ago

Are there any plans to integrate dataview compatibility? Alternatively, is there a simple way to use obsidian-html 2.3.1 to render dataview code blocks?

dwrolvink commented 2 years ago

There aren't any plans yet to add that. Adding support for plugins is also kind of a can of worms that I want to avoid opening.

I get that the dataview plugin is a pretty unique and powerful one though.

I don't have much time this and the coming month, so development will be slow. But I'll just keep this feature request open and if someone makes a PR for it I'll review/integrate it.

dwrolvink commented 2 years ago

Just out of curiosity, when you say integration, would you know of a way to use the original plugin to render the dataview code blocks for us?

hgrw commented 2 years ago

I understand why adding support for plugins can be a can of worms. Fair point. Nonetheless support for dataview (not to mention templater and others) would be a really nice extension / feature set. And yes, using the original plugins to render the code blocks is what I had in mind. Fetch html and css generated by the plugin, dump it somewhere in /tmp and then sub it in during the markdown-to-html step.

dwrolvink commented 2 years ago

Yeah I've been thinking about that route, but I'm pretty sure that it will be very hard for an external program to just read out that information.

I'm looking into writing an Obsidian plugin that fetches the html from inside Obsidian and then dumps it to a folder somewhere, and then do what you described.

hgrw commented 2 years ago

Sounds like a good idea. Unfortunately I won't be able to contribute any time soon but will keep an eye on this PR just in case.

dwrolvink commented 2 years ago

Small progress report:

I created a plugin that can export the html from the view mode for each note in the vault: https://github.com/obsidian-html/obs.html

For a simple dataview I get this in obsidian, and this in the website (if I manually move over the html code to the precompiled website):

[image: the dataview as rendered in Obsidian] [image: the same dataview on the website]

Obviously the mouse-overs are missing and I still need to write some code to grab the html from the output, but that should be peanuts. Pretty happy with the plugin experience and the progress so far.

jcolson commented 2 years ago

Wow, that's pretty incredible and will be REALLY useful!

dwrolvink commented 2 years ago

Alright, I've got a first working version. See this commit for all the code that is relevant for this feature: https://github.com/obsidian-html/obsidian-html/commit/783ea5b90a45e9eabe9a9404bdc27469584d2947

test it out

Instructions are now as follows for those who want to try it out:

how it works

The way it works is that we register a preprocessor to python-markdown that will replace any

  ``` dataview
  <content>
  ```

blocks, with <table class="dataview"><content></table> from the export files. The first dataview block is replaced with the first table, and so on.
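
A minimal sketch of that replacement step, assuming the exported tables have already been read into a list. It uses only the standard library; in python-markdown this logic would sit inside a preprocessor's run method. The function name is made up for illustration and is not taken from the commit.

```python
def replace_dataview_blocks(lines, tables):
    """Swap each dataview fenced block for the next exported table."""
    out, block, counter = [], None, 0
    for line in lines:
        stripped = line.strip()
        if block is None:
            if stripped.startswith("```") and stripped[3:].strip() == "dataview":
                block = [line]            # opening fence: start buffering
            else:
                out.append(line)
        else:
            block.append(line)
            if stripped == "```":         # closing fence
                if counter < len(tables):
                    out.append(tables[counter])
                else:
                    out.extend(block)     # no export for this block: keep the source
                counter += 1
                block = None
    if block is not None:                 # unterminated fence: leave it untouched
        out.extend(block)
    return out
```

Keeping the original block when no table is available is a design choice made here so that a count mismatch stays visible in the output instead of raising an error.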

what you get atm

timeouts

A known issue atm is that I know of no event that signals that the html is loaded, so there is some fiddling with the timeout between opening the note in the editor and reading out the html.

It's hard to write a test because there are multiple steps, like, first there is no html, then an empty div, then an entire empty table and only then the table is filled in with values, but I'm guessing that there are substeps there too so that's kind of difficult.

Maybe just make the timeout configurable in the config.yml and have the user tweak it to what works for them.

But yeah, not ideal, because I can imagine that a small note with a tiny dataview takes less time to load than a huge note. Current timeout is 50 ms and that is already on the limit for the documentation vault, and that is not even that large of a vault by any measure, so if you'd have to increase it to 100 ms you are going to feel the pain.

Perhaps I could do something like timeout_length_ms = a*len(html.split('\n')), but it's all kinda meh. The discord plugin writers have been ghosting me for a day regarding an html-load event, so I'm guessing that's not yet a thing.
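
For illustration, the line-count heuristic could look roughly like this. The 50 ms floor comes from the current timeout mentioned above; the per-line factor and the ceiling are invented values that would need tuning, and the real plugin is written in TypeScript, so this is only a sketch of the arithmetic.

```python
def timeout_ms(html, per_line_ms=0.5, floor_ms=50, ceiling_ms=2000):
    """Scale the wait with the number of lines, clamped to [floor_ms, ceiling_ms]."""
    line_count = len(html.split("\n"))
    return int(min(ceiling_ms, max(floor_ms, per_line_ms * line_count)))
```

The clamp keeps tiny notes from dropping below the timeout that is known to work and keeps huge notes from stalling the export indefinitely.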

tbc

hgrw commented 2 years ago

I had a play around and almost got it to work for my dataview tables. Ran into a couple issues:

  1. get_dataview_tables can't find tables in my entrypoint file, 'Index.md'. Looks like self.extension.getConfig('note_path') only finds entrypoint files 'index.md'. Hard coded my entrypoint path to get around this issue. Maybe this caused 2,3, below.
  2. BeautifulSoup(html, 'lxml') can't find tables. Applied this fix to get around this issue.
  3. I suspect (2) then broke html object so that dataview_tables[counter] (DataviewExtension.py, line 60) throws an index error. My entrypoint page has 3 tables but two were detected in html. Quick fix to try get something rendering was a try-catch and counter decrement in this case.
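
One way to narrow down issues 2 and 3 is to count the tables in an export file independently of BeautifulSoup. The sketch below uses the standard library's html.parser instead of the lxml parser the extension uses; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class DataviewTableCollector(HTMLParser):
    """Collect the outer html of every <table> whose class list contains 'dataview'."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.tables, self._buf, self._depth = [], [], 0

    def handle_starttag(self, tag, attrs):
        if self._depth == 0:
            classes = (dict(attrs).get("class") or "").split()
            if tag == "table" and "dataview" in classes:
                self._depth, self._buf = 1, [self.get_starttag_text()]
            return
        if tag == "table":                 # nested table inside a dataview table
            self._depth += 1
        self._buf.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self._depth:
            self._buf.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._depth:
            return
        self._buf.append(f"</{tag}>")
        if tag == "table":
            self._depth -= 1
            if self._depth == 0:
                self.tables.append("".join(self._buf))

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)

    def handle_entityref(self, name):
        if self._depth:
            self._buf.append(f"&{name};")

    def handle_charref(self, name):
        if self._depth:
            self._buf.append(f"&#{name};")


def extract_dataview_tables(html):
    parser = DataviewTableCollector()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Comparing len(extract_dataview_tables(html)) with the number of dataview blocks in the note before indexing would turn the index error into a readable message.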

After this I was able to get something rendering but html was mangled so it didn't show up as a table. I would have persisted but this feature will only be useful if I can run it in a CI pipeline. Looks like the limitation here might be obsidian itself. Tracking obs.html generated files in git is not feasible and since the obsidianhtml execution happens in a gitlab pages pipeline, obs.html files won't be available for pages build.

FWIW, the repo I'm using to test this feature out on is https://gitlab.com/tumbleweed-rover/literature-review, pages at https://tumbleweed-rover.gitlab.io/literature-review/

dwrolvink commented 2 years ago

could you add the obs.html folder to your git repo maybe? it would be interesting to see what goes wrong in the output.

Not sure what you mean with 1, the value of self.extension.getConfig('note_path') is set here: https://github.com/obsidian-html/obsidian-html/blob/556b57d3453b4c0c6422700ba26ef7c0586d2211/obsidianhtml/__init__.py#L297 and should thus just equal the path of the current note that is being processed. Is that not happening?

I did notice that obsidian takes its time to load the html of a page, this is done in steps, and it is really difficult to check when the loading is done. it might be a fix to just wait longer: you can do this by changing this line to wait 100 ms instead of 50: https://github.com/obsidian-html/obs.html/blob/62d5fa688b79489b620a93b95117306c306fafe8/main.ts#L116

Still not really happy about that, but you might try it out.
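
An alternative to a fixed wait, sketched here under the assumption that the html can be read repeatedly: poll until the output stops changing for a few reads in a row. This is not what the plugin does (it is TypeScript and uses a fixed timeout); read_html is a hypothetical callable, and Python is used only to illustrate the idea.

```python
import time


def wait_until_stable(read_html, interval_s=0.05, stable_polls=3,
                      max_polls=100, sleep=time.sleep):
    """Return the html once it is non-empty and unchanged for stable_polls reads."""
    last, unchanged = None, 0
    for _ in range(max_polls):
        current = read_html()
        if current and current == last:
            unchanged += 1
            if unchanged >= stable_polls:
                return current
        else:
            unchanged = 0                  # still loading: reset the counter
        last = current
        sleep(interval_s)
    return last                            # gave up: return the latest snapshot
```

Small notes would finish after a few polls while large ones get more time, at the cost of a few extra reads per note.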

dwrolvink commented 2 years ago

Huh I can find /literature-review/obs.html/static/master.css in the browser, but not that folder in gitlab :thinking:

Edit: nvm, it is late. You sent only the repo of the vault, not the html output, thought I was looking at the html output for a sec.

hgrw commented 2 years ago

Thanks for the quick response. I will loop back to this in a week or so and set up a test branch and docker container that uses the latest obsidianhtml and tracks obs.html. I will try to avoid tracking any generated files in mainline if at all possible.

hgrw commented 2 years ago

I haven't forgotten about this issue - will provide feedback soon.

hgrw commented 2 years ago

I think there may be a bug in the implementation. My setup is as follows:

config parameters

obs.html plugin settings

Export folder: website/public/obs.html
Config.yml path: config/obsidianhtml.yaml

Issue

I can see that obs.html plugin generates file with dataview tables: ./website/public/obs.html/website/submodules/tumbleweed-rover/Tumbleweed Rover Home.md.html

But when running obsidianhtml there is a weird path loop (see /public/md/public/md/public/md in path below):

DataviewExtension.py", line 86, in get_dataview_tables
    with open(path, 'r', encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/mars/git/hgrwilson.com/website/public/obs.html/public/md/public/md/public/md/submodules/tumbleweed-rover/Tumbleweed Rover Home.md.html'

dwrolvink commented 2 years ago

Is there any possibility to export your md and html to any place other than inside of your vault? This is a setup that is not tested and that is known to give annoying errors. I am unable to recreate this, I could try setting my export up to export in the vault itself as well, but that would be somewhere in the weekend.

dwrolvink commented 2 years ago

So one thing first. You set the export folder in the obsidian extension to be: public/obs.html/website, this will write to

{vault_location}/public/obs.html/website/{rel_path}

Folder under dataview says folder: 'public/obs.html' This will look under

{vault_location}/public/obs.html/{rel_path}

So you should correct that at least.
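
To make the lookup rule concrete, here is a rough sketch of how the expected export file follows from the vault root, the dataview folder and the note path. The ".md.html" naming is inferred from the file names quoted in this thread, and the function name is hypothetical; this is not the code from DataviewExtension.py.

```python
from pathlib import PurePosixPath


def expected_export_path(vault_root, dataview_folder, note_path):
    """Build {vault_root}/{dataview_folder}/{rel_path}.html for a note."""
    folder = PurePosixPath(dataview_folder)
    if folder.is_absolute():
        raise ValueError("dataview folder must be relative to the vault root")
    rel_path = PurePosixPath(note_path).relative_to(vault_root)
    return PurePosixPath(vault_root) / folder / (str(rel_path) + ".html")
```

If the plugin writes to one folder and the config points at another, the two results differ by exactly the extra path segment, which is an easy thing to print and compare when debugging.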

Still doesn't explain the triple public/md. But if you look at:

obsidian_folder_path_str: 'website'
md_folder_path_str: 'website/public/md'

Then it seems that the resultant of the two is public/md and your file example is three levels deep: ./website/public/obs.html/website/submodules/tumbleweed-rover/Tumbleweed Rover Home.md.html

So something might be going on there.

Changing your settings to absolute paths might fix this for now. If you can test this then it will give me valuable information for debugging.

Please also test whether obsidianhtml works at all without the dataview feature enabled. Because if the issue is in the relative path, then it should not just affect the dataview feature, but it should then not work at all.

obsidian_folder_path_str: '/home/mars/git/hgrwilson.com/website'
obsidian_entrypoint_path_str: '/home/mars/git/hgrwilson.com/website/Site-Home.md'
md_folder_path_str: '/home/mars/git/hgrwilson.com/website/public/md'
md_entrypoint_path_str: '/home/mars/git/hgrwilson.com/website/public/md/index.md'
html_output_folder_path_str: '/home/mars/git/hgrwilson.com/website/public'

toggles:
  features:
    dataview:
      enabled: True
      folder: 'public/obs.html/website'  # must always be relative, no preceding slash

dwrolvink commented 2 years ago

Okay yeah if I write the md and html output to vault_path/md and vault_path/html then I get issues, will make a bugreport for that. Has to do with the new path-based indexing I guess.

To resolve this place the output locations outside of the vault path and remove the old md and html folders.
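
A quick pre-flight check for this situation might look like the sketch below. It compares paths lexically, so it assumes absolute, already-normalized paths and does not follow symlinks; the function names are made up for illustration.

```python
from pathlib import PurePosixPath


def is_inside(path, parent):
    """True if path equals parent or lies somewhere below it."""
    try:
        PurePosixPath(path).relative_to(parent)
        return True
    except ValueError:
        return False


def outputs_inside_vault(vault_root, output_dirs):
    """Return the output folders that would end up inside the vault."""
    return [d for d in output_dirs if is_inside(d, vault_root)]
```

Running this on the md and html output folders before a build would flag the setup that triggers the path loop.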

dwrolvink commented 2 years ago

So tl;dr:

That should resolve your latest bug.

Just for a third time though: exporting in your vault is discouraged for multiple reasons, best not to do it. But the points above should make sure that it fixes that bug that you had earlier if you are really intent on exporting into your vault.

hgrw commented 2 years ago

Interesting. I didn't realise exporting to vault was discouraged. It would be good to mention that in quickstart docs. A bunch of people at work now use obsidianhtml for their docs and everyone exports inside vault dir. This is because absolute paths are brittle and relative paths with ../ are hacky. Obsidianhtml works fine with dataview feature disabled.

> So one thing first. You set the export folder in the obsidian extension to be: public/obs.html/website, this will write to
>
> {vault_location}/public/obs.html/website/{rel_path}
>
> Folder under dataview says folder: 'public/obs.html' This will look under
>
> {vault_location}/public/obs.html/{rel_path}
>
> So you should correct that at least.

I'm not sure this is right. Obsidianhtml export should have been {vault_location}/website/public, with dataview:folder {vault_location}/website/public/obs.html. The settings I had produced that pattern. I suppose it is best not to use the same obs.html directory that create_index_from_dir_structure uses. I did that to keep the generated files orderly.

Relative paths didn't end up fixing the problem either. I will loop back to this after next release.

hgrw commented 2 years ago

Also, the comment in the config file on dataview:folder ("# relative to the vault being processed. this folder should always be in the vault") is not correct; the folder is actually relative to obsidian_folder_path_str. This caused a bit of confusion when setting up my paths.

dwrolvink commented 2 years ago

Hmm yeah for a long time it wouldn't even work when you exported into the vault, because of duplicatefilename errors.

Aside from what is en-/discouraged, I don't think anyone should want a tool to write in the same directory as where their notes are generated, and not only because those md notes will also show up in obsidian's filepicker among other places.

But yeah anyways, you are right about it being a logical thing to do, and people will probably do it regardless. I guess I should just add it as a test case.

The folder in the extension and the folder in dataview:folder are both directly relative to the vault root and should thus be the same value.

ps: you are right that it is relative to the obsidian_folder_path_str value. That value should be the vault root though. (It is described as "the first folder that contains all the notes", which in hindsight is a plain wrong definition of the vault root.) I guess I have not enough creativity to expect all the ways that users will play with the constraints to make it fit their usecases. I guess I should redo the documentation with that in mind, and be more concrete and accurate :smile: still learning

hgrw commented 2 years ago

In general the docs are really good. I imagine most users are exporting to their vault dir though so that would be something to test, yeah. Consider that most docs will be generated inside a docker container in a pipeline so there is no risk to vault integrity. Besides, vault will be in VCS for most users.

I suspect that this feature won't work with the folder structure in the vault I'm using to test it out:

Steps

  1. html_output_path_str is {vault_dir}/website/public (using absolute path)
  2. obs.html plugin export folder to {vault_dir}/website/public/dataview-html (using relative path) ...so far so good. dataview-html is in a clean environment
  3. check dir structure: cd website/public && tree -L 4
    .
    └── dataview-html
        ├── README.md.html
        └── website
            ├── random-posts
            │   └── Converting a Car to Remote Control in 72h.md.html
            ├── Site-Home.md.html
            ├── software-engineering
            │   ├── Automated Integration Testing with Hardware In the Loop.md.html
            │   └── CI-CD for Embedded Systems, A Git Workflow.md.html
            ├── submodules
            │   ├── hil-target-stm32
            │   ├── hil-tester
            │   ├── mico
            │   └── tumbleweed-rover
            └── systems-engineering
                ├── 0. Safety-Critical Systems Analysis.md.html
                ├── 1. Gross Dependability Analysis with Markov Chains.md.html
                ├── 2. Failure Analysis with Binary Fault Trees.md.html
                ├── 3. Redundancy as a Tool for Dependable System Design.md.html
                └── 4. Failure Analysis with Bayesian Fault Trees.md.html
    
  4. Check that "Tumbleweed Rover Home.md.html" exists: cd {vault_home} && find . -name *Home.md.html
    ./website/public/dataview-html/website/submodules/hil-tester/documentation/HIL Tester Home.md.html
    ./website/public/dataview-html/website/submodules/tumbleweed-rover/Tumbleweed Rover Home.md.html
    ./website/public/dataview-html/website/submodules/hil-target-stm32/documentation/HIL Target Home.md.html
    ./website/public/dataview-html/website/Site-Home.md.html
  5. Set dataview:folder to public/dataview-html and run obsidianhtml:
    • No such file or directory: '/home/mars/git/hgrwilson.com/website/public/dataview-html/submodules/tumbleweed-rover/Tumbleweed Rover Home.md.html'
    • See that all Home.md.html files are deleted (no hits with `find . -name Home.md.html`). This is kind of weird.

I put the tree above to illustrate that the dataview feature is looking in website/public/dataview-html/public/md/submodules/tumbleweed-rover/, but it should be looking in website/public/dataview-html/website/submodules/tumbleweed-rover/. So public/md should actually be website

dwrolvink commented 2 years ago

Your problem is that you put stuff in your html output dir, which will be emptied when you run obsidianhtml:

https://obsidian-html.github.io/Configurations/Configuration%20Options.html#!no_clean

dwrolvink commented 2 years ago

To be more concrete: set that value to true if you want to clean the output folders yourself. I guess in a pipeline you don't need to clean anyways.

hgrw commented 2 years ago

Fair. That explains the missing files. When I turn clean off the search path for the missing file goes from

'/home/mars/git/hgrwilson.com/website/public/dataview-html/public/md/submodules/tumbleweed-rover/Tumbleweed Rover Home.md.html'

to

'/home/mars/git/hgrwilson.com/website/public/dataview-html/public/md/public/md/public/md/submodules/tumbleweed-rover/Tumbleweed Rover Home.md.html'

The public/md loop is back. I am using absolute paths for *_path_str settings in the config.yaml.

I will revisit this after next release and report back.

dwrolvink commented 2 years ago

Sigh, I guess there is another edgecase where that loop pops up, back to the drawing board. You can fix it now (probably) by updating the exclude_subfolders value, I guess you needed "website" instead of "public", not sure what you've configured, but I don't think you can get this bug if you use the former value.

But yeah, bed time for me too. Shame this one step has to take so much effort, but it is what it is.

hgrw commented 2 years ago

It's nearly 7am where I am. Getting some stuff done before work ;D. I am not using the new master that you just pushed. I will use your fix when next release rolls around and report back.

dwrolvink commented 2 years ago

Ahh good to know that you are not using the latest master code. Though why not if I may ask? The whole purpose of installing master code is testing it well before it is added to a release :thinking:

Anyways, I'll just keep this issue open so report back when you have time/the new release is out, it's fine

hgrw commented 2 years ago

Just comes down to time management. Using a more stable tool for day-to-day means less time spent working through issues

dwrolvink commented 2 years ago

version 3.2.0 has been published

dwrolvink commented 1 year ago

not enough feedback, if someone else wants this feature and has time to provide faster feedback, comment here and we will pick it up again

GollyTicker commented 1 year ago

I am still interested in this feature, but I have limited time right now to participate. Could you simply keep the issue open at least?

dwrolvink commented 1 year ago

Fine by me, I'll write a full comprehensive instruction soon on how to set it up. With the hope that someone will test the setup.

For me, the very basic usecase works now: https://obsidian-html.github.io/Demonstrations/Implementing%20Dataview.html But I don't really use it. This proof of concept should mean it now works for any kind of dataview block, because we just take its output.

GollyTicker commented 1 year ago

Ah, I didn't realize, that you had already implemented it. I misunderstood your comment. In that sense, I think this is fine.

If I come across any problems, then I'd look at the code and make an issue/pr.

If you want, you can close this now then.

Thanks a lot for implementing it! ❤️

dwrolvink commented 1 year ago

Np. It is still in alpha state, so it could use some work making it easier to configure and for stability, but I'll keep it open for now and just wrap it up.

Good enough to know that there are people who will use this.

GollyTicker commented 1 year ago

I am not able to test this in the next few days, but does the current state support DataviewJS queries?

I've looked a bit into the code and I'm not sure I understand the context to know whether it works or not.

dwrolvink commented 1 year ago

The way it works at the moment is that you must install an Obsidian plugin. That plugin will have an export button, which opens every note in your vault in succession, and then exports the generated html of any dataview tables it finds to an output file (in a folder in your vault, see the plugin settings).

When that is done, configure obsidianhtml to enable the dataview feature. It will then cut out any dataview blocks and replace them with a string like dataview_1, dataview_2, etc (more unique than that, but you get the point). Later in the process, when the html is generated, it will replace a string like dataview_1 with the first exported table in the corresponding output file.

This way of working means that whatever your dataview plugin can render, we can "steal" from Obsidian.

This is nice, but it is also kind of fragile.
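
The two-phase replacement described above can be sketched as follows. The token format, regex and function names are assumptions for illustration; the actual implementation in obsidianhtml may differ.

```python
import re
import uuid

# A fenced block whose info string is exactly "dataview", up to its closing fence.
DATAVIEW_BLOCK = re.compile(
    r"^```[ \t]*dataview[ \t]*\n.*?^```[ \t]*$", re.MULTILINE | re.DOTALL)


def stash_blocks(markdown_text):
    """Phase 1: cut out dataview blocks and leave unique tokens behind."""
    tokens = []

    def _swap(match):
        token = f"dataview_{len(tokens) + 1}_{uuid.uuid4().hex}"
        tokens.append(token)
        return token

    return DATAVIEW_BLOCK.sub(_swap, markdown_text), tokens


def restore_tables(html, tokens, tables):
    """Phase 2: replace the n-th token with the n-th exported table."""
    for token, table in zip(tokens, tables):
        html = html.replace(token, table)
    return html
```

If the export contains fewer tables than the note has blocks, the leftover tokens stay in the html, which makes the mismatch easy to spot.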

As I'm writing this, I got some action points for me:

If that works a bit better I can write a full instruction on how to set up everything, and maybe make that process easier. (For example I learned that I can just read the plugin settings from outside the vault, so configuring the export dir in obsidianhtml will not be necessary once I use that).

So that will be involved in "wrapping this issue up".

jastice commented 1 year ago

Thanks for this! I'll likely be checking it out at some point in a few weeks, too :)

GollyTicker commented 1 year ago

@dwrolvink

Thanks for your explanations! That means that DataviewJS should work.

Regarding finding out when the rendering is over: I could imagine that an issue could be created in the Dataview plugin asking for html attributes that indicate whether the processing is finished.

dwrolvink commented 1 year ago

That's an interesting approach, I'll check that out!

Edit: Alright so there are 283 open issues, and I have a feeling our request won't have a very high prioritisation... :thinking:

GollyTicker commented 1 year ago

I am not sure. The implementation is probably simple - and because Dataview is so widely used it's become kind of standard in Obsidian. It could become even more useable within other plugins with this small feature. Maybe I'll look into that some day

HyperEpsilon commented 1 year ago

I'm super glad there's some dataview functionality, I have some kind of complicated queries that I plan to test out.

However, do you know if in-line queries are processed by this as well? I ask because the way I have some of my notes set up is to define a title in the frontmatter, then use a DV inline query to put it as the header

Example:

---
my-title: My Title
---
# `= this.my-title`

The above code renders the contents of the my-title field as a H1 header

dwrolvink commented 1 year ago

@HyperEpsilon just added some code to master which now allows me to use inline dataview queries. I've only tested your exact example, there might still be some bugs. It now hits on inline code starting with `=` and `$=` (js).
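
For reference, detecting such inline queries could be done with a pattern along these lines. It assumes Dataview's default prefixes (= and $=) and is a rough illustration, not the pattern that was added to master.

```python
import re

# Inline code that starts with "=" (DQL) or "$=" (DataviewJS).
INLINE_QUERY = re.compile(r"`(\$?=)\s*([^`]+?)\s*`")


def find_inline_queries(markdown_text):
    """Return (prefix, expression) pairs for every inline dataview query."""
    return [(m.group(1), m.group(2)) for m in INLINE_QUERY.finditer(markdown_text)]
```

Ordinary inline code is left alone because it does not start with one of the two prefixes.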

dwrolvink commented 1 year ago

Note that there is a new version of the companion plugin, but this only fixes the fact that it made text white (some test code that I forgot to remove in 0.0.1). Should be enough just to install obsidianhtml from master.