jackyzha0 / quartz

🌱 a fast, batteries-included static-site generator that transforms Markdown content into fully functional websites
https://quartz.jzhao.xyz
MIT License
6.87k stars 2.47k forks

Use link reference tracing to remove unreferenced files #556

Open bluerosegarden opened 11 months ago

bluerosegarden commented 11 months ago

Describe the bug I have a folder in my Obsidian vault for EPUB and PDF books and documents that I reference or embed in my private notes. Because of that (and many other private notes), I've added the ExplicitPublish plugin to my filters array:

filters: [Plugin.RemoveDrafts(), Plugin.ExplicitPublish()],

When checking my generated public folder, I noticed that many more folders were emitted than there should have been. On investigating, I found that while the unpublished markdown files aren't there, all of my non-markdown files were still emitted (including the entirety of my books folder).

To Reproduce Steps to reproduce the behavior:

  1. Clone Quartz, run npx quartz create and create an empty ./content folder.
  2. Add any number of images, PDFs, or other files that don't end in .md to ./content
  3. Create a markdown file with the publish property set to true in ./content.
  4. Create a markdown file with the publish property set to false (or not set at all) in ./content
  5. Add Plugin.ExplicitPublish() to the filters array in ./quartz.config.ts
  6. Run npx quartz build

The ./public folder should now contain all of the files in the content folder (with the exception of the non-published note)

Expected behavior I expected that when using the ExplicitPublish plugin, all non-markdown files that are not referenced by a public note are properly filtered out and not emitted into the public folder.
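
The expected behavior above amounts to link reference tracing. A minimal sketch of the idea follows, assuming notes are already parsed into a path, a publish flag, and a body. The names `Note`, `referencedTargets`, and `traceReferencedAssets` are hypothetical and are not part of Quartz, and matching links by file name only is a simplification (it does not decode URL-escaped targets or disambiguate same-named files in different folders).

```typescript
// Hypothetical types and helpers for illustration; none of this is Quartz API.
interface Note {
  path: string // path relative to the content folder
  publish: boolean // value of the `publish` frontmatter property
  body: string // raw Markdown text
}

// Obsidian wikilinks and embeds: [[target]], ![[target|alias]], [[target#heading]]
const WIKILINK = /!?\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]/
// Markdown links and images: [text](target), ![alt](target)
const MD_LINK = /!?\[[^\]]*\]\(([^)\s]+)[^)]*\)/

// Collect every link target found in a note body.
function referencedTargets(body: string): string[] {
  const targets: string[] = []
  for (const pattern of [WIKILINK, MD_LINK]) {
    const re = new RegExp(pattern.source, "g")
    let m: RegExpExecArray | null
    while ((m = re.exec(body)) !== null) {
      targets.push(m[1].trim())
    }
  }
  return targets
}

// Last path segment, e.g. "books/guide.pdf" -> "guide.pdf"
function basename(p: string): string {
  const i = p.lastIndexOf("/")
  return i === -1 ? p : p.slice(i + 1)
}

// Keep only the assets whose file name is linked from a published note.
function traceReferencedAssets(notes: Note[], assets: string[]): string[] {
  const wanted: Record<string, boolean> = {}
  for (const note of notes) {
    if (!note.publish) continue // links in private notes do not count
    for (const target of referencedTargets(note.body)) {
      wanted[basename(target)] = true
    }
  }
  return assets.filter((asset) => wanted[basename(asset)] === true)
}

const exampleNotes: Note[] = [
  { path: "public.md", publish: true, body: "See ![[books/guide.pdf]] and ![diagram](attachments/diagram.png)." },
  { path: "private.md", publish: false, body: "![[books/secret.epub]]" },
]
const exampleAssets = ["books/guide.pdf", "books/secret.epub", "attachments/diagram.png", "attachments/unused.png"]

// Only the two files linked from the published note remain.
const kept = traceReferencedAssets(exampleNotes, exampleAssets)
console.log(kept)
```

In this sketch, `books/secret.epub` is dropped because only a private note links to it, and `attachments/unused.png` is dropped because nothing links to it.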

Additional context If it would be too much work to filter out non-markdown files that are not referenced by public notes, maybe include a setting (or an additional filter plugin?) that removes any files outside of an explicitly named attachments folder. Obsidian already has a setting to set a default attachment folder, so this shouldn't require too much work on the user's part to switch to that system.

While I could just add the books folder in question to the ignorePatterns property in quartz.config.ts, this doesn't help in situations where the user mixes unpublished and published material together in the same folder, such as having research material in the same folder as their notes. Also, for someone like me who is in the middle of reorganizing my (admittedly very messy) vault, going through and gathering all non-markdown files into one folder to be blacklisted is a lot of work.
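
For reference, the ignorePatterns workaround mentioned above might look like the fragment below. This is a sketch only: "books" is a hypothetical folder name standing in for the folder described in this issue, the other entries are assumed to be Quartz's default ignore list, and the rest of quartz.config.ts is omitted.

```typescript
// Fragment of the `configuration` object in quartz.config.ts; only the relevant key is shown.
// "books" is a hypothetical folder name; the first three entries are assumed defaults.
const configuration = {
  ignorePatterns: ["private", "templates", ".obsidian", "books"],
}
```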

This would indirectly solve a related issue: if an Obsidian plugin creates files for its own use in folders of its own (such as MAKE.md and its context.mdb files), all of those otherwise-empty folders are emitted into the public folder. I actually originally started to file a bug for that specifically before I discovered the true issue.

Finally, it would be nice if the documentation about private pages was updated to explicitly mention this. While there is a warning there, it's only talking about the private pages being accessible in ./content if the entire repo was uploaded as opposed to just the ./public folder.

If anyone needs any help, corrections, or debugging let me know! Many thanks for this piece of software, it's exactly what I've been looking for!

bluerosegarden commented 11 months ago

As a small update, I've tried to make a workaround following the instructions on this Stack Exchange post to create a glob pattern that filters every non-markdown file out of the vault except those in the attachments folder. This led me to the pattern {!(*.md),!(attachments)/**/!(*.md)}; however, this filters out ALL files in the root folder, including .md files, regardless of their published status. I checked this on DigitalOcean's glob tool and it looks like it should've worked. Maybe I'm misunderstanding something? I hope this helps.

edit: Using just !(attachments)/**/!(*.md) almost works, but it doesn't filter out any non-markdown content at the root level. This is enough of a workaround for the moment, but the main issue still stands.

jackyzha0 commented 11 months ago

Currently, there's no reference tracing in Quartz to support this, so this is 'working as intended' and not a bug, but I agree that it would be a good enhancement to have.

bluerosegarden commented 11 months ago

Oh alright!! In that case I still believe the documentation should mention it on the private pages page. Should I make a PR?

jackyzha0 commented 11 months ago

Yes! Feel free to make a docs fix for it :) Oversight on my part haha thank you

jackyzha0 commented 11 months ago

(btw, Quartz uses fast-glob for globbing and it's slightly different from the unix glob: https://github.com/mrmlnc/fast-glob?tab=readme-ov-file#pattern-syntax)

bluerosegarden commented 11 months ago

OK THAT EXPLAINS SO MUCH... I'll also update the part of the docs mentioning "any valid glob works" to say "any valid fast-glob works", since despite it linking there I just went straight to the pattern tester hahaha

thank you 🙏

edit: FINALLY GOT SOMETHING WORKING. The following two glob patterns will filter out any non-markdown items except for those in attachments. It has to be split into two patterns because I couldn't figure out how to match both root-level files and sub-directories while including an exception for the attachments folder.

"!(attachments)**/!(*.md)", "!(*.md)"

And if you have more than one attachments folder, use this instead:

"**/!(attachments)**/!(*.md)", "!(*.md)"
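
The intent of these patterns, as described above, can be written out as a plain predicate. This is a sketch of the desired outcome only and does not reproduce how fast-glob evaluates the patterns; `shouldIgnore` and its `attachmentFolders` parameter are hypothetical names, and the folder check applies at any depth, matching the second variant.

```typescript
// Hypothetical helper: returns true when a content path should be ignored,
// i.e. it is not a Markdown file and is not inside an attachments folder.
function shouldIgnore(path: string, attachmentFolders: string[] = ["attachments"]): boolean {
  // Markdown files are never ignored here; publishing is decided by the filter plugins.
  if (/\.md$/.test(path)) return false

  // Keep anything that lives under an attachments folder at any depth.
  const folders = path.split("/").slice(0, -1)
  for (const folder of folders) {
    if (attachmentFolders.indexOf(folder) !== -1) return false
  }

  // Everything else is a non-Markdown file outside attachments.
  return true
}

const samplePaths = ["cover.png", "books/novel.epub", "notes/a.md", "attachments/img.png", "notes/attachments/img.png"]
const ignoredPaths = samplePaths.filter((p) => shouldIgnore(p))
console.log(ignoredPaths)
```

Checking a vault's file list against a predicate like this before settling on a glob can help confirm that a pattern ignores what is intended.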

Alright now it's time to update the docs 👍