stereobooster / braindb

markdown-graph-content-layer-database
https://braindb.stereobooster.com/

Shared vision? #1

Open rufuspollock opened 4 months ago

rufuspollock commented 4 months ago

Hi, cool project 👍. Got pointed here and looked at the vision and thought there may be some connections and synergies with a project we've been working on:

https://markdowndb.com

https://github.com/datopian/markdowndb

Precisely designed as a "content layer"

A rich API to your markdown files in seconds. An open JS library to turn markdown files into structured, queryable data (SQL and JSON). Build rich markdown-powered sites fast and reliably.

Rich metadata extracted including frontmatter, links and more.

Lightweight and fast indexing 1000s of files in seconds.

Open source and extensible via plugin system.

stereobooster commented 4 months ago

Hey 👋. Those are definitely two very similar projects. The first name for this project was mdb (short for markdowndb). I renamed it because I found that braindb was free on npm.

From a technical point of view they are very similar as well. We both scan a directory, parse markdown and store it in SQLite. The differences are:

But if we disregard the technical differences, I think conceptually we're on the same page.
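
The shared pipeline mentioned above (scan a directory, parse markdown, store it in SQLite) can be illustrated with a minimal sketch. It is not code from either project: the `documents` table, the function names, and the naive flat `key: value` frontmatter parser are all assumptions made for illustration.

```python
import sqlite3
from pathlib import Path


def parse_markdown(text: str) -> tuple[dict[str, str], str]:
    """Split a file into (frontmatter, body); handles only flat `key: value` pairs."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the body.
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing delimiter: treat the whole file as body.
    return {}, text


def index_directory(root: Path, conn: sqlite3.Connection) -> int:
    """Scan `root` for .md files and upsert one row per file; returns the file count."""
    # Hypothetical schema, chosen only for this sketch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents "
        "(path TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    count = 0
    for path in sorted(root.rglob("*.md")):
        meta, body = parse_markdown(path.read_text(encoding="utf-8"))
        conn.execute(
            "INSERT OR REPLACE INTO documents (path, title, body) VALUES (?, ?, ?)",
            (path.relative_to(root).as_posix(), meta.get("title", path.stem), body),
        )
        count += 1
    conn.commit()
    return count
```

Calling `index_directory(Path("content"), sqlite3.connect("index.db"))` would do a one-time scan; a watch mode would re-run the per-file step whenever a file changes.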

rufuspollock commented 3 months ago

👍

> in my case I choose to watch directory and constantly re-parse files as soon as they change instead of doing one time scan (if I understood your tool correctly)

We support that, yes. See https://github.com/datopian/markdowndb/issues/45

> also I decide to not expose internal DB and create abstract layer on top, so that I would be able to change internals without affecting end-users. For example, I can change how I store fields - in separate columns or in one JSON field. Or switch from relational DB (sqlite) to graph db (kuzu), etc.

That's a good point. We have two separate parts of the code: one part that generates an internal (TypeScript) structure, and exporters that write that structure out, e.g. one to simple JSON and one to SQL(ite).
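
The split described above (one internal structure, several exporters) can be sketched as follows. This is an illustration in Python rather than the project's TypeScript; the `Document` fields, table names, and function names are hypothetical and not taken from markdowndb.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field


@dataclass
class Document:
    """Hypothetical internal structure produced by the parsing step."""
    path: str
    frontmatter: dict[str, str] = field(default_factory=dict)
    links: list[str] = field(default_factory=list)


def export_json(docs: list[Document]) -> str:
    """Exporter 1: serialize the internal structure as a JSON array."""
    return json.dumps([asdict(d) for d in docs], indent=2)


def export_sqlite(docs: list[Document], conn: sqlite3.Connection) -> None:
    """Exporter 2: write the same structure into two SQLite tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, frontmatter TEXT)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS links (source TEXT, target TEXT)")
    for d in docs:
        conn.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?)",
            (d.path, json.dumps(d.frontmatter)),
        )
        conn.executemany(
            "INSERT INTO links VALUES (?, ?)",
            [(d.path, target) for target in d.links],
        )
    conn.commit()
```

Because both exporters consume the same `Document` list, adding another output format does not touch the parsing code.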

stereobooster commented 3 months ago

I found one more similar project https://github.com/MicroWebStacks/content-structure cc @wassfila

wassfila commented 3 months ago

Very cool, I like this project and I'll be having a closer look. Content-structure is inspired by Astro's content collections, but I needed to expand it to a very generic content hierarchy: basically no constraints on organization, so any markdown website should work. Unlike content collections, content-structure also parses the internal markdown structure, so it extracts images (even text in SVG), tables, code blocks and links, and creates references to them together with their sections. The use cases are mainly custom CMS rendering (I'm my own customer, e.g. for my home automation website, https://github.com/HomeSmartMesh/website), but also data injection into search engines and even embeddings generation for RAG; see this use case: https://github.com/VectorWisdom/search-llm-server. I'm not focusing on types, for example, and also not much on the DB fetch aspect like SQL or others, so I just have JSON files with light wrappers, but that's probably where this project is better.

stereobooster commented 3 months ago

And one more similar project: https://github.com/timlrx/contentlayer2 (a maintained fork of contentlayer), cc @timlrx.

stereobooster commented 3 months ago

And this one as well https://github.com/peterbe/docsql cc @peterbe. (I'm sorry for spamming in separate messages)

stereobooster commented 2 months ago

I figured out how to implement Obsidian Dataview in Astro (generally in any SSG that uses remark). See https://astro-digital-garden.stereobooster.com/recipes/obsidian-dataview/

wassfila commented 2 weeks ago

I looked more closely at https://github.com/stereobooster/braindb/blob/main/packages/docs/src/content/docs/notes/vision.md and we definitely have a shared vision. The singularity of this vision is too good to be true; I just need the right balance between reuse, custom stitching and custom dev. The examples and repos I linked above are not very explicit. For info, this is the "core parser" called "Content Structure": https://github.com/MicroWebStacks/content-structure. It is a separate npm package on purpose, free from any framework-specific logic. To prove it, I wrote an SQL exporter here, real pure vanilla SQL: https://github.com/MicroWebStacks/markdown-rag-services/blob/main/db/sql-lite-utils.py. I won't fall in the trap of trying to offer "a more fancy API" that ruins openness to any use case, anything can plug on top of sql. "Anything" is a manner of speaking: I also plan a neo4j DB injector, so that an LLM can take advantage of such a "strong relationship" form through DB agents; I'm against letting LLMs infer graphs from unstructured data. Anyway, long story short, I wish I could see braindb in bigger-scale documented examples. And do you have "embeddings" and vectordb in mind, for e.g semantic search,... cause the name "braindb" suggests so.

stereobooster commented 2 weeks ago

> do you have "embeddings" and vectordb in mind, for e.g semantic search,... cause the name "braindb" suggests so.

Even if that happened, I assume it would be out of scope for the core logic. One can write a plugin to sync from BrainDB to any other storage (like neo4j or a vector DB). BrainDB exposes events (delete, insert, update), so a developer can attach listeners and pipe the data.
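
The listener idea can be sketched roughly like this. Note that this is not BrainDB's actual API: `ContentIndex`, `on`, `put` and the other names are hypothetical, and a plain dict stands in for the external store (neo4j, a vector DB, etc.).

```python
from collections import defaultdict
from typing import Callable

Listener = Callable[[str], None]


class ContentIndex:
    """Hypothetical stand-in for a content database that emits change events."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}
        self._listeners: dict[str, list[Listener]] = defaultdict(list)

    def on(self, event: str, listener: Listener) -> None:
        """Register a listener for 'insert', 'update' or 'delete'."""
        self._listeners[event].append(listener)

    def _emit(self, event: str, path: str) -> None:
        for listener in self._listeners[event]:
            listener(path)

    def put(self, path: str, body: str) -> None:
        event = "update" if path in self._docs else "insert"
        self._docs[path] = body
        self._emit(event, path)

    def delete(self, path: str) -> None:
        # Emit only if the document actually existed.
        if self._docs.pop(path, None) is not None:
            self._emit("delete", path)

    def get(self, path: str) -> str:
        return self._docs[path]


# A dict stands in for the external storage that should stay in sync.
external_store: dict[str, str] = {}
index = ContentIndex()


def sync(path: str) -> None:
    external_store[path] = index.get(path)


def remove(path: str) -> None:
    external_store.pop(path, None)


index.on("insert", sync)
index.on("update", sync)
index.on("delete", remove)
```

With the listeners attached, every `put` or `delete` on the index is mirrored into `external_store` without the core knowing anything about the target storage.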

> I won't fall in the trap of trying to offer "a more fancy API" that ruins openness to any use case, anything can plug on top of sql.

The reason I decided to hide SQL (at least for now) is that otherwise the DB structure would become part of the "public API", and it would be harder to change. The second reason is that I'm considering the possibility of switching from SQLite to cozodb (a graph database with Datalog as its query language).
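
A minimal sketch of that trade-off, assuming a hypothetical `Storage` interface (none of these names come from BrainDB): two backends keep the same fields in different layouts, one JSON column versus one row per field, and the calling code cannot tell the difference.

```python
import json
import sqlite3
from typing import Protocol


class Storage(Protocol):
    """The only surface callers see; the schema behind it stays private."""

    def save(self, path: str, fields: dict[str, str]) -> None: ...
    def load(self, path: str) -> dict[str, str]: ...


class JsonColumnStorage:
    """Layout A: all fields of a document in a single JSON column."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE docs (path TEXT PRIMARY KEY, data TEXT)")

    def save(self, path: str, fields: dict[str, str]) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO docs VALUES (?, ?)", (path, json.dumps(fields))
        )

    def load(self, path: str) -> dict[str, str]:
        row = self._conn.execute(
            "SELECT data FROM docs WHERE path = ?", (path,)
        ).fetchone()
        return json.loads(row[0]) if row else {}


class FieldRowStorage:
    """Layout B: one row per (document, field) pair."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE fields (path TEXT, key TEXT, value TEXT, "
            "PRIMARY KEY (path, key))"
        )

    def save(self, path: str, fields: dict[str, str]) -> None:
        # Replace all fields of the document in one go.
        self._conn.execute("DELETE FROM fields WHERE path = ?", (path,))
        self._conn.executemany(
            "INSERT INTO fields VALUES (?, ?, ?)",
            [(path, k, v) for k, v in fields.items()],
        )

    def load(self, path: str) -> dict[str, str]:
        rows = self._conn.execute(
            "SELECT key, value FROM fields WHERE path = ?", (path,)
        ).fetchall()
        return dict(rows)


def title_of(storage: Storage, path: str) -> str:
    """Caller code depends on the interface only, never on table names."""
    return storage.load(path).get("title", path)
```

If raw SQL were exposed instead, a query such as `SELECT data FROM docs` would break on the move from layout A to layout B, which is the coupling described above.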