logseq / logseq

A privacy-first, open-source platform for knowledge management and collaboration. Download link: http://github.com/logseq/logseq/releases. roadmap: http://trello.com/b/8txSM12G/roadmap
https://logseq.com
GNU Affero General Public License v3.0
30.26k stars 1.77k forks source link

Renaming a page doesn't update queries using the page #7519

Open LeuschkeTressa opened 1 year ago

LeuschkeTressa commented 1 year ago

I've seen this for some time, and it's quite a hindrance.

Steps to reproduce:

Expected result:

Actual result:

Screenshot: a link has been updated accordingly, but the query has not: CleanShot 2022-11-29 at 21 37 44

Note: refresh or re-index has no effect, the problem still persists.

Issue https://github.com/logseq/logseq/issues/2397 is similar, but was reported for a case involving templates and cyrillic letters in the page name.


LogSeq Version 0.8.11 (0.8.11) MacOS Darwin Kernel Version 21.6.0

cnrpman commented 1 year ago

I don't think this feature is implemented in previous version. So mark this as feature request

Bad3r commented 1 year ago

Thank you for the feature request. Please post feature requests to the forum for users to discuss and vote on the issue. PRs are welcome.
https://discuss.logseq.com/c/feature-requests

LeuschkeTressa commented 1 year ago

I can't for the world understand why this would be a feature request and not a bug.

When renaming a page, all links to that page, on all pages, are re-written and updated to the new name by LogSeq. Except for within queries...?

initial steps

on some page: * link: [[somePage]], query: {{query [[somePage]]}}

the user renames somePage to somePageRENAMED

curren behaviour, and correct behaviour according to you

LogSeq will re-write the page contents to: * link: [[somePageRENAMED]], query: {{query [[somePage]]}}

A necessary consequence is that any queries using the page will stop working.

expected behaviour according to me

LogSeq will re-write the page contents to: * link: [[somePageRENAMED]], query: {{query [[somePageRENAMED]]}}


So

Your actual optinion is, really, that this is the expected behaviour?

This bug was initially reported by someone else on the forum. It has been upvoted by a few people. The bug affects me as well. I took the time to produce a minimal example and write a careful bug report. You now say this is a "feature request", close the issue, and ask me to discuss this with other users again on the forum? A forum that, further, sees very little interaction from anyone within the LogSeq team?

Sorry, but what is this?

(I don't think I'm the only one. Another user, not among those already mentioned, put a sad-face on your comment above about not considering it a bug: https://github.com/logseq/logseq/issues/7519#issuecomment-1364433455. And yet someone else re-opened the issue: https://github.com/logseq/logseq/issues/8391.)

cnrpman commented 1 year ago

@LeuschkeTressa Sorry! We know that it's very useful, but it's just too hard to implement. Code contribution is welcome if anyone want to give a try.

I may reopen the ticket for backlogging.

fschuman commented 1 year ago

@cnrpman I'm sad to hear this will only go into a backlogging.

Unless I'm missing something, this behaviour makes the combination of queries and renaming page names unreliable ALWAYS AND IN ALL CASES. EVERY single renaming of a page will break ALL queries that it is part of !!!

It's strange if one has to decide between using functions as fundamental to Logseq as queries and renaming pages. The conclusion would be (1) either never rename pages, or (2) never use queries. (I'm assuming here that manually tracking if a page was used in a query is unfeasible).

I would consider that a major bug, given what Logseq is ??

cnrpman commented 1 year ago

@fschuman Sorry, but it's about implementation difficulty. It's hard to manipulate DSL written by users. But code contribution is welcome if you have idea about how to implement this feature.

I hope feature request is fair for this ticket. Further discussion on how to realize this feature is appreciated.

Or you may suggest some similar projects with query DSL manipulation (on renaming etc.) implemented, then we can investigate the method to achieve it.

LeuschkeTressa commented 1 year ago

@fschuman Sorry, but it's about implementation difficulty. It's hard to manipulate DSL written by users. But code contribution is welcome if you have idea about how to implement this feature.

I hope feature request is fair for this ticket.

How can this "being hard to implement" be a reason to mark it as a feature request, when it is clearly a bug?

If anything, it should be the opposite: When choosing what issues to work on, I guess you'd prioritize limiting bugs higher than new features. So then mark it as the bug that it is, instead of giving it a "feature request" status, along suggestions on including a video editor in LogSeq or whatever might be next in line in terms of features.

There are now a number of user engaged in this bug. None seem to be able to understand what the LogSeq teams reasoning on this is - and frustration is growing. In that situation: could you at least motivate why you don't consider this a bug? Could you please comment on @fschuman s very clear argumentation above? When there is user engagement around this, it's a bit strange not to provide any arguments on this.

Forum discussion about this issue: https://discuss.logseq.com/t/rename-query-fields-when-renaming-a-page-title/8080/9

Another related discussion: https://discuss.logseq.com/t/opinion-development-focus-should-shift-towards-improving-logseqs-partially-unstable-core/17599/3 (I actually wrote that post a few hours before seeing this issue had been closed, so it wasn't inspired by what happend here.)


In the larger picture, I thinkLogSeq has fallen into a trap, that many software projects have seen (not with a nice ending): Core functionality is lacking, therefore the part of the user base consisting of dedicated long-term users is small, the user turn-over is large with many new users coming in but leaving after a short time. Creating new shiny features does bring on new users - but again, they don't stay long. Large user turn over -> many new registered users and Github stars are rising, which is interpreted as a positive sign, when it is actually a sign of a problem. The team thinks all is well continues creating the new shiny features and ignoring the basic functionality and usability. It's an evil spiral pointing upwards, but it consists of nothing - it's like an inflation bubble.

I have seen too many software projects failing because of this effect. It would be a shame if a tool with the potential of LogSeq took this route.

Which needs to be fighted somewhat similar to inflation: don't keep inflating, instead go back to the basic mechanism of the economy/software and do the hard choices and the dirty groundwork to make an actual, stable improvement. The fundamental, unsexy, stabilizing groundworks.

cnrpman commented 1 year ago

@LeuschkeTressa Thanks for sharing your visions. Yes we always put core function maintenance & bug fix over new feature implementation. But we still don't have any idea of implementing this Renaming a page update queries using the page without the big engineering effort & challenge (risk) I think this is more an engineering problem than the "I want this feature / I think it's a bug". There are tons of "enhancement points", but the actual delivery is limited by the engineering difficulty fact.

I may change the label if you insist. But it's likely a bug that we have no resource to fix. As an open-source software, we really hope the power from the community can help us on making this done.

Some simple idea on how to implement:

  1. Track all path reference on every query
    • Need a mechanism for collecting & maintaining query meta data?
    • Need to be responsive / or, can parse all DSLs at low cost, real-time, while supporting any possible DSL provided by users
  2. On page rename, finds out these affected query DSLs
  3. Apply the renaming of page reference to the affected query DSLs
    • Sounds like something editing the DSL AST in-place, then putting the query back. Would be some engineering challenges.
fschuman commented 1 year ago

@cnrpman

Sorry, but it's about implementation difficulty. It's hard to manipulate DSL written by users. But code contribution is welcome if you have idea about how to implement this feature.

Oh I see. While I do have a technical background, unfortunately I'm no coder for such things and probably can't see the complexity involved.

What I'm wondering: if the links to a page get updated correctly when a page is renamed, it should be possible to update the queries a page is used in as well, no ? Why is this harder for queries than for links ?

I hope feature request is fair for this ticket. Further discussion on how to realize this feature is appreciated.

I wouldn't find it useful to spend time on how this ticket is categorised. But I would want to give you the feedback that conceptually, for me, this not a feature request, but a very major bug or design flaw, or perhaps even a deal breaker. It is high risk to the point of useless spending time and energy to set up query systems if the foundational operation of renaming a page will consistently break them. I have so far only used very simple queries. And now I do not see a point in building complex queries if I have to 'debug' broken queries frequently because I renamed pages.

It might be very useful to mention this (desired or unintended) interaction between queries and page renaming clearly in the documentation.

EDIT: I realise this may sound a bit negative, it is not meant that way! I appreciate Logseq very much, the concept and the implementation are amazing in many ways. I do think this behaviour is a major design flaw for what Logseq is. But this is meant as honest feedback, because without honest feedback, I feel, developers can not make good decisions.

cnrpman commented 1 year ago

@fschuman basically the query is too highly customizable, with a big syntax set (so it's called DSL, domain specific language), while links only have limited forms ([[]], (()), just found renaming for []() isn't supported)

If we are not pursuing 100% case coverage (just like what happening on the updating on links), I'd like to suggest trying a text-based solution by matching the [[]] patterns in query, which can likely solve the most common query use cases. It sounds like 50% easier than the AST-based solution mentioned in this comment, especially would have a much easier step 3 (Apply the renaming of page reference)

I realise this may sound a bit negative, it is not meant that way!

NVM :) So we are looking for community help given we have limited develop resource.

I feel the text-based solution is more achievable. Some contributors may be interested in, or the team can pick this up when have extra bandwidth.

fschuman commented 1 year ago

basically the query is too highly customizable, with a big syntax set (so it's called DSL, domain specific language),

Does it matter how complex the query is ? I don't know how pages and queries are stored. But assuming that in the graph, a page knows (or could be made to know) which queries it is part of, then merely replacing the old page name with the new page name in each query would be sufficient, no ? Why does the replacement operation need to consider the query syntax ?

If we are not pursuing 100% case coverage (just like what happening on the updating on links),

Wait - renaming pages does or doesn't guarantee updating 100% of the links to the page ?

cnrpman commented 1 year ago

@fschuman For example it's not updating those link in shape of [link name](page-link) via my testing. Working as expected for [link name]([[page-link]]) though. This is the dilemma of file-based storage - tons of corner cases. It will be worse for query, as it's much more flexible.

And this is why we are working on a DB-based storage - it's for the users looking for great data structure integrity over the benefit of storing pages as plain text files.


Update on May 14. Well, seems [link name](page-link) is just an invalid, out-of-doc usage. So the link update is still having 100% coverage in my knowledge

Afaik we support the format [name]([[page name]]) and not [name](page name) - https://docs.logseq.com/#/page/term%2Fpage%20reference%20with%20label . I'm guessing the syntax you're describing was working accidentally

https://github.com/logseq/logseq/issues/9362#issuecomment-1544386619

cnrpman commented 1 year ago

Codebase reference for who with interest: the entry for updating logic on renaming pages: https://github.com/logseq/logseq/blob/60fbfdf2f7756ac827ed08d9e495e35a7dd472a3/src/main/frontend/handler/page.cljs#L280-L287

fschuman commented 1 year ago

@fschuman For example it's not updating those link in shape of [link name](page-link) via my testing. Working as expected for [link name]([[page-link]]) though. This is the dilemma of file-based storage - tons of corner cases. It will be worse for query, as it's much more flexible.

Still don't understand: page-link is an atomar unit, no ? Why does replacing it require differentiation of the embedding context ? Both of your examples are strings within which the page-link string is embedded. Replacing 'page-link' with 'page-link_new' should work in all cases. Same for queries. Am I not seeing something ?

cnrpman commented 1 year ago

@fschuman For the []() case, it may contains external link (http / file) instead of page link. It's different from the []([[]]) case. Doing plain text replacing maybe dangerous. NVM, it's just an example about the corner cases to handle for text-based storage.

fschuman commented 1 year ago

For the []() case, it may contains external link (http / file) instead of page link. It's different from the []([[]]) case. Doing plain text replacing maybe dangerous. NVM, it's just an example about the corner cases to handle for text-based storage.

I see. This would destroy http / file links that happen to entail string that is the same as the page link.

Is this all happening dynamically all the time ? Is there not a graph the stores links to pages ? Could that graph not store queries using pages as well ? If a page is used at a query creation, that could be stored with the page. At that moment, it's clear if the link is a page name or a http / file name, no ?

cnrpman commented 1 year ago

@fschuman Sounds like a plausible solution. May be an alternative to doing AST parsing over text by doing higher level meta data caching. I'm happy to assign this ticket if anyone is going to make a try.

cnrpman commented 1 year ago

I'd like to use this ticket to track requests related to the extra (not implemented) ref updates on page renaming Basically it's hard to make the renaming perfect for a text-based storage.

Code contribution is welcome.

MultiCoreNop commented 10 months ago

@cnrpman First of all: thanks for your time and effort! I've stumbled upon Logseq only a couple of days ago and therefore am still in the learning phase (also: I don't speak Clojure). Searching due to some glitches I've found brought me here.

As mentioned, I do not know Clojure, so I did not have a look at the code. Yet, the following statement confuses me:

@fschuman basically the query is too highly customizable, with a big syntax set (so it's called DSL, domain specific language), while links only have limited forms ([[]], (()), just found renaming for []() isn't supported)

I mean, Logseq has a query language (obviously :)). So it can parse it - otherwise no results.. If it can parse its query language, it also has the abstract syntax tree of every query readily available. Therefore it knows all the tokens and also the token types. Why would it then be hard to just check all "name/ref"-type tokens for "is a link to this page" and rename them accordingly?

Again, thanks for your time. Logseq seems like a very impressive product so far!

cnrpman commented 10 months ago

@MultiCoreNop Hi

If it can parse its query language, it also has the abstract syntax tree of every query readily available

Yes, it's correct. But it doesn't mean it's effort-free. Parsing is just one step, then there are a number of hooks to handle. We don't have extra band-width on this yet. I hope this reply is some how informative: https://github.com/logseq/logseq/issues/7519#issuecomment-1539291070

To make the current query model to be "friendly to renaming reference", major refactoring on decoupling the "symbol" with the "ref name" is required.

Thanks for your love to our product :) But each "minor" feature could requires huge huge underlying engineering effort.

fschuman commented 10 months ago

But each "minor" feature could requires huge huge underlying engineering effort.

I would also first give my appreciation to LogSeq again - an amazing product I'm grateful for being able to use.

And I would once more, if you allow me, question the judgement of this as 'minor'. I suspect this is probably not what you literally mean, but I'd want to say again: doesn't this behaviour necessitate a choice between either not renaming pages or not using queries ? That seem two very basic operations that shouldn't conflict with each other.

Or, may I ask what would be your best practice advice on this ? If I rename a page, an unknown number of queries at unknown locations WILL break. This means I will stumble on queries in my graph that stoped working. But I won't know why, unless I am able to trace it back to the renaming of a page that was part of the query. Yet even then, I may not necessarily remember the new name of the renamed page, so then the query can't be 'fixed'. How could I handle this ?

MultiCoreNop commented 10 months ago

@cnrpman Thanks for the response!

To make the current query model to be "friendly to renaming reference", major refactoring on decoupling the "symbol" with the "ref name" is required.

May I then propose a [hopefully] easy-to-implement "workaround"?

If renaming "Page A" to "Page B" would gather all queries that contain the string "Page A" (so false-positives possible), all those queries could be tagged "#maybe-broken-query". One would then have a page "maybe-broken-query" with all references to all queries that might be broken.

As said, false-positives may arise, but false-negatives shouldn't be possible (except if your query syntax allows for name-generation of filters, then false-negatives would only be unlikely). Under the assumption that people do not have a ton of mega-advanced queries for every single page, this might solve 99,9% of broken queries that might otherwise go unnoticed.

This would boil "renaming a page" down to

To make users aware of the fact that they have to check this maybe-broken-query page, you could also insert a block "This page was renamed from "Page A" to "Page B" which might have #maybe-broken-query" as the 2nd first block of the page (so to not mess with alias/properties).

cnrpman commented 10 months ago

@fschuman It's the nature disadvantage of file-based storage. I can't see any similar products provide such a feature. It's just very hard to get two worlds combined.

This is why we are investing a big part of our engineering effort on the DB-storage currently. Our target is having full data model integrity while maintaining most of the open format & local-first benefits.

fschuman commented 10 months ago

@fschuman It's the nature disadvantage of file-based storage. I can't see any similar products provide such a feature. It's just very hard to get two worlds combined.

I still don't fully understand I think. Reindexing touches every link in every file to rebuild a graph with bidirectional links. Why can it not also touch every page reference in every query and save it with the referenced page just like a bidirectional link, and then keep this updated when queries are created or changed and when pages are renamed ?

This is why we are investing a big part of our engineering effort on the DB-storage currently. Our target is having full data model integrity while maintaining most of the open format & local-first benefits.

Nice !

MultiCoreNop commented 10 months ago

This is why we are investing a big part of our engineering effort on the DB-storage currently. Our target is having full data model integrity while maintaining most of the open format & local-first benefits.

Nice !

Well.. I chose Logseq because of Markdown files as underlying data structure. The last thing I'd want is yet another SQL-hard-to-get-my-data format. I thought that was kinda Logseq's main feature: access to your data with "ls' and 'cat', yet a powerful GUI to access it.

Would be very bad if that changed.. As an option, ok (if that would better the performance for some), but I hope, that the Markdown-"backend" will remain at feature-parity.

fschuman commented 5 months ago

Just to voice that a Markdown files based structure was my reason to chose Logseq as well.

So by saying 'nice', I was assuming feature-parity with the Markdown version would remain.