nextcloud / fulltextsearch_elasticsearch

πŸ” Use Elasticsearch to index the content of your Nextcloud
https://apps.nextcloud.com/apps/fulltextsearch_elasticsearch
GNU Affero General Public License v3.0
81 stars 30 forks source link

Add support for searching in external files (fixing #89) #100

Closed R0Wi closed 1 year ago

R0Wi commented 4 years ago

This fixes #89. It is implemented like proposed here: https://github.com/nextcloud/fulltextsearch/issues/546#issuecomment-583791976

TheR00st3r commented 4 years ago

Are there plans when this PR will be considered and released?

R0Wi commented 4 years ago

@4ipalino does this fix the issue for you?

TheR00st3r commented 4 years ago

@R0Wi I tested it again and it worked on my system. As long as the external Drive is not used as Rootdirectory with foldername "/".

ItΒ΄s important to set an explicit foldername for the external share: image

R0Wi commented 4 years ago

Thanks for the feedback. Thats a case i didn't test so i'll have a look at this. Maybe i can implement a solution for that, too.

R0Wi commented 4 years ago

@4ipalino i modified the patch so now a external share mounted as root-directory should also work. If that's the case, the check of the filepath is now omitted because logically seen every "external file" is allowed to appear in the search result.

Maybe you would like to check that out if you've some time left and give me feedback on this :smile:

trendzetter commented 4 years ago

I can confirm the fix is working. The code looks fine but probably my opinion is is not enough as a review since I am not a nextcloud developer.

R0Wi commented 4 years ago

@daita @MorrisJobke or maybe @skjnldsv any chance to get some feedback on this, please?

skjnldsv commented 4 years ago

@daita @MorrisJobke or maybe @skjnldsv any chance to get some feedback on this, please?

Sorry, this is out of my expertise! :/

trendzetter commented 4 years ago

WTF!? This is a fix for a major blocking bug in one of the most important features to make this enterprise ready. Why is no-one of the nextcloud willing to review your code??

ArtificialOwl commented 4 years ago

Hello, main dev on fts here.

I was supposed to work on the app last week but got overwhelmed by other more important thing.

R0Wi commented 4 years ago

Hello, main dev on fts here.

I was supposed to work on the app last week but got overwhelmed by other more important thing.

No problem for me. I already did a few minor improvements for some customers who use the FTS so maybe it would be interesting for you if i can support you somehow. Just let me know :-)

trendzetter commented 4 years ago

Sorry for my emotional reaction. I am looking into the contribution guidelines, see if I can make more of a contribution instead of pointing out issues.

trendzetter commented 4 years ago

I am still figuring out why this happens, not sure if it has anything to do with this fix but I am not seeing all of the results found. I often see only one or 2 or even no results listed. I just know results are found because it displays on top of the page:"de zoekopdracht in Files naar 'bunny' leverde 8 resultaten op in 51 ms" (8 results found in Dutch)

R0Wi commented 4 years ago

I am still figuring out why this happens, not sure if it has anything to do with this fix but I am not seeing all of the results found. I often see only one or 2 or even no results listed. I just know results are found because it displays on top of the page:"de zoekopdracht in Files naar 'bunny' leverde 8 resultaten op in 51 ms" (8 results found in Dutch)

I think this has nothing to do with this change i should rather be a frontend problem (this fix just extends the backend search mechanism). Maybe you could take a look at the dev-console network tab and see the JSON response returned by the server? If the result rows are missing it could be something on the backend otherwise i'd say there's something going on in the frontend javascript code or something. If you like you can create a issue here and i'll see if i can help you somehow (just to keep this discussion clean here)

Jake-Etherflow commented 4 years ago

Works perfectly! Thank you very much. I was able to mount nfs and then map nextcloud local storage to it with no further issues displaying results for the index.

ArtificialOwl commented 4 years ago

Hello,

There is no reason for this app to search/access remote files. Can you try to migrate your work onto files_fulltextsearch ?

R0Wi commented 4 years ago

Well currently the issue is that external files are properly indexed (with the help of files_fulltextsearch). But the results (indexed files) are then filtered by ownership when creating a search request inside this app. Because external files do not have a special owner they're filtered.

My current approach is to get all external mounts accessible to the current user when he uses the search functionallity. This info can then be integrated into the filter part of the ES query.

I know that this somehow mixes up the separation between files_fulltextsearch and fulltextsearch_elasticsearch but i did not find another way to implement the mentioned feature. Maybe you could give a draft on how to provide this inside files_fulltextsearch? I'd then really be looking forward implementing this :-)

omidmeh commented 4 years ago

I'm also very interested in this.

In the meantime, I tried just downloading the files that you've changed and replacing the ones from the original plugin installation with them. However, it looks like when I do that the search no longer works!

Is there any other steps I need to take other than replacing the old files with these in the installation folder?

Jake-Etherflow commented 4 years ago

I don't remember where the other patch was, but I had to combine the two in the same file to get search working and external files as well. If you'd like a copy of it, shoot me a message, and I can send it to you.

EDIT: Pretty sure #107 was it.

R0Wi commented 4 years ago

@omidmeh i think what you could do is cloning this repo, applying the patch from this PR and installing the apps dependencies manually (note you need to install git as well as composer). So basically you could do something like this:

cd <NEXTCLOUD_ROOT>/apps
rm -rf fulltextsearch_elasticsearch
mkdir fulltextsearch_elasticsearch && cd fulltextsearch_elasticsearch
git clone https://github.com/nextcloud/fulltextsearch_elasticsearch.git .
git checkout v1.5.2
curl https://patch-diff.githubusercontent.com/raw/nextcloud/fulltextsearch_elasticsearch/pull/100.patch | git apply -v
composer install
cd .. && chown -R www-data:www-data fulltextsearch_elasticsearch

Let me know if this helps.

EDIT: note that v1.5.2 should already contain the fix regarding ES7.7 support mentioned by @Jake-Etherflow so the above should work

omidmeh commented 4 years ago

This fixed my issue as well. Thank you!!

omidmeh commented 4 years ago

The only thing that I noticed is that when I search for something that belongs to a different user, although I don't actually see the result, it does show "the search in Files for 'search-term' returned 1 results in 56 ms".

That's still a huge improvement since the owner user sees the file but the non-owners don't. Would be nice if we could adjust that number as well though!

R0Wi commented 4 years ago

Could be related to https://github.com/R0Wi/fulltextsearch_elasticsearch/issues/1

ravermeister commented 4 years ago

sigh, I really hope this fix find the way to the master branch (and the next Version). otherwise the whole thing is nearly useless, even worse, It gives the User the wrong feeling about the content of his own data!.

lhurt commented 3 years ago

Why is it not merged to the main branch?

Apfelwurm commented 3 years ago

Hi, can please someone of the staff review this PR? It works perfectly and its a pain to implement this fix for every release manually. Thanks!

trendzetter commented 3 years ago

I think someone is waiting for your good consultancy fees.

R0Wi commented 3 years ago

Like mentioned in https://github.com/nextcloud/fulltextsearch_elasticsearch/pull/100#issuecomment-668095902 this PR does not respect the separation of concerns between the FTS apps (even it works...). Unfortunately currently i don't have much time to port my work onto files_fulltextsearch because i think this would be a bit more heavy lifting.

tauceti82 commented 3 years ago

@omidmeh i think what you could do is cloning this repo, applying the patch from this PR and installing the apps dependencies manually (note you need to install git as well as composer). So basically you could do something like this:

cd <NEXTCLOUD_ROOT>/apps
rm -rf fulltextsearch_elasticsearch
mkdir fulltextsearch_elasticsearch && cd fulltextsearch_elasticsearch
git clone https://github.com/nextcloud/fulltextsearch_elasticsearch.git .
git checkout v1.5.2
curl https://patch-diff.githubusercontent.com/raw/nextcloud/fulltextsearch_elasticsearch/pull/100.patch | git apply -v
composer install
cd .. && chown -R www-data:www-data fulltextsearch_elasticsearch

Let me know if this helps.

EDIT: note that v1.5.2 should already contain the fix regarding ES7.7 support mentioned by @Jake-Etherflow so the above should work

the 100.patch was adapted so it couldn't patch the Makefile as the version+=1.5.2 was changed to 20.0.0 so it could not find the correct positions to patch...what helped was to manually change version+=1.5.2 to version+=20.0.0 before applying the patch...

And then do in the end occ upgrade otherwise it will disable the plugin.

it cost me hours to get the search working on external files! This has to be fixed asap!

R0Wi commented 3 years ago

@tauceti82 sorry for the circumstances i rebased this PR onto the current v21.0.0 so you should now be able to apply the patch again by git checkout v21.0.0 and then following the commands from above. Maybe i manage to build a regularly updated custom .tar file via Github Actions and link it here these days.

trendzetter commented 2 years ago

@ArtificialOwl @MorrisJobke Why is this not getting any review or any kind of feedback? Is nextcloud sectarian? This makes me think twice before proposing Nextcloud to anyone.

MorrisJobke commented 2 years ago

I left the company and only have very little time in my spare time. Sorry

Apfelwurm commented 2 years ago

Any chance this problem can get revisited?

R0Wi commented 2 years ago

ping @ArtificialOwl

trendzetter commented 2 years ago

@adsworth ?

adsworth commented 2 years ago

@adsworth ?

I've only worked on the app store and that was three years ago or so. I'm not an active Nextcloud member/ceveloepr.

ArtificialOwl commented 2 years ago

it is possible that https://github.com/nextcloud/files_fulltextsearch/pull/170 fixes this issue. this needs confirmation

R0Wi commented 2 years ago

@ArtificialOwl thank's but if i understand the code correctly, this most likely doesn't solve the problem for external files. It's already a long time ago i dealt with the code but if i remember correctly, the main problem was the search filter condition in https://github.com/nextcloud/fulltextsearch_elasticsearch/pull/100/files#diff-b55ac6e506ccacc79909f6e0bb3d312f92a4d33e089c0d952fe032fdfe60fb71R378 which currently does not allow search results to have an empty owner when searching inside the index.

Nevertheless i'm going to test the patch these days and give you some feedback on this πŸ‘

lhurt commented 2 years ago

@ArtificialOwl thank's but if i understand the code correctly, this most likely doesn't solve the problem for external files. It's already a long time ago i dealt with the code but if i remember correctly, the main problem was the search filter condition in https://github.com/nextcloud/fulltextsearch_elasticsearch/pull/100/files#diff-b55ac6e506ccacc79909f6e0bb3d312f92a4d33e089c0d952fe032fdfe60fb71R378 which currently does not allow search results to have an empty owner when searching inside the index.

Nevertheless i'm going to test the patch these days and give you some feedback on this πŸ‘

Hi, is there any noteworthy progress on this issue?

trendzetter commented 2 years ago

Hi, is there any noteworthy progress on this issue?

It looks promising and it's merged. I haven't tested myself but I suppose the best thing to do is try install a vanilla build and see if searching works now on external files

trendzetter commented 1 year ago

I managed to test this scenario again and I can confirm searching using the search box works for external files now. Unless someone else thinks this needs to be reopened I think it's probably OK to close. @lhurt @ArtificialOwl

R0Wi commented 1 year ago

Also tested it on a fresh NC25 instance with one and two users and a samba share. Seems to work now because there was a change how files are indexed. Don't know when this changes came into the app but with app version 24 searching in external files seems to be fixed. Thank's @ArtificialOwl or who else fixed this :+1:

lhurt commented 1 year ago

Unfortunately it doesn't work for me.

I have some Samba shares, where each user has to provide its password in the configuration.

My test included the following steps

  1. Recreate index for all users from scratch
  2. Show live indexing (occ fulltextsearch:live)
  3. Copy a file on an indexed directory using linux bash
  4. Copy a file via Windows Explorer in the very same directory
  5. No indexing is displayed in the live search
  6. All previously existing files can be found using nextcloud search, but not the copied ones

Any idea what could be wrong?

R0Wi commented 1 year ago

@lhurt since the fulltextsearch app heavy relies on the file-system events of NC (like for example when a new file is detected by Nextcloud), I think you're hitting the problem, that files which are directly added to the samba share do not trigger the NC events. See official docs. If indexing works when uploading the file directly to NC (for example via WebUI), then this is most likely not a problem with the fulltextsearch app itself.

lhurt commented 1 year ago

Thanks for the explanation. Wouldn't https://www.php.net/manual/en/book.inotify.php help to improve the situation. I'm not sure if this is the correct place to ask. Please forgive me if I'm completely off track. I've no PHP development skills.