Closed jvillafanez closed 1 year ago
@jvillafanez great 👍 pls ping me again when close to merge so I can prepare the docs part.
Kudos, SonarCloud Quality Gate passed!
0 Bugs
0 Vulnerabilities
0 Security Hotspots
134 Code Smells
these features seem to be too good to not merge them... @hodyroff I was just cleaning up and updating projects, when I found this change, which we apparently never merged. I do think our customers would benefit - maybe this is something for the 10.13?
@jvillafanez occ search:index:fillSecondary RelevanceV2 <user>
with this PR, are there also core change(s) needed?
No core changes are needed. Everything is part of the app.
In this case, just to note, an app upgrade could be made available for regular download from the marketplace, making the change available asap, in addition to adding it to the defaults for the 10.13 release. There is no doc restriction going that path.
@jnweiger could we do some QA here?
Please review and merge into release-2.4.0 branch, I'll build a 2.4.0-rc.1 from there and then start QA.
I don't see a release-2.4.0 branch, so I guess it will be created after this PR is merged.
It wasn't pushed. Sorry. Pushed now, and retargeted this PR.
For testing: search_elastic-2.3.0+refactor_connector.tar.gz
Screenshot for @mmattel Admin -> Settings -> Search
Latest commit has changed the behavior of the search. The initial post has been updated to reflect those changes.
I'd suggest to make all new keywords upper case. SIZE, EXT, MTIME, TYPE, MIME
This will need some voting... I'd rather keep them lowercase :smile:
size.b and size.mb are supported. size.kb is missing.
We'd need to send that info because, as far as I know, elasticsearch doesn't make any calculations (or it's complex to set up). As far as elasticsearch is concerned, you can send size.b = 50 and size.mb = 70 and those values will be stored, so that's a potential problem we have. That's why I'd rather store the minimum information possible. Furthermore, size.gb is also missing for the same reason.
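To illustrate the consistency concern raised above, here is a minimal sketch that derives every size field from a single byte count before a document is sent, so the stored values cannot disagree. The helper name `size_fields`, the nested `size` layout, and the use of binary megabytes (1024 * 1024 bytes) are assumptions made for illustration; they are not taken from the app's code or its index mapping.

```python
# Hypothetical helper, not part of search_elastic.
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def size_fields(size_in_bytes: int) -> dict:
    """Derive all size units from one byte count so they stay consistent."""
    if size_in_bytes < 0:
        raise ValueError("size must be non-negative")
    return {
        "size": {
            "b": size_in_bytes,
            "mb": size_in_bytes / BYTES_PER_MB,
        }
    }


# A 5 MiB file: both fields come from the same source value.
print(size_fields(5 * 1024 * 1024))
```

Adding another unit such as kb or gb would mean one more derived entry here, which is the extra stored data the reply above would rather avoid.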
Migration step 6 is unclear: "Then you can completely remove the index from elasticsearch." -> search:index:reset ? or some 'drop table' SQL in the elastic server?
It's mainly for elasticsearch maintenance, to remove data that won't be used any longer. It's possible to skip that step without any problem on our side, but you have to live with junk data in elasticsearch.
Understood. Instructions on exactly how to remove this junk data are missing.
Most of the problems are fixed in the current state of #319 but I believe there is one regression now https://github.com/owncloud/search_elastic/issues/331#issuecomment-1654061258
New things coming in this PR:
About the new "RelevanceV2" connector:
Note that, while the modification time affects the scoring, it doesn't mean that recent files will always be the first ones to appear. Old files might still have a higher score even after those boosts.
The "RelevanceV2" connector also introduces new ways to search for files based on the indexed fields. Note that the following info only applies to the "RelevanceV2" connector.
By default, the "RelevanceV2" connector will search in the name field, and in the file content if possible. Old limitations for the app to index the file content are still in place. Also note that, due to those limitations, big files might not get their contents indexed.
Additional searches you can do with the "RelevanceV2" connector:
ext:pdf, ext:docx, ext:gif, ext:mp4, ext:tar.gz, ext:gz, etc. Any extension is possible.
size.b:<8092, size.b:>102400, size.b:[8092 TO 16184]
size.mb:<3, size.mb:>9, size.mb:[3 TO 9]
type:file, type:folder
mtime:<1678960862, mtime:>1678960862, mtime:[1608111372 TO 1678960862]
mtime:<2021-08-25, mtime:>2023-01-18, mtime:[2022-01-01 TO 2022-12-31]
mime:image, mime:gif, mime:text
NOTE: To search for the whole mimetype such as "image/gif", use mime.key:image\/gif
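Since mtime accepts Unix timestamps, a small helper can turn dates into a range term. The sketch below is illustrative only: the function name `mtime_range` is hypothetical, and it assumes the timestamps are plain seconds since the Unix epoch, as the examples in the list suggest.

```python
from datetime import datetime, timezone


def mtime_range(start: datetime, end: datetime) -> str:
    """Build an mtime range term from two timezone-aware datetimes."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    if start > end:
        raise ValueError("start must not be after end")
    # Truncate to whole seconds, matching the integer timestamps above.
    return f"mtime:[{int(start.timestamp())} TO {int(end.timestamp())}]"


term = mtime_range(
    datetime(2020, 12, 16, 9, 36, 12, tzinfo=timezone.utc),
    datetime(2023, 3, 16, 10, 1, 2, tzinfo=timezone.utc),
)
print(term)  # mtime:[1608111372 TO 1678960862]
```

The resulting string can be pasted into the search box next to other terms.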
~By default, each search term will be joined with an "OR" operator. For example, brown ext:pdf will be interpreted as "name or content containing brown OR extension = pdf", so the "brown.txt" and "tito.pdf" files will appear in the results. You can use brown AND ext:pdf to match pdf files containing brown in the name or contents.~ (It doesn't work like this any longer.)
Each search term will narrow the search. For example, brown ext:pdf will be interpreted as "name or content containing brown AND extension = pdf", so "brown.pdf" and "a brown paper.pdf" will appear, but not "brown.txt" or "tito.pdf".
Some examples of complex searches:
confidential mtime:>2023-01-01 size.mb:<10
type:folder size.mb:>1024
mime:image mtime:[2020-03-01 TO 2020-06-30]
(oxygen OR helium) AND (ext:pdf OR ext:txt)
~ (no longer applies)
Note that matching by name is pretty lax, so expect a bunch of unexpected results. Anyway, good results are expected to be on top.
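The narrowing behaviour described above (every term has to match) can be modelled with a toy filter over file names. This is only a sketch of the semantics, not how the connector or elasticsearch evaluates queries: it ignores file content and scoring, understands only plain words and ext: terms, and the function name `matches` is made up for the example.

```python
def matches(name: str, query: str) -> bool:
    """Toy model: every whitespace-separated term must match (AND)."""
    lowered = name.lower()
    for term in query.lower().split():
        if term.startswith("ext:"):
            # ext:pdf matches names ending in ".pdf"
            if not lowered.endswith("." + term[4:]):
                return False
        elif term not in lowered:
            # plain words must appear somewhere in the name
            return False
    return True


files = ["brown.pdf", "a brown paper.pdf", "brown.txt", "tito.pdf"]
hits = [f for f in files if matches(f, "brown ext:pdf")]
print(hits)  # ['brown.pdf', 'a brown paper.pdf']
```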
Migrating to the "RelevanceV2" connector:
If you haven't indexed anything yet, you're encouraged to set up the connectors you want to use as part of the app configuration. The recommended one is "RelevanceV2" for write and search.
If you have indexed data, these are the steps to migrate to the new index.
occ search:index:fillSecondary RelevanceV2 <user> command. The command needs to be run for all the users (or at least the ones using the search feature), and it's expected to take a lot of time.
With step 2 you'll be writing in both indexes at the same time. This is expected to be slower. Note that step 2 just takes care of new files. Files indexed previously won't be present in the new index. This is why step 3 is there. Step 4 is important and you should stop at that point for a while. If something goes wrong, you can still revert things; in particular, you can switch back to the "Legacy" connector. From step 5 the actions are irreversible. If you want to go back, you'll have to start a new migration.
It's important to note that no downtime is expected while the migration happens. Until step 4, the "Legacy" connector will keep updating the index normally. When the switch happens in the search connector, the new "RelevanceV2" connector will access the new index, which should have been fully updated.
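Because the fillSecondary command has to be run once per user, a small wrapper may help. The sketch below only builds and optionally runs the command quoted above. How occ is invoked (`php occ` here), the function names, and the way the user list is obtained are all assumptions to adapt to the installation.

```python
import subprocess
from typing import Iterable, List, Sequence


def fill_secondary_commands(
    users: Iterable[str],
    occ_prefix: Sequence[str] = ("php", "occ"),  # assumption: adjust to your setup
    connector: str = "RelevanceV2",
) -> List[List[str]]:
    """Build one fillSecondary command per user."""
    return [
        [*occ_prefix, "search:index:fillSecondary", connector, user]
        for user in users
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Run the commands in order and return the users whose run failed."""
    failed = []
    for cmd in commands:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
            continue
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[-1])
    return failed


# Dry run with made-up user ids: prints the commands without executing them.
run_all(fill_secondary_commands(["alice", "bob"]))
```

Keeping the dry run as the default makes it easy to check the generated commands before starting what is expected to be a long-running job.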
NOTE: This might not be the final version. This is mostly the state of the PR and things might change without notice. The official documentation is expected to contain the final information once this PR is completely finished.
@mmattel FYI this will need documentation.