simon987 / sist2

Lightning-fast file system indexer and search tool
GNU General Public License v3.0

Adding multiple tags via user scripting #419

Closed by oderyn 11 months ago

oderyn commented 1 year ago

sist2 version: 3.2.1

Platform (Linux or Docker, x86-64 or arm64): Docker

Elasticsearch version: 7.17.9

--

I want to add multiple tags to my files through user scripts. I've been testing with the hamburger example and cannot figure it out. I also don't see any info about this in the docs. Apologies if I have overlooked it.

Here is what I've tried. Note that I did not try everything at the same time. The "--" indicates different attempts.

from sist2 import Sist2Index
import sys

index = Sist2Index(sys.argv[1])
for doc in index.document_iter():

    doc.json_data["tag"] = ["pickles.#00FF00"] #works if I only specify one tag

--
    doc.json_data["tag"] = ["pickles.#00FF00"]
    doc.json_data["tag"] = ["hamburger.#00FF00"] #Overwrites pickles
--
    doc.json_data["tag"] = ["hamburger", "pickles"] # Doesn't work; no tags are added; any existing tags are removed
--
    doc.json_data["tag"] = ["pickles.#00FF00"] 
    doc.json_data["tag"].append("hamburger.#00FF00") #does not append, either replaces or does nothing
--

    doc.json_data["tag"] = ["pickles.#00FF00"]
    index.update_document(doc)
    doc.json_data["tag"] = ["hamburger.#00FF00"] #Overwrites pickles

    index.update_document(doc)

index.sync_tag_table()
index.commit()

print("Done!")

Any advice on how to troubleshoot or pointers on what I am doing wrong would be greatly appreciated.

oderyn commented 1 year ago

Here's the latest.

It looks like I can add multiple tags, but they aren't showing up -- OR I am not doing it properly.

Here's the code that seems to be working:

from sist2 import Sist2Index
import sys

index = Sist2Index(sys.argv[1])
for doc in index.document_iter():
    # Check if 'tag' exists in the document's json_data
    if 'tag' in doc.json_data:
        # If it does, extend the list of tags
        print("Extend!")
        print(doc.json_data["tag"])
        doc.json_data["tag"].extend(["onion.#ffffff"])
    else:
        # If it doesn't, create a new list with both tags
        doc.json_data["tag"] = ["hamburger.#00FF00", "pickles.#00FF00"]
        print("Add!")    

    index.update_document(doc)

index.sync_tag_table()
index.commit()

print("Done!")
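The extend-or-create branch above can also be written as a small merge helper that is safe to run more than once. This is only a sketch on plain dicts standing in for doc.json_data; `merge_tags` is a hypothetical name, not part of the sist2 API, and it does not address the UI display problem described next.

```python
def merge_tags(json_data, new_tags):
    """Add new_tags to json_data["tag"], keeping existing tags and skipping duplicates."""
    tags = json_data.setdefault("tag", [])  # create the list if the key is missing
    for tag in new_tags:
        if tag not in tags:
            tags.append(tag)
    return tags


# Plain dicts stand in for doc.json_data here.
tagged = {"tag": ["hamburger.#00FF00"]}
untagged = {}

merge_tags(tagged, ["onion.#ffffff", "hamburger.#00FF00"])
merge_tags(untagged, ["hamburger.#00FF00", "pickles.#00FF00"])
merge_tags(untagged, ["pickles.#00FF00"])  # a second run adds nothing new

print(tagged["tag"])
print(untagged["tag"])
```

In the script itself the equivalent call would be merge_tags(doc.json_data, [...]) before index.update_document(doc). Whether in-place changes to doc.json_data persist in 3.2.1 is part of what is unclear here, so assigning the returned list back to doc.json_data["tag"] may be the more conservative form.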

However, the tags do not display in the UI, only in the task log:

 [ADMIN ] Starting user script with executable='/sist2-admin/scripts/test/run.sh', index_path='/sist2-admin/scan-test-2023-09-04 14:48:08.730895.sist2', extra_args=''
 [INFO ] Extend!
 [INFO ] ['hamburger.#ffFF00', 'pickles.#00FFff', 'onion.#ffffff']
 [INFO ] Extend!
 [INFO ] ['hamburger.#ffFF00', 'pickles.#00FFff', 'onion.#ffffff']
 [INFO ] Extend!
 [INFO ] ['hamburger.#ffFF00', 'pickles.#00FFff', 'onion.#ffffff']
 [INFO ] Extend!
 [INFO ] ['hamburger.#ffFF00', 'pickles.#00FFff', 'onion.#ffffff']
 [INFO ] Extend!
 [INFO ] ['hamburger.#ffFF00', 'pickles.#00FFff', 'onion.#ffffff']
 [INFO ] Extend!
 [INFO ] ['hamburger.#ffFF00', 'pickles.#00FFff', 'onion.#ffffff']
 [INFO ] Extend!
 [INFO ] ['hamburger.#ffFF00', 'pickles.#00FFff', 'onion.#ffffff']
 [INFO ] Extend!
 [INFO ] ['hamburger.#ffFF00', 'pickles.#00FFff', 'onion.#ffffff']
 [INFO ] Done!

If I view the info about an item, I get this:

Key Value
index [test]
mtime 2020-04-25
mime text/html
size 2.4k
path  
tag [ "hamburger.#00FF00" ]

This is despite the script's log output showing all of the values (see above).

It is also interesting to note that if I add a tag through the UI, it does not show in the list of tags that are printed to the log.

Another interesting thing to note: if I remove the first item in the list using something like:

   doc.json_data["tag"].remove('hamburger.#ffFF00')

The next item in the list displays.

And if I add a tag through the UI, it will not display when I

print(doc.json_data["tag"])

oderyn commented 1 year ago

My use case might be helpful to know.

I've got a few thousand documents whose filenames follow a pattern. I want to break the filenames down into their disparate parts and use each element to create a hierarchical tagging system. Like so:

category1
    keyword1
        tag1
        tag2
        tag3
    keyword2
        tag4
        tag5
        tag6
    keyword3
        tag7
        tag8
        tag9

category2.1
    keyword2.1
        tag2.1
        tag2.2
        tag2.3
    keyword2.2
        tag2.4
        tag2.5
        tag2.6
    keyword2.3
        tag2.7
        tag2.8
        tag2.9

The "tags" are the elements in the url and I would be creating the hierarchy in the script. The script itself is working -- as far as I can tell. It is just not applying all of the tags (6 of them) that I want to pull from the filenames. Hopefully that context is helpful.

dpieski commented 1 year ago

To add to this question, do you still use periods as hierarchy separators? so "category1.keyword1.tag1" adds the 'tag1' tag?

simon987 commented 1 year ago

Thanks @oderyn, it might be a bug in the 3.2.x code. As I said, it hasn't really been tested thoroughly yet.

in theory this should work:

doc.json_data["tag"] = ["hamburger.#00FF00", "pickles.#00FF00"]

simon987 commented 1 year ago

> To add to this question, do you still use periods as hierarchy separators? so "category1.keyword1.tag1" adds the 'tag1' tag?

yes
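Given that periods act as hierarchy separators, the filename-based scheme from the use case could be sketched like this. The underscore-delimited filename pattern, the `tags_from_filename` helper, and appending the color as a final `.#RRGGBB` segment to a multi-level tag are all assumptions for illustration; the thread only shows colors on single-level tags.

```python
from pathlib import PurePosixPath


def tags_from_filename(path, color="#00FF00"):
    """Build 'category.keyword.tag.#color' strings from a name like
    'category1_keyword1_tag1_tag2.html' (hypothetical pattern)."""
    parts = PurePosixPath(path).stem.split("_")
    if len(parts) < 3:
        return []  # name does not follow the assumed pattern
    category, keyword, *leaves = parts
    # One tag per leaf, each carrying its full category.keyword prefix.
    return [f"{category}.{keyword}.{leaf}.{color}" for leaf in leaves]


tags = tags_from_filename("docs/category1_keyword1_tag1_tag2.html")
print(tags)
```

One thing to watch: names that themselves contain periods, such as category2.1 in the outline above, would presumably be read as extra hierarchy levels, so a different character may be needed inside names.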

oderyn commented 1 year ago

> Thanks @oderyn, it might be a bug in the 3.2.x code. As I said, it hasn't really been tested thoroughly yet.

Cool. Happy to be a guinea pig. :)

> in theory this should work:
>
> doc.json_data["tag"] = ["hamburger.#00FF00", "pickles.#00FF00"]

I just tested again. The first value is added to the array and displays in the UI. The second value is also added to the json, but does not show up in the UI.

Also, tags added through the UI are not showing up in the json (based on the readout in the task console when I print the doc.json_data["tag"]).

Lastly, it appears that the above command will replace all the items in the tag array. Not sure if that is intentional or not -- or just part of how Python works.
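On that last point, replacing the whole list is ordinary Python behavior, assuming json_data behaves like a regular dict: assigning to a key rebinds it to the new list and discards the old one. A minimal illustration with a plain dict standing in for doc.json_data (the `from-ui` tag name is made up for the example):

```python
json_data = {"tag": ["from-ui.#0000FF"]}

# Assignment rebinds the key: the old list is discarded.
json_data["tag"] = ["hamburger.#00FF00", "pickles.#00FF00"]
replaced = list(json_data["tag"])

# To keep what was there, build the new list from the old one.
json_data = {"tag": ["from-ui.#0000FF"]}
json_data["tag"] = json_data["tag"] + ["hamburger.#00FF00", "pickles.#00FF00"]
kept = list(json_data["tag"])

print(replaced)
print(kept)
```

This only covers the Python side. It does not explain why tags added through the UI are missing from json_data in the first place.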

I hope this was helpful.

simon987 commented 11 months ago

Should be fixed in the latest docker tag!