google / timesketch

Collaborative forensic timeline analysis
Apache License 2.0
2.52k stars 577 forks

Timesketch auto-renames timeline uploads with the same timeline name #3052

Open puffyCid opened 3 months ago

puffyCid commented 3 months ago

Describe the bug When uploading multiple timelines with the same name to a sketch, Timesketch renames the timeline by appending 4 random characters. I was assuming that additional uploads with the same name would just get added to the existing timeline as another data source instead of getting a new name.

To Reproduce This seems to happen mainly when using async uploads. However, I was able to trigger it with sync uploads by running the upload multiple times, using the Python code below (replacing <IP> with the Timesketch instance and <home path> with a home directory). I also had to spawn it several times to "mimic" async a bit.

import os

from timesketch_api_client import client as timesketch_client
from timesketch_import_client import importer

def list_files(path, client, sketch):
    # Walk the directory tree and upload one event per directory entry.
    files = os.scandir(path)

    for file in files:
        entry = {"message": file.name, "datetime": "1970-01-01T00:00:00.000Z", "timestamp_desc": "test"}
        # A new streamer (and so a new upload) is opened for every entry,
        # always with the same timeline name.
        with importer.ImportStreamer() as streamer:
            streamer.set_sketch(sketch)
            streamer.set_timeline_name('uploadA')
            streamer.add_dict(entry)

        if file.is_dir():
            print(path+"/"+file.name)
            list_files(path+"/"+file.name, client, sketch)

def main():
    # Replace <IP>, the credentials, and the sketch ID with values for your instance.
    client = timesketch_client.TimesketchApi(host_uri='<IP>', username='sketchy', password='password')
    sketch = client.get_sketch(6)
    start = "<home path>"
    list_files(start, client, sketch)

if __name__ == "__main__":
    main()

It's a lot easier to trigger with async code, it seems.

Expected behavior Timesketch continues to upload data to the same timeline name and increments the timeline's data sources.

Screenshots The image below is from the Python code. I had to spawn multiple instances to trigger the rename.

[Screenshot: python]

The image below is from the async code I was using (a mixture of TypeScript and Rust).

[Screenshot: async]

Desktop Running Timesketch on an Ubuntu 22.04 VM

Additional context I brought this up in the Timesketch Slack channel, and it was suggested that I open a GitHub issue.

Could this possibly happen because multiple uploads are being submitted at once (or too quickly), so that there is some kind of brief lock on the timeline name, and when the second upload arrives the lock triggers Timesketch/OpenSearch to rename it?

Let me know if additional info is required. Thanks!

justzh commented 3 months ago

Time travel will not be legal until January 1, 2048.

justzh commented 3 months ago

Feel free to reply here.

justzh commented 3 months ago

This is a valid issue, but see the latest issue for this project.

justzh commented 3 months ago

Do not close this issue

justzh commented 3 months ago

The problem seems to be with Figma.

justzh commented 3 months ago

OH. You need to read a book called Clean Code.

mbartle-sf commented 2 months ago

I'm having this issue, too. It occurs whenever the client makes a request to upload data to a sketch while the search index for the sketch is in use. This makes it common when uploading asynchronously, but possible even when uploading synchronously (as long as the upload rate is faster than the OpenSearch indexing rate).

If the index for a given timeline is in use, Timesketch will create a new index and timeline for the data. The new timeline is given the "original name plus 5 random characters" name. If the user uploads more data and the original search index is still in use, but the secondary index is not, Timesketch is able to find and use the secondary index, but still always creates a new timeline.
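The behavior described above can be sketched as a small toy model. This is only an illustration of the description in this comment, not Timesketch's actual code: the class `ToySketch`, its methods, and the index names `index-0`, `index-1` are all hypothetical, and the rule "use the first free secondary index" is an assumption.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ToySketch:
    """Toy model of the reported behaviour; NOT Timesketch's implementation."""

    indexes: dict = field(default_factory=dict)    # index name -> True while in use
    timelines: list = field(default_factory=list)  # timeline names, in creation order

    def upload(self, timeline_name):
        """Start an upload; return the (index, timeline) pair that receives it."""
        original = next(iter(self.indexes), None)
        if original is None or not self.indexes[original]:
            # Original index is missing or free: keep the requested name.
            index = original or "index-0"
            if timeline_name not in self.timelines:
                self.timelines.append(timeline_name)
            self.indexes[index] = True
            return index, timeline_name

        # Original index is in use: reuse a free secondary index if one
        # exists, otherwise create a new index (assumed selection rule).
        free = [name for name, busy in self.indexes.items() if not busy]
        index = free[0] if free else f"index-{len(self.indexes)}"
        self.indexes[index] = True

        # Either way a new timeline is created, named with a 5-character
        # random suffix as described in the comment above.
        renamed = f"{timeline_name}_{secrets.token_hex(3)[:5]}"
        self.timelines.append(renamed)
        return index, renamed

    def finish(self, index):
        """Mark an index as no longer in use."""
        self.indexes[index] = False
```

In this model, starting one upload that keeps the original index busy and then running four more short uploads one after another leaves 2 indexes but 5 timelines, which matches the pattern of "several hundred timelines, only 2-3 indexes" at a smaller scale.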

On my team, we create our own JSONL timelines and send them to Timesketch in batches, of which there can be thousands. This bug leads us to have several hundred timelines, even when there are only 2-3 indexes on the sketch.
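A possible client-side mitigation, untested against this bug, is to send many events through one streamer instead of opening a new streamer per event as the reproduction script earlier in this thread does. The sketch below assumes `streamer_factory` behaves like `importer.ImportStreamer` (a context manager with `set_sketch`, `set_timeline_name` and `add_dict`, as used in that script); `iter_entries` and `upload_entries` are hypothetical helper names. Whether fewer uploads actually avoids the rename depends on how the server handles the streamer's requests.

```python
import os


def iter_entries(path):
    """Yield one event dict per file or directory found below path."""
    for _root, dirs, files in os.walk(path):
        for name in dirs + files:
            yield {
                "message": name,
                "datetime": "1970-01-01T00:00:00.000Z",
                "timestamp_desc": "test",
            }


def upload_entries(entries, sketch, timeline_name, streamer_factory):
    """Send all entries through a single streamer; return how many were added.

    streamer_factory is assumed to behave like importer.ImportStreamer
    from timesketch_import_client.
    """
    count = 0
    with streamer_factory() as streamer:
        streamer.set_sketch(sketch)
        streamer.set_timeline_name(timeline_name)
        for entry in entries:
            streamer.add_dict(entry)
            count += 1
    return count
```

With the real client this would be called as `upload_entries(iter_entries(start), sketch, 'uploadA', importer.ImportStreamer)`. Passing the factory as an argument keeps the helper independent of the client library, so it can be exercised with a stub.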