muchdogesec / txt2stix

txt2stix is a Python script that is designed to identify and extract IoCs and TTPs from text files, identify the relationships between them, convert them to STIX 2.1 objects, and output as a STIX 2.1 bundle.
https://www.dogesec.com/
Apache License 2.0

Add token check, remove input character limit #19

Closed: himynamesdave closed this issue 2 months ago

himynamesdave commented 2 months ago

We currently have an INPUT_CHARACTER_LIMIT setting that caps the size of the input content so that it does not break the AI request.

A character count is too simplistic a measure, so this setting should be removed.

Instead, we should enforce an INPUT_TOKEN_LIMIT, counting tokens with a library like tiktoken: https://github.com/openai/tiktoken/blob/main/README.md
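
For illustration, here is a minimal sketch of what such a token check could look like. It assumes tiktoken's `cl100k_base` encoding suits the model in use, which this issue does not specify. The names `count_tokens_tiktoken`, `check_token_limit`, and `TokenLimitExceeded` are hypothetical and are not part of txt2stix. The counter is passed in as a parameter so the check itself does not depend on tiktoken being installed.

```python
from typing import Callable


class TokenLimitExceeded(Exception):
    """Hypothetical error type for this sketch (not the txt2stix exception)."""


def count_tokens_tiktoken(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens with tiktoken; the encoding name is an assumption."""
    # Imported lazily so the limit check below can be used with any counter.
    import tiktoken

    encoding = tiktoken.get_encoding(encoding_name)
    # disallowed_special=() makes strings such as "<|endoftext|>" in the input
    # encode as ordinary text instead of raising an error.
    return len(encoding.encode(text, disallowed_special=()))


def check_token_limit(
    text: str,
    limit: int,
    count_tokens: Callable[[str], int] = count_tokens_tiktoken,
) -> int:
    """Return the token count of text, or raise if it exceeds limit."""
    token_count = count_tokens(text)
    if token_count > limit:
        raise TokenLimitExceeded(
            f"input has {token_count} tokens, limit is {limit}"
        )
    return token_count
```

Unlike a character limit, the result depends on the tokenizer, so the encoding would need to match whichever model receives the text.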

himynamesdave commented 2 months ago

We still need to remove all references to INPUT_CHARACTER_LIMIT. Running without it set currently fails:

========
06-Aug-24 19:32:54 [txt2stix] [INFO] Arguments: ["--relationship_mode", "ai", "--input_file", "tests/inputs/extraction_types/generic_cryptocurrency_btc_transaction.txt", "--name", "Test 3.1.31.3 pattern_cryptocurrency_btc_transaction", "--tlp_level", "clear", "--confidence", "100", "--use_extractions", "pattern_cryptocurrency_btc_transaction"]
Traceback (most recent call last):
  File "/Users/dgreenwood/Documents/repos/dogesec/txt2stix/txt2stix.py", line 3, in <module>
    main()
  File "/Users/dgreenwood/Documents/repos/dogesec/txt2stix/txt2stix/txt2stix.py", line 209, in main
    load_env(len(aliased_input))
  File "/Users/dgreenwood/Documents/repos/dogesec/txt2stix/txt2stix/txt2stix.py", line 146, in load_env
    raise FatalException(f"env variable `{env}` required")
txt2stix.common.FatalException: env variable `INPUT_CHARACTER_LIMIT` required
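
The traceback shows `load_env` still treating `INPUT_CHARACTER_LIMIT` as required. As a rough sketch only, an environment check that consults nothing but the token limit might look like the following. `read_token_limit` is a hypothetical helper, not the actual txt2stix `load_env`, and it raises `ValueError` rather than the project's `FatalException`.

```python
import os
from typing import Mapping, Optional


def read_token_limit(environ: Optional[Mapping[str, str]] = None) -> int:
    """Return INPUT_TOKEN_LIMIT as a positive int.

    INPUT_CHARACTER_LIMIT is never consulted, so leaving it unset is fine.
    """
    env = os.environ if environ is None else environ
    raw = env.get("INPUT_TOKEN_LIMIT")
    if raw is None:
        raise ValueError("env variable `INPUT_TOKEN_LIMIT` required")
    try:
        limit = int(raw)
    except ValueError:
        raise ValueError(
            f"INPUT_TOKEN_LIMIT must be an integer, got {raw!r}"
        ) from None
    if limit <= 0:
        raise ValueError("INPUT_TOKEN_LIMIT must be positive")
    return limit
```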