muchdogesec / txt2stix

txt2stix is a Python script that is designed to identify and extract IoCs and TTPs from text files, identify the relationships between them, convert them to STIX 2.1 objects, and output as a STIX 2.1 bundle.
https://www.dogesec.com/
Apache License 2.0

Issue with credit card normalisation #13

Closed himynamesdave closed 1 month ago

himynamesdave commented 2 months ago

 python3 txt2stix.py \
        --relationship_mode standard \
        --input_file tests/inputs/extraction_types/generic_bank_card_mastercard.txt \
        --name 'Test 3.1.34 pattern_bank_card_mastercard' \
        --tlp_level clear \
        --confidence 100 \
        --use_extractions pattern_bank_card_mastercard

Produces

bundle--ce871ba3-3850-4840-b24d-4abbdd03504a.json

You can see that 2 duplicate indicators are created for the same bank card (5588601012060076).

The test file

generic_bank_card_mastercard:
  test_positive_examples:
    - '5588601012060076'
    - '5588 6010 1206 0076'

Has the same card number (one with spaces, one without).

I suspect the 2 indicators are being created because object creation happens before normalisation (removing spaces), so the two card strings are never collapsed into one.

Expected result is 1 indicator and 1 bank card for each unique card number (once spaces are removed).
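
To illustrate the expected behaviour, here is a minimal sketch of normalising before deduplicating. It is not txt2stix's actual code: the function names `normalise_card_number` and `unique_card_numbers` are hypothetical, and it assumes the extractor hands back the raw matched strings as a list.

```python
def normalise_card_number(raw: str) -> str:
    """Remove all whitespace so differently spaced copies compare equal."""
    return "".join(raw.split())


def unique_card_numbers(matches):
    """Return normalised card numbers in first-seen order, without duplicates."""
    seen = set()
    unique = []
    for raw in matches:
        number = normalise_card_number(raw)
        if number not in seen:
            seen.add(number)
            unique.append(number)
    return unique


# The two positive examples from the test file above.
matches = ["5588601012060076", "5588 6010 1206 0076"]
print(unique_card_numbers(matches))  # ['5588601012060076']
```

With normalisation done first, only one value is left to turn into an indicator and a bank card object.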

himynamesdave commented 1 month ago

@fqrious the same test now extracts 0 results

bundle--3539e8c5-e7fa-5a72-a0b4-c7a9717bb15c.json

fqrious commented 1 month ago

it's failing the Luhn test, I added Luhn validation
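
For reference, a minimal sketch of a Luhn checksum, written independently of whatever txt2stix actually implements (the function name `luhn_valid` is hypothetical). It shows that the card number from the test file does not pass, while a commonly published Mastercard test number does.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` satisfy the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("5588601012060076"))     # False (checksum total is 47)
print(luhn_valid("5588 6010 1206 0076"))  # False (spaces are ignored)
print(luhn_valid("5555555555554444"))     # True (checksum total is 60)
```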

himynamesdave commented 1 month ago

@fqrious the test

 python3 txt2stix.py \
        --relationship_mode standard \
        --input_file tests/inputs/extraction_types/generic_bank_card_mastercard.txt \
        --name 'Test 3.1.34 pattern_bank_card_mastercard' \
        --tlp_level clear \
        --confidence 100 \
        --use_extractions pattern_bank_card_mastercard

still fails with no extractions; I'm expecting 1 extraction

fqrious commented 1 month ago

it's invalid
