simsong / bulk_extractor

This is the development tree. Production downloads are at:
https://github.com/simsong/bulk_extractor/releases

scan_email does not find email addresses that fill the entire sbuf #194

Closed: simsong closed this issue 3 years ago

simsong commented 3 years ago

This test fails:

    sbufp = new sbuf_t("plain_text_pdf@textedit.com");
    outdir = test_scanner(scan_email, sbufp);
    email_txt = getLines( outdir / "email.txt" );
    REQUIRE( requireFeature(email_txt,"0\tplain_text_pdf@textedit.com"));

But this test passes:

    sbufp = new sbuf_t(" plain_text_pdf@textedit.com ");
    outdir = test_scanner(scan_email, sbufp);
    email_txt = getLines( outdir / "email.txt" );
    REQUIRE( requireFeature(email_txt,"1\tplain_text_pdf@textedit.com"));

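The only difference between the two tests is the leading and trailing space, which is also why the expected feature offset moves from 0 to 1. It appears that the flex-based scanners do not find an email address when it fills the entire sbuf, that is, when nothing precedes or follows it in the buffer.

I think we can fix this with a change in sbuf_flex_scanner.h.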
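The issue does not state the root cause, so the following is only a minimal sketch of one plausible explanation: a matching rule that needs a delimiter character after the address will never fire when the address runs to the end of the buffer. Everything here (`Match`, `is_email_char`, `find_email`, the `eof_is_delimiter` flag) is hypothetical and written for illustration; it is not code from bulk_extractor or from sbuf_flex_scanner.h, and the real fix may work differently.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type for this sketch (not a bulk_extractor type).
struct Match {
    std::size_t offset;  // byte offset of the token within the buffer
    std::string text;    // the matched token
};

// Characters this toy matcher accepts inside an email-like token.
static bool is_email_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) ||
           c == '_' || c == '.' || c == '-' || c == '@';
}

// Return the first email-like token in buf, if any.
// When eof_is_delimiter is false, a token is accepted only if a
// non-email character follows it inside the buffer. This mimics a rule
// that requires a trailing delimiter (an ASSUMPTION about the bug).
// When eof_is_delimiter is true, end-of-buffer also ends a token.
std::optional<Match> find_email(std::string_view buf, bool eof_is_delimiter) {
    std::size_t i = 0;
    while (i < buf.size()) {
        if (!is_email_char(buf[i])) {
            ++i;
            continue;
        }
        const std::size_t start = i;
        while (i < buf.size() && is_email_char(buf[i])) {
            ++i;
        }
        const bool at_eof = (i == buf.size());
        const std::string_view tok = buf.substr(start, i - start);
        const std::size_t at = tok.find('@');
        // Require at least one character on each side of the '@'.
        const bool looks_like_email =
            at != std::string_view::npos && at > 0 && at + 1 < tok.size();
        if (looks_like_email && (!at_eof || eof_is_delimiter)) {
            return Match{start, std::string(tok)};
        }
    }
    return std::nullopt;
}
```

With `eof_is_delimiter` set to `false`, `find_email("plain_text_pdf@textedit.com", false)` finds nothing, while the space-padded buffer yields a match at offset 1, mirroring the two tests above. Setting the flag to `true` makes the unpadded buffer match at offset 0, which is the behavior the failing test expects.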

simsong commented 3 years ago

Fixed.