simsong / bulk_extractor

This is the development tree. Production downloads are at:
https://github.com/simsong/bulk_extractor/releases

Update lightgrep scanner for bulk_extractor 2.0 #421

Open: juliapaluch opened 1 year ago

juliapaluch commented 1 year ago

This PR has the following functionality changes:

Because the other lightgrep-based scanners were removed, we were able to delete a lot of scaffolding code.

This PR is not yet ready, but we're opening it for comment. The following remains to be done:

Please let us know if you have any questions or comments.

simsong commented 1 year ago

I'm going to close this and re-open it as a draft PR.

simsong commented 1 year ago

Apparently that's not how you do it. I found instructions here. It's a draft now.

codecov[bot] commented 1 year ago

Codecov Report

Merging #421 (16e8eeb) into main (7935c41) will not change coverage. The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #421   +/-   ##
=======================================
  Coverage   47.94%   47.94%           
=======================================
  Files         112      112           
  Lines       13224    13224           
=======================================
  Hits         6339     6339           
  Misses       6885     6885           


jonstewart commented 1 year ago

I didn't know about draft PRs, TIL.

simsong commented 8 months ago

Is this PR ready to go?

jonstewart commented 8 months ago

[jeez, terrible formatting for reply-by-email]

Good question: yes, and no.

We think this PR works, but it depends on the current main branch of lightgrep. For a good user experience, we need to release a new version of lightgrep and then update this PR's build scripts to pull that release.

The current plan is to get the new release of lightgrep out before the end of the year. It has been under continual development for the past few months as a ~25% time project, and it includes several minor improvements and bug fixes (per the spirit of the ACM paper). If you’ve got a specific date in mind for a new bulk_extractor release, that would be good to know, and we may be able to adjust.

We are not entirely confident in our usage of the new sbuf/scanner API. We would love a code review of this PR from you. We could also push up the requisite lightgrep code for you to test, if you’d prefer. 

simsong commented 6 months ago

Hi. What's the status on this?

jonstewart commented 6 months ago

We're getting ready to make a new lightgrep release for this to target. Can you review scan_lightgrep.cpp?