ngageoint / scale

Processing framework for containerized algorithms
http://ngageoint.github.io/scale/
Apache License 2.0

Strike regex only matches against file name for an S3 monitor #1928

Open jw-s2eas opened 3 years ago

jw-s2eas commented 3 years ago

Pain Point? Please describe.

An S3 workspace can only point to the top-level bucket name, not to a "folder" within the bucket. When an associated strike is set up, its regular expression file ingest rule is applied only to the base file name, not to the entire file path within the bucket. This prevents the strike from matching its rules against "folders" within the bucket. For example:

bucket: my-data-bucket

Two files within the bucket:

s3://my-data-bucket/source-alpha/2021AUG08_Image.png
s3://my-data-bucket/source-beta/2021AUG08_Image.png

Creating a strike ingest rule matching: ".*beta.*Image.png"

This would NOT match the file in the source-beta path, because the rule is only matched against "2021AUG08_Image.png".

Changing the rule to match ".*Image.png" would include files from other folders that I do not want.
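To make the example concrete, here is a minimal sketch of the two matching behaviors using Python's `re` module. It assumes the rule is applied with `re.match` (anchored at the start of the string) and that the narrower rule is meant to be `.*beta.*Image.png`. The helper `matching_keys` and its `use_full_path` flag are hypothetical and are not part of Scale.

```python
import re

# Object keys from the example above (path within the bucket, no s3:// prefix).
keys = [
    "source-alpha/2021AUG08_Image.png",
    "source-beta/2021AUG08_Image.png",
]


def matching_keys(pattern, keys, use_full_path):
    """Return the keys whose base name (or full path) matches the pattern.

    Hypothetical helper for illustration only; not Scale's implementation.
    """
    regex = re.compile(pattern)
    matched = []
    for key in keys:
        # Either test the whole key or only the part after the last "/".
        target = key if use_full_path else key.rsplit("/", 1)[-1]
        if regex.match(target):
            matched.append(key)
    return matched


narrow = r".*beta.*Image.png"  # assumed form of the intended rule
broad = r".*Image.png"

# Behavior described in this issue: only the base file name is tested.
print(matching_keys(narrow, keys, use_full_path=False))
print(matching_keys(broad, keys, use_full_path=False))

# Requested behavior: the full path within the bucket is tested.
print(matching_keys(narrow, keys, use_full_path=True))
```

Against base names, the narrow pattern selects nothing and the broad pattern selects both files; against full paths, the narrow pattern selects only the source-beta file.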

Desired Solution

Change the rule matching to run against the entire file path instead of just the file name. This should probably be done in the ingest.models.is_there_rule_match() function. Alternatively, allow workspaces to monitor a "folder" within the S3 bucket directly.

Alternative / Workaround

Create a separate bucket for every data feed, with all folders at the top level? This is not feasible if I don't control the bucket or am trying to group similar feeds.

Additional Context

This is probably a one-line fix: change the self.file_name evaluation in the function to self.file_path.
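A rough sketch of what that change could look like follows. The class below is a hypothetical stand-in, not Scale's actual ingest model: the real signature and body of is_there_rule_match() are not shown in this issue, so the `filename_regex` and `match_on_path` parameters are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class IngestStandIn:
    """Hypothetical stand-in for an ingest record; not Scale's real model."""

    file_name: str  # base name, e.g. "2021AUG08_Image.png"
    file_path: str  # path within the bucket, e.g. "source-beta/2021AUG08_Image.png"

    def is_there_rule_match(self, filename_regex, match_on_path=False):
        # Assumed shape of the check: the proposal swaps file_name for file_path.
        target = self.file_path if match_on_path else self.file_name
        return re.match(filename_regex, target) is not None


ingest = IngestStandIn(
    file_name="2021AUG08_Image.png",
    file_path="source-beta/2021AUG08_Image.png",
)
rule = r".*beta.*Image.png"

print(ingest.is_there_rule_match(rule))                      # base name only
print(ingest.is_there_rule_match(rule, match_on_path=True))  # full path
```

One thing to keep in mind with this approach: existing rules written against bare file names and anchored at the start (for example a rule beginning with a literal date) would stop matching once the path is tested, so an opt-in switch like the assumed `match_on_path` may be safer than changing the default.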