Open hacktobeer opened 2 months ago
To use the GIFT PPA to install Plaso and its dependencies, I would opt to create a new docker/e2e/Dockerfile, since we only need to test pre-releases in the e2e/integration tests. It would be nice to keep the other Docker containers using Poetry to manage dependencies. This way, we can install releases from the GIFT PPA for the e2e tests and keep using Poetry for the rest.
I was thinking about that option as well. The upside is that we can keep it solely for the e2e/integration tests. The downside is an extra Dockerfile to maintain and an extra Poetry project file without the Plaso software (as that would be installed with apt in the Dockerfile), so any change to packages and Docker config would need to be maintained in several places.
I see the benefit of managing everything with Poetry, no discussion there!
Taking a step back: someone implemented code that can run job dependencies (like Plaso) in Docker images ;). How about we try to use that in the e2e tests? That would make life soooooo much easier. And if it works in the e2e tests, we can think about using it in production as well. wdyt?
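A rough sketch of what running a dependency like Plaso in a Docker image could look like from the worker side. It assumes the public log2timeline/plaso image accepts the tool name as its command and that evidence and output live in host directories that can be bind-mounted; the helper names, mount points and image tag are hypothetical, and this is not Turbinia's existing Docker task code.

```python
import shlex
import subprocess

# Assumption: the public Plaso image; pin a specific tag in real use.
PLASO_IMAGE = "log2timeline/plaso:latest"


def plaso_docker_command(evidence_dir: str, output_dir: str, evidence_name: str,
                         image: str = PLASO_IMAGE) -> list[str]:
    """Build a `docker run` argv that runs log2timeline.py inside a container.

    Evidence is bind-mounted read-only and the output directory read-write, so
    the host only needs Docker, not Plaso or its Python dependencies.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{evidence_dir}:/evidence:ro",
        "-v", f"{output_dir}:/output",
        image,
        "log2timeline.py",
        "--storage-file", "/output/timeline.plaso",
        f"/evidence/{evidence_name}",
    ]


def run_plaso_in_docker(evidence_dir: str, output_dir: str,
                        evidence_name: str) -> subprocess.CompletedProcess:
    """Run the container; raises CalledProcessError on a non-zero exit."""
    cmd = plaso_docker_command(evidence_dir, output_dir, evidence_name)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Print the command instead of running it, so no Docker daemon is needed.
    print(shlex.join(plaso_docker_command("/tmp/evidence", "/tmp/output", "disk.raw")))
```

Keeping the command construction separate from execution makes it easy to unit-test the argv without Docker, and the same builder could later point at a production release image just by changing `image`.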
Agreed, if we can run tasks inside Docker that would be best.
I do like the path of testing the Task executable dependencies in Docker in the e2e tests, and if that works well we can move it to the production release images so prod releases use the same mechanism. That would also let us make the change now, independently of moving the entire worker to 24.04 right away, and we can fix any remaining issues there in parallel.
Regarding GIFT vs. Poetry/PyPI: in the long term I do think we should aim to use the same packaging mechanism for the e2e tests as we do in prod, and for the dev/test version it looks like we may have to use GIFT for at least that. The biggest reason is that it will be difficult to test the exact versions of all the indirect dependencies across both PyPI and GIFT packages, and we want the e2e/integration tests to catch version and other inter-dependency issues before pushing to production. IOW, we want those tests to be pretty close to the production instance, and completely different packaging may introduce other differences as well.
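To make the indirect-dependency version differences visible, one option is to snapshot the installed Python distributions in each image (for example a GIFT-based e2e image and a PyPI-based prod image) and diff the snapshots. A minimal sketch using only importlib.metadata, assuming the GIFT packages install their Python metadata where the interpreter can see it; how the snapshots are exported and compared in CI is left open, and the package names in the test data are placeholders.

```python
import json
import re
from importlib import metadata


def normalize(name: str) -> str:
    """PEP 503 name normalization, so 'PyYAML' and 'pyyaml' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_versions() -> dict[str, str]:
    """Snapshot {normalized name: version} of the Python distributions visible here."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken metadata
            versions[normalize(name)] = dist.version
    return versions


def version_mismatches(expected: dict[str, str], actual: dict[str, str]) -> dict:
    """Return {name: (expected, actual)} for every package whose version differs.

    None marks a package that is present on only one side.
    """
    return {
        name: (expected.get(name), actual.get(name))
        for name in sorted(expected.keys() | actual.keys())
        if expected.get(name) != actual.get(name)
    }


if __name__ == "__main__":
    # Run once in each image and save the output, then diff the two snapshots.
    print(json.dumps(installed_versions(), indent=2, sort_keys=True))
```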
FYI: Unfortunately, we won't be able to completely remove all of the Plaso-related dependencies from the worker without some extra work to isolate the dfVFS code that does partition enumeration, and possibly some other things in the pre-processors. Right now that code is called as a library from the Worker, so to isolate it as a dependency we'd need to write a wrapper script and call that instead.
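A minimal sketch of the worker side of such a wrapper: the worker runs an external command and only parses its JSON output, so dfVFS stays inside whatever environment runs the wrapper (a separate venv or a Docker image). The wrapper itself, its command line, and the JSON shape (a list of objects such as {"path": ..., "offset": ...}) are all hypothetical; in the demo a tiny inline Python command stands in for it.

```python
import json
import subprocess
import sys


def enumerate_partitions(cmd: list[str], timeout: int = 600) -> list[dict]:
    """Run a partition-enumeration wrapper and return its parsed JSON output.

    `cmd` is the full command line for a hypothetical wrapper (run directly or
    via `docker run`); the worker never imports dfVFS itself.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        raise RuntimeError(
            f"partition enumeration failed (exit {proc.returncode}): {proc.stderr.strip()}")
    partitions = json.loads(proc.stdout)
    if not isinstance(partitions, list):
        raise ValueError("expected a JSON list of partition objects")
    return partitions


if __name__ == "__main__":
    # Stand-in for the real wrapper: prints one fake partition as JSON.
    fake_wrapper = [
        sys.executable, "-c",
        "import json; print(json.dumps([{'path': '/p1', 'offset': 1048576}]))",
    ]
    for partition in enumerate_partitions(fake_wrapper):
        print(partition["path"], partition["offset"])
```

The design choice here is that stdout is the only contract between the worker and the wrapper, so the wrapper's dependencies can change without touching the worker's environment.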
Thanks, folks.
I think we should take a step back and ask ourselves why it is so difficult to do something as simple as upgrading a base OS version or testing different versions of Plaso. It seems to me that the Turbinia code base is overly complicated and dependency-heavy where it doesn't need to be. A good example is indeed the partition code, which should not need a dfVFS library just to list some partitions...
Let's park this PR until there is a plan for how to untangle and clean up Turbinia's complicated and unused code structures/functionality.
To add: good work has been done over the last few years isolating/extracting different functions from the monolithic Turbinia code base. Examples are the API server, a new client, dockerizing and running workers in k8s, a good WebUI, easy deployment scripts, and initial e2e/integration tests to make life easier. The internals (server/worker) are still too complicated imo, and I hope we come up with a good plan to untangle/clean up that part.
Yes, I 100% agree that we should try to extract those dependencies from those parts of the code base! To be fair though, those dependencies are needed in these parts of the code base right now :). When it was written, no binary existed for that functionality (still doesn't afaik), and it was a lot easier to use the dfVFS libraries to get programmatic access to the raw disk info. You're right, isolating the dependency would definitely make things easier, so I'll write up some options for that out of band.
Description of the change
Moving to Ubuntu 22.04 and using the GIFT PPA for forensic tooling.
Applicable issues
Additional information
Checklist