Update CICIDS Preprocessing and launching Experiments

This is a follow-up to PR #30. It further modifies the CICIDS interface to make it easier to use in our setup. The following ideas were implemented, fixed, or updated:

The ExperimentRunner check for dataset.path worked for CICIDS, but it was a bad idea because SynthethicStream raises errors: it has nothing like a path to a dataset file. The check could be filtered out or handled separately for the SynthStream class using isinstance, but this seems like a "dirty" solution, so the check was removed instead. The check is now made in __init__ of the CICIDS class.
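A minimal sketch of this idea, not the project's actual code: `CICIDSSketch`, `SyntheticSketch`, and `run_experiment` are hypothetical stand-ins. The file-backed dataset validates its own path in `__init__`, so the runner never has to know whether a dataset has a path at all.

```python
from pathlib import Path


class SyntheticSketch:
    """Hypothetical generated stream: no file on disk, no path attribute."""

    def __init__(self, n_samples: int = 1000):
        self.n_samples = n_samples


class CICIDSSketch:
    """Hypothetical file-backed dataset that validates its own path."""

    def __init__(self, path):
        self.path = Path(path)
        # Fail early, inside the class that actually owns the file.
        if not self.path.is_file():
            raise FileNotFoundError(f"Dataset file not found: {self.path}")


def run_experiment(dataset) -> str:
    # The runner does not inspect dataset.path, so both dataset kinds work.
    return f"running on {type(dataset).__name__}"
```

With this layout no isinstance filtering is needed in the runner, and a missing file is reported when the dataset object is created rather than when the experiment starts.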
Using a separate preprocessing script for CICIDS adds another level of ambiguity to experiment results. Selecting and obtaining a dataset should be "easy" and autonomous; a convention is one thing, but following it in practice is another:
- All three methods in the CICIDS preprocessing pipeline (merge, convert, subset) were made private and combined into generate_cicids_file, a single method that produces the dataset file with the specified parameters
- utils.py was added; it provides the get_project_root method, which returns a Path. It allows relative navigation inside the project and is used inside CICIDS to provide the default path for the dataset
- All logic was moved to a separate component, cicids/preprocessing.py, so it is now hidden inside the source files
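The two patterns above can be sketched as follows. This is an illustration under stated assumptions, not the code in utils.py or cicids/preprocessing.py: `find_project_root`, the `pyproject.toml` marker, `generate_file`, and the bodies of the three private steps are all hypothetical placeholders.

```python
import csv
from pathlib import Path


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Return the closest ancestor of `start` (or `start` itself) containing `marker`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No directory containing {marker!r} above {start}")


def _merge(day_files):
    # Placeholder step: concatenate rows from all per-day inputs.
    return [row for rows in day_files for row in rows]


def _convert(rows):
    # Placeholder step: normalise each row (here: strip whitespace from labels).
    return [(features, label.strip()) for features, label in rows]


def _subset(rows, n):
    # Placeholder step: keep the first n rows.
    return rows[:n]


def generate_file(day_files, n, out_path: Path) -> Path:
    """Single public entry point: chain the private steps and write the result."""
    rows = _subset(_convert(_merge(day_files)), n)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        for features, label in rows:
            writer.writerow([*features, label])
    return out_path
```

Callers only see one function and one output path, so the order of the steps cannot be mixed up between experiments.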
With CICIDS we use one class with a defined 2 * 10^6 samples, but we actually use different subsets and a special version with collapsed classes (Attempted -> BENIGN). This is a small update, but it might be useful in the future:
- dataset.n_samples now shows the real number of samples passed to the interface (e.g. 400_000 if we are using a subset)
- convert_attempted: bool = False allows selecting the 27-class version with the extended class set. This might come in handy, as we discussed a possible analysis of whether model behaviour differs for attempted samples
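A rough sketch of how such a flag and sample count could behave. It assumes, purely for illustration, that attempted samples carry the word "Attempted" in their label string; `prepare_labels` and `DatasetSketch` are hypothetical names, not the CICIDS class itself.

```python
def prepare_labels(labels, convert_attempted: bool = False):
    """Optionally collapse every label containing 'Attempted' into 'BENIGN'."""
    if not convert_attempted:
        # Keep the extended label set untouched.
        return list(labels)
    return ["BENIGN" if "Attempted" in label else label for label in labels]


class DatasetSketch:
    """Hypothetical dataset wrapper reporting the counts it was actually given."""

    def __init__(self, labels, convert_attempted: bool = False):
        self.labels = prepare_labels(labels, convert_attempted)
        # Real number of samples passed in, not a fixed constant.
        self.n_samples = len(self.labels)
        self.n_classes = len(set(self.labels))
```

Deriving `n_samples` and `n_classes` from the loaded data keeps the logged metadata in step with whichever subset and label variant was used.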
Tests were added to make sure there are no mistakes in dataset parsing and that everything fits the defined convention. pytest's tmp_path fixture was used to isolate test results from the original dataset paths.
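For readers unfamiliar with the fixture, a test of this kind might look roughly like the sketch below. `write_subset` and its file naming are hypothetical helpers invented for the example; only `tmp_path` is a real pytest fixture, which hands each test a fresh temporary directory as a `pathlib.Path`.

```python
import csv
from pathlib import Path


def write_subset(rows, out_dir: Path, n: int) -> Path:
    """Hypothetical helper: write the first n rows to a CSV inside out_dir."""
    path = out_dir / f"subset_n={n}.csv"
    with path.open("w", newline="") as fh:
        csv.writer(fh).writerows(rows[:n])
    return path


def test_write_subset_stays_in_tmp_path(tmp_path):
    rows = [[0.1, "BENIGN"], [0.2, "DoS"], [0.3, "BENIGN"]]
    out = write_subset(rows, tmp_path, n=2)
    # The file lands in the temporary directory, not next to the real dataset.
    assert out.parent == tmp_path
    assert len(out.read_text().splitlines()) == 2
```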
cicids2017_experiments.py was added to help with collaborating on running CICIDS tests. It checks for the dataset (no separate preprocessing needed): if a dataset file is already present, it skips the creation step and starts the analysis / experiments.
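The "create only if missing" step can be sketched like this. `ensure_dataset` and the `generate` callback are hypothetical names used for illustration; the actual script may be organised differently.

```python
from pathlib import Path
from typing import Callable


def ensure_dataset(path: Path, generate: Callable[[Path], None]) -> bool:
    """Create the dataset file only if it is missing.

    Returns True when the file was generated, False when an existing file was reused.
    """
    if path.is_file():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    generate(path)
    return True
```

Because the second and later runs reuse the existing file, collaborators who already have the dataset go straight to the experiments.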

Examples of current usage can be found in the tests and under /experiments for CICIDS. Below is the W&B metadata for the attempted dataset. As we can see, it currently shows the extended class set, the number of subset samples, and a path to the subset that matches the convention:

Classes: "27"
Features: "10"
Name: "CICIDS2017"
Path: "C:\\Users\\...\\cicids2017\\all_days_idx=400000_n=400000.csv"
Samples: "400,000"
Sparse: "False"
Task: "Multi-class classification"
