whylabs / whylogs

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
https://whylogs.readthedocs.io/
Apache License 2.0

Some Example Notebooks are out of date or have unclear UX #515

Closed TheMellyBee closed 2 years ago

TheMellyBee commented 2 years ago

Problem

Some of the sample notebooks have become outdated and need modification. Some may need to be updated, others are duplicates, and some may not serve a clear purpose.

This issue tracks, through the comments, the notebooks that should be looked into so the knowledge is not lost or siloed.

Fixes

Update through comments with PR or decisions made.

Related to #508

TheMellyBee commented 2 years ago

Analysis.ipynb

https://github.com/whylabs/whylogs/blob/mainline/examples/Analysis.ipynb

Ask List

  1. It's a skipped test. This is part of the skipped notebook tests: https://github.com/whylabs/whylogs/blob/e0c87bee9943b50fd70a141d533c6c3f59b94a14/test_notebooks/notebook_tests.py#L13

  2. whylogs init. The second cell references whylogs init; #508 discusses how this CLI may be changing.

  3. TODOs

  4. Data file not available. flat_summary = pd.read_csv(os.path.join(profile_dir, "summary_summary.csv")) This data file is not available, nor is it created by the steps above.

  5. Uses matplotlib. Do we still want to have this example with matplotlib?

  6. No conclusion. Could a conclusion be added? Ending on a code segment made the notebook feel unfinished. 😄
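
For item 4, one option is to have the notebook fail early with a readable message when the summary file is missing. A minimal sketch, assuming the file name from the cell quoted above; `require_file` is a hypothetical helper for the notebook, not part of whylogs:

```python
import os


def require_file(path: str, hint: str = "") -> str:
    """Return `path` if it exists; otherwise raise with an actionable message.

    Hypothetical notebook helper, not a whylogs API.
    """
    if not os.path.isfile(path):
        message = f"Expected data file not found: {path}"
        if hint:
            message += f" ({hint})"
        raise FileNotFoundError(message)
    return path


# Possible use in the notebook cell (profile_dir is defined earlier in the notebook):
# flat_summary = pd.read_csv(
#     require_file(os.path.join(profile_dir, "summary_summary.csv"),
#                  hint="see the notebook's data setup notes")
# )
```

This does not fix the missing file, but it turns a bare pandas error into a message that tells the reader what the notebook expected to find.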

TheMellyBee commented 2 years ago

Auto Segmentation Example

https://github.com/whylabs/whylogs/blob/mainline/examples/Auto%20Segmentation.ipynb

Everything seemed to work with just one warning at the beginning, but I'm not sure what this is showing me or why it would be helpful.

Request to flesh this out with a bit more explanation and maybe an example of why this is helpful.

TheMellyBee commented 2 years ago

Constraints Suite

https://github.com/whylabs/whylogs/blob/mainline/examples/Constraints_Suite.ipynb

I found this one very helpful.

Ask List

  1. This is a long one; add a table of contents.
  2. An intro and conclusion would be nice.
  3. Clean it up, potentially adding white space.
  4. This could be a good one to highlight; it gave me a lot of ideas.
TheMellyBee commented 2 years ago

Constraints

https://github.com/whylabs/whylogs/blob/mainline/examples/Constraints.ipynb

The Constraints Suite notebook above was much more useful. Consider folding some of this content into the intro requested above and removing this notebook.

TheMellyBee commented 2 years ago

DatasetDrift

https://github.com/whylabs/whylogs/blob/mainline/examples/DatasetDrift.ipynb

It wasn't apparent right off the bat what this notebook would show me. Could we add an introduction and conclusion?

TheMellyBee commented 2 years ago

Guest Session Demo

https://github.com/whylabs/whylogs/blob/mainline/examples/Guest%20Session%20Demo.ipynb

This is on the notebook skip list: https://github.com/whylabs/whylogs/blob/e0c87bee9943b50fd70a141d533c6c3f59b94a14/test_notebooks/notebook_tests.py#L11

Ask List

  1. Could we just remove this one? It doesn't seem to offer functionality beyond what the other notebooks cover.
  2. If not, it currently raises a type error: TypeError: start_whylabs_session() got an unexpected keyword argument 'data_collection_consent'
  3. This also needs to be fleshed out if it is kept.
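
To track down which argument went stale in item 2, the call can be compared against the function's current signature. A small sketch using the standard `inspect` module; `start_session_stub` and its `config_path` parameter are made-up stand-ins, since the real `start_whylabs_session` signature is not shown in this thread:

```python
import inspect


def unsupported_kwargs(func, kwargs):
    """Return the names in `kwargs` that `func` cannot accept as keywords."""
    params = inspect.signature(func).parameters.values()
    # A **kwargs parameter accepts any keyword, so nothing is unsupported.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return []
    accepted = {
        p.name
        for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    return sorted(name for name in kwargs if name not in accepted)


# Hypothetical stand-in; the real function's parameters may differ.
def start_session_stub(config_path=None):
    return config_path


requested = {"config_path": None, "data_collection_consent": True}
print(unsupported_kwargs(start_session_stub, requested))
# prints ['data_collection_consent']
```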
TheMellyBee commented 2 years ago

Inspect saved log formats

https://github.com/whylabs/whylogs/blob/mainline/examples/Inspect_saved_log_formats.ipynb

This does work, but I'm not sure why it's here. If it's for testing, could we just convert it into a test? If it's a good example, it would be worth a redo to make clear what it's for and why it's useful, and to make it easier to navigate.

TheMellyBee commented 2 years ago

Logging Images

https://github.com/whylabs/whylogs/blob/mainline/examples/logging_images.ipynb

Ask

  1. Block three has some errors and produces different output than the saved notebook shows. [Screenshot: Screen Shot 2022-04-06 at 5 53 52 PM]
  2. Same with block 10. [Screenshot: Screen Shot 2022-04-06 at 5 56 40 PM]

This is neat @jamie256 ! I like it 👍

TheMellyBee commented 2 years ago

Logging

https://github.com/whylabs/whylogs/blob/mainline/examples/Logging.ipynb

Ask List

  1. References 'whylogs init' and 'Analysis.ipynb', which we have already commented on in #508 and https://github.com/whylabs/whylogs/issues/515#issuecomment-1089512149

First paragraph - "The resulting profile can also be produced from the command line interface. The workflow to work with these files, along with deeper analysis and visualization examples, can be found in the Analysis.ipynb that is generated with whylabs init." Later - "For more information about the contents of these objects, consult the Analysis.ipynb notebook"

  2. There are histograms. Could we display a visual instead of listing bins? histograms = summary['hist'] histograms["delinq_amnt"]

  3. This seems like it would work better as part of a getting started notebook than on its own.

  4. Clean up the empty code lines and polish it a bit.
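
For the histogram item, a rough sketch of turning a bins-and-counts listing into a bar chart. It assumes each histogram is a dict with `bin_edges` and `counts` keys, which should be checked against what `summary['hist']` actually returns; the numbers below are made up for illustration, and both function names are hypothetical:

```python
def histogram_bars(hist):
    """Convert a {'bin_edges': [...], 'counts': [...]} dict into bar geometry."""
    edges, counts = hist["bin_edges"], hist["counts"]
    if len(edges) != len(counts) + 1:
        raise ValueError("expected one more bin edge than counts")
    lefts = list(edges[:-1])
    widths = [right - left for left, right in zip(edges[:-1], edges[1:])]
    return lefts, widths, list(counts)


def plot_histogram(hist, title=""):
    """Draw the histogram as a bar chart (requires matplotlib)."""
    # Imported lazily so histogram_bars has no third-party dependency.
    import matplotlib.pyplot as plt

    lefts, widths, counts = histogram_bars(hist)
    fig, ax = plt.subplots()
    ax.bar(lefts, counts, width=widths, align="edge", edgecolor="black")
    ax.set_title(title)
    ax.set_xlabel("value")
    ax.set_ylabel("count")
    return ax


# Made-up numbers for illustration only, not output from the notebook.
example = {"bin_edges": [0.0, 10.0, 20.0, 30.0], "counts": [5, 2, 1]}
lefts, widths, counts = histogram_bars(example)
```

In the notebook this might be called as `plot_histogram(histograms["delinq_amnt"], title="delinq_amnt")`, provided the dict shape matches the assumption above.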

TheMellyBee commented 2 years ago

MLflow + whylogs Integration

https://github.com/whylabs/whylogs/blob/mainline/examples/MLFlow%20Integration%20Example.ipynb

Note: This is an amazing notebook. It was so nice to have a walk through on set up and it was easy to read.

Ask List

  1. References a "Getting Started" notebook that lives in a different repo. Could we make it more obvious that the referenced notebook is a good starting point within the whylogs repo? Make sure to update this reference: "If you'd like to learn more about whylogs, check out our [introductory notebook](https://github.com/whylabs/whylogs-examples/blob/mainline/python/GettingStarted.ipynb)."

This has to do with #509

  2. This is not on the automated testing list. Could we integrate testing for it?

  3. KeyError on code block 12. Make sure to add API key instructions at the top, where the notebook walks through environment setup. Interestingly, the rest of the notebook still seemed to work without fixing this. Could this cause a hidden error in the graphs?

    Exception in thread Thread-14:
    Traceback (most recent call last):
    File "/Users/melanie/opt/anaconda3/envs/whylogs-mlflow/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
    File "/Users/melanie/opt/anaconda3/envs/whylogs-mlflow/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
    File "/Users/melanie/opt/anaconda3/envs/whylogs-mlflow/lib/python3.8/site-packages/whylogs/app/writers.py", line 434, in _write_protobuf
    upload_profile(profile)
    File "/Users/melanie/opt/anaconda3/envs/whylogs-mlflow/lib/python3.8/site-packages/whylogs/whylabs_client/wrapper.py", line 79, in upload_profile
    _upload_whylabs(dataset_profile, dataset_timestamp, profile_path)
    File "/Users/melanie/opt/anaconda3/envs/whylogs-mlflow/lib/python3.8/site-packages/whylogs/whylabs_client/wrapper.py", line 84, in _upload_whylabs
    log_api = _get_or_create_log_client()
    File "/Users/melanie/opt/anaconda3/envs/whylogs-mlflow/lib/python3.8/site-packages/whylogs/whylabs_client/wrapper.py", line 34, in _get_or_create_log_client
    _api_key = os.environ["WHYLABS_API_KEY"]
    File "/Users/melanie/opt/anaconda3/envs/whylogs-mlflow/lib/python3.8/os.py", line 675, in __getitem__
    raise KeyError(key) from None
    KeyError: 'WHYLABS_API_KEY' 
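
The traceback shows the key being read with `os.environ["WHYLABS_API_KEY"]` inside a background thread, which is why the failure surfaces late and does not stop the notebook. One way to surface it in the setup section instead is a small up-front check. A sketch only; `require_env` is a hypothetical helper, not part of whylogs:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a readable error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "Set it before running the cells that upload profiles."
        )
    return value


# In the notebook's setup section, before any upload is attempted:
# api_key = require_env("WHYLABS_API_KEY")
```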
TheMellyBee commented 2 years ago

Overlap in Profiling

https://github.com/whylabs/whylogs/blob/mainline/examples/Profile_Viewer_In_Notebook.ipynb

https://github.com/whylabs/whylogs/blob/mainline/examples/Dataset_Profiler_Viewer.ipynb

https://github.com/whylabs/whylogs/blob/mainline/examples/Notebook_Profile_Viewer.ipynb

TheMellyBee commented 2 years ago

Rapids GPU

https://github.com/whylabs/whylogs/blob/mainline/examples/RAPIDS%20GPU%20Integration%20Example.ipynb

Ask List

  1. Add title
  2. Add introduction
  3. Add a note that this shouldn't be run on M1, or add an OS check. It seems obvious, but I think it would improve usability.
  4. Maybe give an alternative, such as Colab instructions, so anyone can test this out.
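
The OS check in item 3 could look something like the following, using only the standard `platform` module. This is a sketch; the function name is hypothetical, and note that a Python interpreter running under Rosetta reports `x86_64` rather than `arm64`, so the check is not exhaustive:

```python
import platform


def is_apple_silicon(system=None, machine=None) -> bool:
    """True on macOS running natively on an ARM (M1-family) processor."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


if is_apple_silicon():
    # RAPIDS needs an NVIDIA GPU, which Apple Silicon machines do not have.
    print("RAPIDS needs an NVIDIA GPU and will not run on this machine; "
          "consider a hosted GPU runtime instead.")
```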

Note: I'm on M1, test this one out for errors later

TheMellyBee commented 2 years ago

Saving S3 Example

https://github.com/whylabs/whylogs/blob/mainline/examples/S3%20example.ipynb

Ask List

  1. Give some more information
  2. Add a nice install section like the one in https://github.com/whylabs/whylogs/issues/515#issuecomment-1090985540
  3. Clean up and polish the notebook
  4. Links to S3
TheMellyBee commented 2 years ago

Streaming Mode

https://github.com/whylabs/whylogs/blob/mainline/examples/Streaming%20Mode%20-%20whylogs.ipynb

Ask List

  1. Add a title

  2. Link the appropriate notebooks. "This session can be connected with multiple writers that output the results of our profiling locally in JSON, a flat CSV, or binary protobuf format as well as writers to an AWS S3 bucket in the cloud. Further writing functionality will be added as well." Several notebooks cover these explicitly; let's link to them so this is nicely polished.

  3. This section says it will discuss how logging works, but doesn't explain where this file comes from or what it does. Either give a short explanation or link to the notebook that does: config = load_config(".whylogs_local.yaml")

  4. Simple visuals would make this more compelling and neater, but overall good! :)

TheMellyBee commented 2 years ago

Logging String Data #514

https://github.com/whylabs/whylogs/blob/mainline/examples/String_Features.ipynb

Ask List

  1. The following cell gives a token size of 1 throughout, which differs from the saved example; there may be an error here. viz.plot_token_length("zipcode",character_list="-0123456789")

  2. Type error; opened issue #520 to address it.

    Screen Shot 2022-04-06 at 7 10 20 PM

Same error further down

Screen Shot 2022-04-06 at 7 11 17 PM
TheMellyBee commented 2 years ago

Whylogs Feast Integration

https://github.com/whylabs/whylogs/blob/mainline/examples/feast_whylogs_example/feast_whylogs.ipynb

Ask List

  1. Some of the files are from the whylogs-examples repo (#509).
  2. Not in the notebook testing. Is there a way to add some testing for this to make sure we catch errors?
  3. FileNotFoundError: [Errno 2] No such file or directory: 'data/driver_stats.parquet'

Look for more errors once data ingestion is corrected
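
On the testing question that keeps coming up for notebooks in subdirectories, here is a sketch of how a test could discover every notebook under `examples/` recursively and subtract a skip list. How `notebook_tests.py` actually collects notebooks is not shown in this thread, so the function and the skip-list format below are assumptions; the two skipped names are the ones noted as skipped in earlier comments:

```python
from pathlib import Path


def notebooks_to_test(root, skip=()):
    """List every .ipynb under `root`, minus the skipped file names."""
    skipped = set(skip)
    return sorted(
        path
        for path in Path(root).rglob("*.ipynb")
        # Ignore Jupyter's autosave copies and anything explicitly skipped.
        if ".ipynb_checkpoints" not in path.parts and path.name not in skipped
    )


# Illustrative skip list; the real one lives in test_notebooks/notebook_tests.py.
SKIP = ["Analysis.ipynb", "Guest Session Demo.ipynb"]
# for notebook in notebooks_to_test("examples", skip=SKIP):
#     ...run it with whichever notebook runner the test suite uses...
```

Recursive discovery would pick up notebooks such as `feast_whylogs_example/feast_whylogs.ipynb` automatically, so new examples are tested by default and skipping becomes an explicit decision.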

TheMellyBee commented 2 years ago

ML Flow & WhyLogs Integration

https://github.com/whylabs/whylogs/blob/mainline/examples/mlflow_whylabs_example/mlflow_whylabs.ipynb

Ask List

  1. This is not tested in the notebook testing, could it be added or refactored to ensure no errors?
  2. Hits ValidationError right away. [Screenshot: Screen Shot 2022-04-06 at 8 23 13 PM]
  3. Is this duplicated with https://github.com/whylabs/whylogs/issues/515#issuecomment-1090985540?
TheMellyBee commented 2 years ago

Monitoring Classification Model Performance Metrics

Interestingly this all seemed to work on the notebook side, but doesn't seem to have the

Screen Shot 2022-04-06 at 8 34 41 PM Screen Shot 2022-04-06 at 8 35 23 PM

Is this just because it's just for one day and it needs to be run for several weeks? Could we clarify this either way?

TheMellyBee commented 2 years ago

I'm going to retry the above, as I think I had the setting on unknown, not classification.

TheMellyBee commented 2 years ago

ROV Logs

https://github.com/whylabs/whylogs/blob/mainline/examples/rov_whylogs/ROV-whylogs.ipynb

Ask List

  1. Not on the notebook test list. Could we change this to be able to add it on to that?
  2. Errors on first cell block
    !pip install -r requirements.txt
    ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'
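
`pip install -r requirements.txt` resolves the file relative to the current working directory, so the cell fails whenever the notebook is launched from somewhere else. A sketch of building the command from an explicit path instead; the helper name and the example path are assumptions, and whether a requirements file exists next to the notebook still needs to be confirmed:

```python
import sys
from pathlib import Path


def pip_install_command(requirements) -> list:
    """Build a pip command for a requirements file, failing early if it is missing."""
    requirements = Path(requirements).resolve()
    if not requirements.is_file():
        raise FileNotFoundError(f"No requirements file at {requirements}")
    # Use the running interpreter so packages land in the notebook's environment.
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements)]


# Example (the directory is an assumption about where the file would live):
# import subprocess
# subprocess.check_call(
#     pip_install_command("examples/rov_whylogs/requirements.txt")
# )
```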
TheMellyBee commented 2 years ago

I have not run any notebooks from the whylogs-examples repo, and I did not run the SageMaker or Flask notebooks.

TheMellyBee commented 2 years ago

Throughout these, I ran into a warning about a missing config file for the whylogs session logger. Is this something that needs to be updated?

FelipeAdachi commented 2 years ago

Auto Segmentation Example

https://github.com/whylabs/whylogs/blob/mainline/examples/Auto%20Segmentation.ipynb

Everything seemed to work with just one warning at the beginning, but I'm not sure what this is showing me or why it would be helpful.

Request to flesh this out with a bit more explanation and maybe an example of why this is helpful.

Maybe this one is now redundant with the recent segmentation notebook? It is currently at: https://github.com/FelipeAdachi/whylogs/blob/segbug/examples/segmentation/segments.ipynb

DatasetDrift

https://github.com/whylabs/whylogs/blob/mainline/examples/DatasetDrift.ipynb

It wasn't apparent right off the bat what this notebook would show me. Could we add an introduction and conclusion?

I think we should remove this. It's not a feature per se, only shows how to use profiles with outside matplotlib functionalities. Besides, this is currently done in NotebookProfileViewer, so I think it's redundant.

Inspect saved log formats

https://github.com/whylabs/whylogs/blob/mainline/examples/Inspect_saved_log_formats.ipynb

This does work, but I'm not sure why it's here. If it's for testing, could we just convert it into a test? If it's a good example, it would be worth a redo to make clear what it's for and why it's useful, and to make it easier to navigate.

I agree, this deserves a redo. I actually think it's good to have an example showing how to inspect the logs, but it's hard to understand what the current version is trying to say.

https://github.com/whylabs/whylogs/blob/mainline/examples/String_Features.ipynb

If I'm not mistaken, this is covered by tests. I'll check the other issues you mentioned

ROV Logs

https://github.com/whylabs/whylogs/blob/mainline/examples/rov_whylogs/ROV-whylogs.ipynb

Ask List

  1. Not on the notebook test list. Could we change this to be able to add it on to that?
  2. Errors on first cell block
!pip install -r requirements.txt
ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'

This is tough to add to the notebook tests because it needs external infrastructure, namely a Kafka cluster.
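
One possible middle ground is to keep the notebook in the test list but skip it when no broker is reachable. A sketch using a plain TCP check from the standard library; the helper name is hypothetical, and the host and port (9092 is Kafka's default) are assumptions about the test environment:

```python
import socket


def service_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# In a pytest-based notebook test, the check could gate the test, for example:
# @pytest.mark.skipif(not service_reachable("localhost", 9092),
#                     reason="Kafka broker not running")
```

A successful TCP connection only shows that something is listening on the port, not that the cluster is healthy, so this gates the test rather than validating the infrastructure.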
